Logging to experiments/invertedPendulum/nov1/IA01_w350e3_seed3214
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5837896466255188
Validation loss = 0.3035692274570465
Validation loss = 0.290817528963089
Validation loss = 0.24106699228286743
Validation loss = 0.23713622987270355
Validation loss = 0.21094845235347748
Validation loss = 0.2184516191482544
Validation loss = 0.19766730070114136
Validation loss = 0.21007904410362244
Validation loss = 0.1873493194580078
Validation loss = 0.17325404286384583
Validation loss = 0.16865183413028717
Validation loss = 0.1648256480693817
Validation loss = 0.16902375221252441
Validation loss = 0.15995217859745026
Validation loss = 0.16721384227275848
Validation loss = 0.16142097115516663
Validation loss = 0.14955176413059235
Validation loss = 0.1418740600347519
Validation loss = 0.15238183736801147
Validation loss = 0.1573973298072815
Validation loss = 0.1388242542743683
Validation loss = 0.12578099966049194
Validation loss = 0.1321420669555664
Validation loss = 0.11940162628889084
Validation loss = 0.13222551345825195
Validation loss = 0.14417535066604614
Validation loss = 0.11992650479078293
Validation loss = 0.10658850520849228
Validation loss = 0.10825257003307343
Validation loss = 0.11238037794828415
Validation loss = 0.12519298493862152
Validation loss = 0.12274472415447235
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5958313941955566
Validation loss = 0.3216235041618347
Validation loss = 0.27222850918769836
Validation loss = 0.2502444088459015
Validation loss = 0.23844851553440094
Validation loss = 0.21908357739448547
Validation loss = 0.2077019363641739
Validation loss = 0.2164272964000702
Validation loss = 0.19413837790489197
Validation loss = 0.18702293932437897
Validation loss = 0.18142244219779968
Validation loss = 0.18691791594028473
Validation loss = 0.1703072041273117
Validation loss = 0.16806674003601074
Validation loss = 0.16628459095954895
Validation loss = 0.15181665122509003
Validation loss = 0.1604735255241394
Validation loss = 0.1593685895204544
Validation loss = 0.15891501307487488
Validation loss = 0.15435145795345306
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6050214171409607
Validation loss = 0.30671507120132446
Validation loss = 0.29455679655075073
Validation loss = 0.25130873918533325
Validation loss = 0.2485601305961609
Validation loss = 0.22206559777259827
Validation loss = 0.21739833056926727
Validation loss = 0.21891196072101593
Validation loss = 0.19787374138832092
Validation loss = 0.19612528383731842
Validation loss = 0.17863844335079193
Validation loss = 0.18519264459609985
Validation loss = 0.17372174561023712
Validation loss = 0.16365669667720795
Validation loss = 0.1712861806154251
Validation loss = 0.16521133482456207
Validation loss = 0.15600842237472534
Validation loss = 0.15092912316322327
Validation loss = 0.15797513723373413
Validation loss = 0.16544778645038605
Validation loss = 0.13413459062576294
Validation loss = 0.14199838042259216
Validation loss = 0.13488464057445526
Validation loss = 0.1340051144361496
Validation loss = 0.1290586143732071
Validation loss = 0.15768569707870483
Validation loss = 0.15128947794437408
Validation loss = 0.13259495794773102
Validation loss = 0.12900851666927338
Validation loss = 0.11411623656749725
Validation loss = 0.12016536295413971
Validation loss = 0.10465525835752487
Validation loss = 0.11427326500415802
Validation loss = 0.1252148151397705
Validation loss = 0.11444392055273056
Validation loss = 0.12638992071151733
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5965403318405151
Validation loss = 0.2930193245410919
Validation loss = 0.28725364804267883
Validation loss = 0.25156721472740173
Validation loss = 0.2434634566307068
Validation loss = 0.23085181415081024
Validation loss = 0.2161751389503479
Validation loss = 0.21957924962043762
Validation loss = 0.20557917654514313
Validation loss = 0.1921381950378418
Validation loss = 0.19145867228507996
Validation loss = 0.1806471049785614
Validation loss = 0.17261023819446564
Validation loss = 0.18448886275291443
Validation loss = 0.16382446885108948
Validation loss = 0.1577824503183365
Validation loss = 0.15978644788265228
Validation loss = 0.15981830656528473
Validation loss = 0.15284334123134613
Validation loss = 0.14561359584331512
Validation loss = 0.1468556523323059
Validation loss = 0.14491628110408783
Validation loss = 0.1378980427980423
Validation loss = 0.14055895805358887
Validation loss = 0.1339130401611328
Validation loss = 0.12849511206150055
Validation loss = 0.13892105221748352
Validation loss = 0.12346364557743073
Validation loss = 0.12149401754140854
Validation loss = 0.10948485136032104
Validation loss = 0.13600817322731018
Validation loss = 0.10636362433433533
Validation loss = 0.10537690669298172
Validation loss = 0.10677245259284973
Validation loss = 0.12087845057249069
Validation loss = 0.11399154365062714
Validation loss = 0.10114841163158417
Validation loss = 0.10190200060606003
Validation loss = 0.10880852490663528
Validation loss = 0.12076588720083237
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5769643783569336
Validation loss = 0.328085333108902
Validation loss = 0.2765403985977173
Validation loss = 0.2535010576248169
Validation loss = 0.23979485034942627
Validation loss = 0.24743281304836273
Validation loss = 0.2136148363351822
Validation loss = 0.20763985812664032
Validation loss = 0.19903630018234253
Validation loss = 0.2111450880765915
Validation loss = 0.18928422033786774
Validation loss = 0.17976437509059906
Validation loss = 0.17370161414146423
Validation loss = 0.16126418113708496
Validation loss = 0.17464356124401093
Validation loss = 0.15004993975162506
Validation loss = 0.14895133674144745
Validation loss = 0.16974110901355743
Validation loss = 0.13963541388511658
Validation loss = 0.14443492889404297
Validation loss = 0.14325137436389923
Validation loss = 0.1368083506822586
Validation loss = 0.12701858580112457
Validation loss = 0.11746028065681458
Validation loss = 0.11415034532546997
Validation loss = 0.11797267943620682
Validation loss = 0.14338035881519318
Validation loss = 0.11065934598445892
Validation loss = 0.10507678985595703
Validation loss = 0.09947960078716278
Validation loss = 0.10750804841518402
Validation loss = 0.1384044736623764
Validation loss = 0.10486450791358948
Validation loss = 0.11733198165893555
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0397  |
| Iteration     | 0        |
| MaximumReturn | -0.0164  |
| MinimumReturn | -0.302   |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.22087332606315613
Validation loss = 0.12239712476730347
Validation loss = 0.1033100038766861
Validation loss = 0.10608747601509094
Validation loss = 0.09926125407218933
Validation loss = 0.0916249006986618
Validation loss = 0.0906374379992485
Validation loss = 0.09819668531417847
Validation loss = 0.08474022895097733
Validation loss = 0.088019959628582
Validation loss = 0.09749935567378998
Validation loss = 0.10045235604047775
Validation loss = 0.09514880180358887
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.16270947456359863
Validation loss = 0.12187788635492325
Validation loss = 0.11189811676740646
Validation loss = 0.09740469604730606
Validation loss = 0.09679582715034485
Validation loss = 0.10125410556793213
Validation loss = 0.10144594311714172
Validation loss = 0.0941375121474266
Validation loss = 0.098689965903759
Validation loss = 0.09137992560863495
Validation loss = 0.08219881355762482
Validation loss = 0.0998755618929863
Validation loss = 0.09083614498376846
Validation loss = 0.09476462006568909
Validation loss = 0.08790556341409683
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.20000261068344116
Validation loss = 0.12393981963396072
Validation loss = 0.11049860715866089
Validation loss = 0.09813716262578964
Validation loss = 0.09503322839736938
Validation loss = 0.08856552094221115
Validation loss = 0.09617774933576584
Validation loss = 0.08697772771120071
Validation loss = 0.09853819757699966
Validation loss = 0.09618805348873138
Validation loss = 0.1237655058503151
Validation loss = 0.07874605059623718
Validation loss = 0.08896606415510178
Validation loss = 0.08704779297113419
Validation loss = 0.0963309183716774
Validation loss = 0.0797717496752739
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2178713083267212
Validation loss = 0.11548736691474915
Validation loss = 0.10125268995761871
Validation loss = 0.08218327909708023
Validation loss = 0.08123841136693954
Validation loss = 0.0907171443104744
Validation loss = 0.08695974200963974
Validation loss = 0.08103562891483307
Validation loss = 0.07848846912384033
Validation loss = 0.07837498933076859
Validation loss = 0.08383331447839737
Validation loss = 0.07610767334699631
Validation loss = 0.09866038709878922
Validation loss = 0.0939921960234642
Validation loss = 0.08274687081575394
Validation loss = 0.08307117223739624
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.23286889493465424
Validation loss = 0.1408090889453888
Validation loss = 0.10783044993877411
Validation loss = 0.09451963752508163
Validation loss = 0.10036206990480423
Validation loss = 0.08235514163970947
Validation loss = 0.09510759264230728
Validation loss = 0.10708305239677429
Validation loss = 0.09378427267074585
Validation loss = 0.08383381366729736
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00129  |
| Iteration     | 1         |
| MaximumReturn | -0.000921 |
| MinimumReturn | -0.00179  |
| TotalSamples  | 4998      |
-----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09125237911939621
Validation loss = 0.06173514947295189
Validation loss = 0.0647633820772171
Validation loss = 0.06625694036483765
Validation loss = 0.05821824446320534
Validation loss = 0.05740838870406151
Validation loss = 0.059871867299079895
Validation loss = 0.05598250776529312
Validation loss = 0.0586714968085289
Validation loss = 0.05636681988835335
Validation loss = 0.05533430725336075
Validation loss = 0.05424429848790169
Validation loss = 0.06331371515989304
Validation loss = 0.0652664452791214
Validation loss = 0.05773713439702988
Validation loss = 0.055273499339818954
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09168358892202377
Validation loss = 0.06263867020606995
Validation loss = 0.06796009093523026
Validation loss = 0.05532580614089966
Validation loss = 0.05793966352939606
Validation loss = 0.050837643444538116
Validation loss = 0.06127374619245529
Validation loss = 0.06264986097812653
Validation loss = 0.055347710847854614
Validation loss = 0.05216578021645546
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09226115792989731
Validation loss = 0.061089906841516495
Validation loss = 0.06329800188541412
Validation loss = 0.056363362818956375
Validation loss = 0.05722150206565857
Validation loss = 0.059603460133075714
Validation loss = 0.06320395320653915
Validation loss = 0.059950895607471466
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08210285007953644
Validation loss = 0.07772191613912582
Validation loss = 0.05726509541273117
Validation loss = 0.05356382951140404
Validation loss = 0.060675956308841705
Validation loss = 0.0600263774394989
Validation loss = 0.08103349059820175
Validation loss = 0.061731886118650436
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07462486624717712
Validation loss = 0.061324164271354675
Validation loss = 0.0674823522567749
Validation loss = 0.06353065371513367
Validation loss = 0.06050177663564682
Validation loss = 0.057303786277770996
Validation loss = 0.06261254101991653
Validation loss = 0.06352660059928894
Validation loss = 0.053238749504089355
Validation loss = 0.05387629196047783
Validation loss = 0.06919918954372406
Validation loss = 0.06185654178261757
Validation loss = 0.059957362711429596
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.011764705882352941
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011627906976744186
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011494252873563218
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011363636363636364
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.02247191011235955
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022222222222222223
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02197802197802198
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.021739130434782608
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.021505376344086023
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02127659574468085
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.021052631578947368
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.020833333333333332
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.020618556701030927
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02040816326530612
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.020202020202020204
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.08    |
| Iteration     | 2        |
| MaximumReturn | -0.0777  |
| MinimumReturn | -36.3    |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09061211347579956
Validation loss = 0.043016623705625534
Validation loss = 0.038504719734191895
Validation loss = 0.038353171199560165
Validation loss = 0.035505909472703934
Validation loss = 0.03551172837615013
Validation loss = 0.03910081461071968
Validation loss = 0.0384049154818058
Validation loss = 0.034672196954488754
Validation loss = 0.03658508136868477
Validation loss = 0.03324998915195465
Validation loss = 0.034275494515895844
Validation loss = 0.03988932818174362
Validation loss = 0.03456553444266319
Validation loss = 0.03317781165242195
Validation loss = 0.03244619071483612
Validation loss = 0.033013518899679184
Validation loss = 0.03319941833615303
Validation loss = 0.03474950045347214
Validation loss = 0.034957144409418106
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0790836438536644
Validation loss = 0.04836590215563774
Validation loss = 0.043208327144384384
Validation loss = 0.03892255946993828
Validation loss = 0.037456583231687546
Validation loss = 0.034854546189308167
Validation loss = 0.03918676823377609
Validation loss = 0.034698840230703354
Validation loss = 0.03584662452340126
Validation loss = 0.03982416167855263
Validation loss = 0.034201279282569885
Validation loss = 0.03203901648521423
Validation loss = 0.03337516635656357
Validation loss = 0.03715232387185097
Validation loss = 0.03566914424300194
Validation loss = 0.03723930940032005
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08565377444028854
Validation loss = 0.04569047689437866
Validation loss = 0.0378740020096302
Validation loss = 0.03883378952741623
Validation loss = 0.03796873241662979
Validation loss = 0.03518514335155487
Validation loss = 0.04294051602482796
Validation loss = 0.03663407638669014
Validation loss = 0.033654145896434784
Validation loss = 0.03964371606707573
Validation loss = 0.035973306745290756
Validation loss = 0.036293767392635345
Validation loss = 0.0337909534573555
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0746629610657692
Validation loss = 0.04714275524020195
Validation loss = 0.04481259360909462
Validation loss = 0.04333864152431488
Validation loss = 0.03691532835364342
Validation loss = 0.03611786663532257
Validation loss = 0.03649752959609032
Validation loss = 0.036680903285741806
Validation loss = 0.03615332022309303
Validation loss = 0.03658124431967735
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08036961406469345
Validation loss = 0.04977819323539734
Validation loss = 0.04106941446661949
Validation loss = 0.03848901018500328
Validation loss = 0.0411422960460186
Validation loss = 0.035094499588012695
Validation loss = 0.03820360079407692
Validation loss = 0.03679797425866127
Validation loss = 0.03708026185631752
Validation loss = 0.03722609207034111
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019801980198019802
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0196078431372549
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019417475728155338
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019230769230769232
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01904761904761905
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018867924528301886
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018691588785046728
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018518518518518517
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01834862385321101
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01818181818181818
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018018018018018018
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017857142857142856
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017699115044247787
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017543859649122806
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017391304347826087
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017241379310344827
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017094017094017096
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01694915254237288
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01680672268907563
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016666666666666666
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01652892561983471
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01639344262295082
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016260162601626018
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016129032258064516
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.124   |
| Iteration     | 3        |
| MaximumReturn | -0.0318  |
| MinimumReturn | -1.33    |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03981808200478554
Validation loss = 0.03173385560512543
Validation loss = 0.03077089786529541
Validation loss = 0.030830558389425278
Validation loss = 0.02645803801715374
Validation loss = 0.027848728001117706
Validation loss = 0.030438603833317757
Validation loss = 0.030895661562681198
Validation loss = 0.026989251375198364
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03637222200632095
Validation loss = 0.03072221949696541
Validation loss = 0.027371671050786972
Validation loss = 0.029632223770022392
Validation loss = 0.03286239504814148
Validation loss = 0.03207825496792793
Validation loss = 0.03334636986255646
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.043079912662506104
Validation loss = 0.03119276463985443
Validation loss = 0.0317201167345047
Validation loss = 0.028828591108322144
Validation loss = 0.03281446546316147
Validation loss = 0.028022348880767822
Validation loss = 0.02963661588728428
Validation loss = 0.028256753459572792
Validation loss = 0.028412798419594765
Validation loss = 0.02956509217619896
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03428572416305542
Validation loss = 0.030277345329523087
Validation loss = 0.029378535225987434
Validation loss = 0.029927853494882584
Validation loss = 0.028584323823451996
Validation loss = 0.03300180658698082
Validation loss = 0.03373297303915024
Validation loss = 0.02816728875041008
Validation loss = 0.026431221514940262
Validation loss = 0.02921702153980732
Validation loss = 0.0315508134663105
Validation loss = 0.03618178889155388
Validation loss = 0.03013628162443638
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0366356298327446
Validation loss = 0.03545448184013367
Validation loss = 0.02887803316116333
Validation loss = 0.03095449134707451
Validation loss = 0.028979431837797165
Validation loss = 0.03093048185110092
Validation loss = 0.0304701030254364
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015873015873015872
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015748031496062992
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015625
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015503875968992248
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015384615384615385
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015267175572519083
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015151515151515152
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015037593984962405
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014925373134328358
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014814814814814815
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014705882352941176
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014598540145985401
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014492753623188406
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014388489208633094
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014285714285714285
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014184397163120567
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014084507042253521
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013986013986013986
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013888888888888888
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013793103448275862
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0136986301369863
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013605442176870748
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013513513513513514
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013422818791946308
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013333333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00214 |
| Iteration     | 4        |
| MaximumReturn | -0.00167 |
| MinimumReturn | -0.00276 |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03932736814022064
Validation loss = 0.02918948233127594
Validation loss = 0.026847774162888527
Validation loss = 0.027035284787416458
Validation loss = 0.028897549957036972
Validation loss = 0.027029240503907204
Validation loss = 0.031236236914992332
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03232182562351227
Validation loss = 0.037580542266368866
Validation loss = 0.03078576549887657
Validation loss = 0.030863991007208824
Validation loss = 0.03182690218091011
Validation loss = 0.02680445834994316
Validation loss = 0.027091439813375473
Validation loss = 0.026050373911857605
Validation loss = 0.030067186802625656
Validation loss = 0.033009279519319534
Validation loss = 0.025648701936006546
Validation loss = 0.031246086582541466
Validation loss = 0.024109166115522385
Validation loss = 0.026608115062117577
Validation loss = 0.0275423564016819
Validation loss = 0.02995617687702179
Validation loss = 0.0328025184571743
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03519760072231293
Validation loss = 0.02675386145710945
Validation loss = 0.025245290249586105
Validation loss = 0.03192299231886864
Validation loss = 0.025730734691023827
Validation loss = 0.026470337063074112
Validation loss = 0.025023821741342545
Validation loss = 0.026761719956994057
Validation loss = 0.027243847027420998
Validation loss = 0.027397025376558304
Validation loss = 0.027156570926308632
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03977050632238388
Validation loss = 0.028096849098801613
Validation loss = 0.026755396276712418
Validation loss = 0.030158057808876038
Validation loss = 0.02933761104941368
Validation loss = 0.02658269740641117
Validation loss = 0.0264960378408432
Validation loss = 0.02375827543437481
Validation loss = 0.031831469386816025
Validation loss = 0.0254379753023386
Validation loss = 0.03246719390153885
Validation loss = 0.02672852948307991
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.035022445023059845
Validation loss = 0.030321890488266945
Validation loss = 0.02873961627483368
Validation loss = 0.02731279470026493
Validation loss = 0.029763704165816307
Validation loss = 0.03335190191864967
Validation loss = 0.02683679759502411
Validation loss = 0.02917669154703617
Validation loss = 0.03137911856174469
Validation loss = 0.03193366155028343
Validation loss = 0.03179721534252167
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013245033112582781
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013157894736842105
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013071895424836602
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012987012987012988
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012903225806451613
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01282051282051282
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012738853503184714
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012658227848101266
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012578616352201259
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0125
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012422360248447204
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012345679012345678
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012269938650306749
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012195121951219513
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012121212121212121
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012048192771084338
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011976047904191617
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011904761904761904
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011834319526627219
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011764705882352941
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011695906432748537
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011627906976744186
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011560693641618497
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011494252873563218
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011428571428571429
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000972 |
| Iteration     | 5         |
| MaximumReturn | -0.000663 |
| MinimumReturn | -0.00121  |
| TotalSamples  | 11662     |
-----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03419112414121628
Validation loss = 0.03536735847592354
Validation loss = 0.03303178399801254
Validation loss = 0.04060123860836029
Validation loss = 0.03192692622542381
Validation loss = 0.03230626881122589
Validation loss = 0.032065391540527344
Validation loss = 0.031572043895721436
Validation loss = 0.034260623157024384
Validation loss = 0.03483980894088745
Validation loss = 0.030967414379119873
Validation loss = 0.03043461963534355
Validation loss = 0.029038336127996445
Validation loss = 0.03140265867114067
Validation loss = 0.03052637353539467
Validation loss = 0.03266986459493637
Validation loss = 0.030972177162766457
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03464537858963013
Validation loss = 0.03854269161820412
Validation loss = 0.03560645878314972
Validation loss = 0.03010091744363308
Validation loss = 0.02947552129626274
Validation loss = 0.030224010348320007
Validation loss = 0.028294658288359642
Validation loss = 0.03326285630464554
Validation loss = 0.03319898992776871
Validation loss = 0.03242712467908859
Validation loss = 0.031456369906663895
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0326213501393795
Validation loss = 0.033034153282642365
Validation loss = 0.03226718679070473
Validation loss = 0.031136583536863327
Validation loss = 0.03501337021589279
Validation loss = 0.03521064296364784
Validation loss = 0.031239649280905724
Validation loss = 0.03027619980275631
Validation loss = 0.0356917604804039
Validation loss = 0.032191239297389984
Validation loss = 0.03031548298895359
Validation loss = 0.03178013116121292
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03602302446961403
Validation loss = 0.03244338929653168
Validation loss = 0.03265960514545441
Validation loss = 0.02948157861828804
Validation loss = 0.0302678644657135
Validation loss = 0.03307746723294258
Validation loss = 0.03197845071554184
Validation loss = 0.03688807040452957
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.031391531229019165
Validation loss = 0.034851670265197754
Validation loss = 0.030376940965652466
Validation loss = 0.032928332686424255
Validation loss = 0.03874322026968002
Validation loss = 0.03791879862546921
Validation loss = 0.029322445392608643
Validation loss = 0.034593041986227036
Validation loss = 0.04150162637233734
Validation loss = 0.032530300319194794
Validation loss = 0.033221788704395294
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011363636363636364
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011299435028248588
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011235955056179775
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0111731843575419
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011111111111111112
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011049723756906077
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01098901098901099
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01092896174863388
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010869565217391304
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010810810810810811
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010752688172043012
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0106951871657754
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010638297872340425
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010582010582010581
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010526315789473684
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010471204188481676
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010416666666666666
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010362694300518135
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010309278350515464
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010256410256410256
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01020408163265306
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01015228426395939
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010101010101010102
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010050251256281407
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.15    |
| Iteration     | 6        |
| MaximumReturn | -0.0321  |
| MinimumReturn | -25.8    |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.033838655799627304
Validation loss = 0.033696942031383514
Validation loss = 0.03184449300169945
Validation loss = 0.026888830587267876
Validation loss = 0.025459840893745422
Validation loss = 0.02325339801609516
Validation loss = 0.02668713964521885
Validation loss = 0.024719126522541046
Validation loss = 0.025261497125029564
Validation loss = 0.02874963916838169
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04117438569664955
Validation loss = 0.034134168177843094
Validation loss = 0.030973143875598907
Validation loss = 0.026617923751473427
Validation loss = 0.025455405935645103
Validation loss = 0.02749800868332386
Validation loss = 0.027182109653949738
Validation loss = 0.025765137746930122
Validation loss = 0.02698073349893093
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.034626711159944534
Validation loss = 0.042178284376859665
Validation loss = 0.032392967492341995
Validation loss = 0.030067861080169678
Validation loss = 0.029958589002490044
Validation loss = 0.027361510321497917
Validation loss = 0.025327375158667564
Validation loss = 0.028170816600322723
Validation loss = 0.02610243856906891
Validation loss = 0.024252913892269135
Validation loss = 0.022940238937735558
Validation loss = 0.023089108988642693
Validation loss = 0.024167075753211975
Validation loss = 0.024093158543109894
Validation loss = 0.02218524180352688
Validation loss = 0.021866874769330025
Validation loss = 0.024175627157092094
Validation loss = 0.027083443477749825
Validation loss = 0.02371191419661045
Validation loss = 0.026461714878678322
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03529173880815506
Validation loss = 0.030665012076497078
Validation loss = 0.028342416509985924
Validation loss = 0.023579934611916542
Validation loss = 0.029219619929790497
Validation loss = 0.023176826536655426
Validation loss = 0.02343169040977955
Validation loss = 0.028380168601870537
Validation loss = 0.024986632168293
Validation loss = 0.02321588806807995
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0364486500620842
Validation loss = 0.028855854645371437
Validation loss = 0.028183870017528534
Validation loss = 0.025544585660099983
Validation loss = 0.026187308132648468
Validation loss = 0.029363229870796204
Validation loss = 0.025605997070670128
Validation loss = 0.025361938402056694
Validation loss = 0.024761920794844627
Validation loss = 0.023856880143284798
Validation loss = 0.025305794551968575
Validation loss = 0.02815830148756504
Validation loss = 0.024391019716858864
Validation loss = 0.027071326971054077
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009950248756218905
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009900990099009901
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009852216748768473
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00980392156862745
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00975609756097561
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009708737864077669
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00966183574879227
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009615384615384616
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009569377990430622
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009523809523809525
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009478672985781991
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009433962264150943
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009389671361502348
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009345794392523364
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009302325581395349
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009259259259259259
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009216589861751152
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009174311926605505
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0091324200913242
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00909090909090909
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00904977375565611
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009009009009009009
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008968609865470852
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008928571428571428
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008888888888888889
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0343  |
| Iteration     | 7        |
| MaximumReturn | -0.017   |
| MinimumReturn | -0.0477  |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02960340864956379
Validation loss = 0.026311060413718224
Validation loss = 0.025368506088852882
Validation loss = 0.02622113935649395
Validation loss = 0.02331225760281086
Validation loss = 0.0233488529920578
Validation loss = 0.02509540133178234
Validation loss = 0.023312807083129883
Validation loss = 0.027374863624572754
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.027986986562609673
Validation loss = 0.027094395831227303
Validation loss = 0.02566225826740265
Validation loss = 0.02138659358024597
Validation loss = 0.026439214125275612
Validation loss = 0.023703724145889282
Validation loss = 0.022560594603419304
Validation loss = 0.02331455983221531
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025048553943634033
Validation loss = 0.022217977792024612
Validation loss = 0.023394444957375526
Validation loss = 0.023064177483320236
Validation loss = 0.022968387231230736
Validation loss = 0.02113986946642399
Validation loss = 0.021553652361035347
Validation loss = 0.024224597960710526
Validation loss = 0.021940888836979866
Validation loss = 0.020437706261873245
Validation loss = 0.022576669231057167
Validation loss = 0.02014235593378544
Validation loss = 0.01998998038470745
Validation loss = 0.02082909271121025
Validation loss = 0.02177523635327816
Validation loss = 0.02241290919482708
Validation loss = 0.020438317209482193
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02781236171722412
Validation loss = 0.0226923655718565
Validation loss = 0.026227708905935287
Validation loss = 0.025374600663781166
Validation loss = 0.028349613770842552
Validation loss = 0.025631768628954887
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02581583335995674
Validation loss = 0.025446338579058647
Validation loss = 0.023805217817425728
Validation loss = 0.024370605126023293
Validation loss = 0.023790327832102776
Validation loss = 0.02016679011285305
Validation loss = 0.01994255557656288
Validation loss = 0.022849401459097862
Validation loss = 0.024275807663798332
Validation loss = 0.02363470010459423
Validation loss = 0.025991203263401985
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008849557522123894
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00881057268722467
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008771929824561403
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008733624454148471
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008695652173913044
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008658008658008658
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008620689655172414
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008583690987124463
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008547008547008548
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00851063829787234
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00847457627118644
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008438818565400843
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008403361344537815
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008368200836820083
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008333333333333333
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008298755186721992
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008264462809917356
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00823045267489712
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00819672131147541
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00816326530612245
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008130081300813009
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008097165991902834
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008064516129032258
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008032128514056224
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0292  |
| Iteration     | 8        |
| MaximumReturn | -0.0165  |
| MinimumReturn | -0.0443  |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023776091635227203
Validation loss = 0.023650705814361572
Validation loss = 0.021645821630954742
Validation loss = 0.025731323286890984
Validation loss = 0.021716730669140816
Validation loss = 0.02160480059683323
Validation loss = 0.02341429889202118
Validation loss = 0.020713139325380325
Validation loss = 0.02286762185394764
Validation loss = 0.02649594284594059
Validation loss = 0.02160687744617462
Validation loss = 0.020932480692863464
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022881705313920975
Validation loss = 0.023951295763254166
Validation loss = 0.023453690111637115
Validation loss = 0.02144443616271019
Validation loss = 0.021571658551692963
Validation loss = 0.023171432316303253
Validation loss = 0.021236781030893326
Validation loss = 0.023091144859790802
Validation loss = 0.027058765292167664
Validation loss = 0.01994236186146736
Validation loss = 0.022729385644197464
Validation loss = 0.023328883573412895
Validation loss = 0.019487902522087097
Validation loss = 0.02034994587302208
Validation loss = 0.02511570043861866
Validation loss = 0.02334727719426155
Validation loss = 0.021721746772527695
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02356748655438423
Validation loss = 0.01979655772447586
Validation loss = 0.020675495266914368
Validation loss = 0.018599217757582664
Validation loss = 0.021463772282004356
Validation loss = 0.023514755070209503
Validation loss = 0.019692786037921906
Validation loss = 0.019963331520557404
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022277865558862686
Validation loss = 0.023324526846408844
Validation loss = 0.023862630128860474
Validation loss = 0.026052596047520638
Validation loss = 0.024101993069052696
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02559703402221203
Validation loss = 0.02019866742193699
Validation loss = 0.0228921789675951
Validation loss = 0.0209036972373724
Validation loss = 0.019837457686662674
Validation loss = 0.021223880350589752
Validation loss = 0.02411629818379879
Validation loss = 0.022731240838766098
Validation loss = 0.02073773182928562
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00796812749003984
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007936507936507936
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007905138339920948
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007874015748031496
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00784313725490196
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0078125
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007782101167315175
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007751937984496124
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007722007722007722
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007692307692307693
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007662835249042145
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007633587786259542
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0076045627376425855
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007575757575757576
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007547169811320755
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007518796992481203
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00749063670411985
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007462686567164179
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007434944237918215
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007407407407407408
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007380073800738007
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007352941176470588
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007326007326007326
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0072992700729927005
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007272727272727273
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.026   |
| Iteration     | 9        |
| MaximumReturn | -0.0169  |
| MinimumReturn | -0.0401  |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020974496379494667
Validation loss = 0.02120015025138855
Validation loss = 0.020979713648557663
Validation loss = 0.020280204713344574
Validation loss = 0.01905553974211216
Validation loss = 0.01931801438331604
Validation loss = 0.02001272700726986
Validation loss = 0.018666621297597885
Validation loss = 0.018321705982089043
Validation loss = 0.018739867955446243
Validation loss = 0.018763644620776176
Validation loss = 0.01856783963739872
Validation loss = 0.020666047930717468
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020689157769083977
Validation loss = 0.01825762540102005
Validation loss = 0.019785357639193535
Validation loss = 0.019778721034526825
Validation loss = 0.017962176352739334
Validation loss = 0.01950087398290634
Validation loss = 0.020089594647288322
Validation loss = 0.01866135373711586
Validation loss = 0.02090616337954998
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02216877043247223
Validation loss = 0.020096590742468834
Validation loss = 0.019321925938129425
Validation loss = 0.01891058310866356
Validation loss = 0.017222821712493896
Validation loss = 0.018716510385274887
Validation loss = 0.02017119713127613
Validation loss = 0.021706579253077507
Validation loss = 0.017956262454390526
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025788497179746628
Validation loss = 0.02156219631433487
Validation loss = 0.020894167944788933
Validation loss = 0.019945265725255013
Validation loss = 0.021833309903740883
Validation loss = 0.02008228749036789
Validation loss = 0.019161788746714592
Validation loss = 0.022713616490364075
Validation loss = 0.021054260432720184
Validation loss = 0.02042585425078869
Validation loss = 0.02079596370458603
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019037313759326935
Validation loss = 0.02007223479449749
Validation loss = 0.019557368010282516
Validation loss = 0.019207432866096497
Validation loss = 0.020544759929180145
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007246376811594203
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.007220216606498195
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.014388489208633094
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014336917562724014
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014285714285714285
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014234875444839857
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014184397163120567
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014134275618374558
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014084507042253521
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014035087719298246
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013986013986013986
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013937282229965157
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013888888888888888
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01384083044982699
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013793103448275862
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013745704467353952
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0136986301369863
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013651877133105802
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013605442176870748
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013559322033898305
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013513513513513514
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013468013468013467
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013422818791946308
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013377926421404682
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013333333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.287   |
| Iteration     | 10       |
| MaximumReturn | -0.0225  |
| MinimumReturn | -5.8     |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020774241536855698
Validation loss = 0.016371190547943115
Validation loss = 0.01683145761489868
Validation loss = 0.020760221406817436
Validation loss = 0.017105286940932274
Validation loss = 0.01869974657893181
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017153728753328323
Validation loss = 0.01808059588074684
Validation loss = 0.016797689720988274
Validation loss = 0.022358817979693413
Validation loss = 0.016898661851882935
Validation loss = 0.01697881706058979
Validation loss = 0.01639378070831299
Validation loss = 0.01900450885295868
Validation loss = 0.01566295139491558
Validation loss = 0.016081642359495163
Validation loss = 0.017983075231313705
Validation loss = 0.017689872533082962
Validation loss = 0.018678542226552963
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017909465357661247
Validation loss = 0.01719481684267521
Validation loss = 0.0156265702098608
Validation loss = 0.015877529978752136
Validation loss = 0.016017714515328407
Validation loss = 0.015938876196742058
Validation loss = 0.019285811111330986
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02002265863120556
Validation loss = 0.017574403434991837
Validation loss = 0.018655717372894287
Validation loss = 0.01915999874472618
Validation loss = 0.015739629045128822
Validation loss = 0.016908718273043633
Validation loss = 0.016367096453905106
Validation loss = 0.016833588480949402
Validation loss = 0.019124077633023262
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01803610846400261
Validation loss = 0.01767611876130104
Validation loss = 0.018279241397976875
Validation loss = 0.016669197008013725
Validation loss = 0.017800811678171158
Validation loss = 0.017713479697704315
Validation loss = 0.016300130635499954
Validation loss = 0.02016467973589897
Validation loss = 0.0165710486471653
Validation loss = 0.018437769263982773
Validation loss = 0.01861627772450447
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013289036544850499
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013245033112582781
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013201320132013201
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013157894736842105
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013114754098360656
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013071895424836602
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013029315960912053
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012987012987012988
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012944983818770227
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012903225806451613
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012861736334405145
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01282051282051282
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012779552715654952
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012738853503184714
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012698412698412698
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012658227848101266
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012618296529968454
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012578616352201259
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012539184952978056
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0125
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012461059190031152
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012422360248447204
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01238390092879257
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012345679012345678
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012307692307692308
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.949   |
| Iteration     | 11       |
| MaximumReturn | -0.0246  |
| MinimumReturn | -22.4    |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017354287207126617
Validation loss = 0.016925008967518806
Validation loss = 0.01601223833858967
Validation loss = 0.016499893739819527
Validation loss = 0.01653936877846718
Validation loss = 0.016161542385816574
Validation loss = 0.021106421947479248
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018270719796419144
Validation loss = 0.020724380388855934
Validation loss = 0.015798410400748253
Validation loss = 0.015432186424732208
Validation loss = 0.016269734129309654
Validation loss = 0.017842957749962807
Validation loss = 0.01532699167728424
Validation loss = 0.018113428726792336
Validation loss = 0.01606711745262146
Validation loss = 0.016718987375497818
Validation loss = 0.020788338035345078
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018554657697677612
Validation loss = 0.017009684816002846
Validation loss = 0.016585567966103554
Validation loss = 0.016023704782128334
Validation loss = 0.01660187914967537
Validation loss = 0.015524623915553093
Validation loss = 0.01545907836407423
Validation loss = 0.015975290909409523
Validation loss = 0.015040514059364796
Validation loss = 0.01737535372376442
Validation loss = 0.015239767730236053
Validation loss = 0.01527054887264967
Validation loss = 0.014929312281310558
Validation loss = 0.01742500439286232
Validation loss = 0.01565348543226719
Validation loss = 0.015972744673490524
Validation loss = 0.01657944731414318
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018724381923675537
Validation loss = 0.017999449744820595
Validation loss = 0.022876035422086716
Validation loss = 0.01719711534678936
Validation loss = 0.021710248664021492
Validation loss = 0.019181521609425545
Validation loss = 0.016562681645154953
Validation loss = 0.01703045889735222
Validation loss = 0.017126882448792458
Validation loss = 0.01573905721306801
Validation loss = 0.015870682895183563
Validation loss = 0.0151287242770195
Validation loss = 0.01716352254152298
Validation loss = 0.016079198569059372
Validation loss = 0.017855320125818253
Validation loss = 0.016254225745797157
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018435826525092125
Validation loss = 0.018226001411676407
Validation loss = 0.018061233684420586
Validation loss = 0.016767511144280434
Validation loss = 0.016499167308211327
Validation loss = 0.017770426347851753
Validation loss = 0.01734909787774086
Validation loss = 0.015353505499660969
Validation loss = 0.018604449927806854
Validation loss = 0.0180005244910717
Validation loss = 0.016536373645067215
Validation loss = 0.01666482724249363
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012269938650306749
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012232415902140673
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012195121951219513
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0121580547112462
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012121212121212121
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012084592145015106
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012048192771084338
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012012012012012012
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011976047904191617
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011940298507462687
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011904761904761904
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011869436201780416
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011834319526627219
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011799410029498525
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011764705882352941
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011730205278592375
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011695906432748537
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011661807580174927
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011627906976744186
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011594202898550725
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011560693641618497
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011527377521613832
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011494252873563218
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011461318051575931
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011428571428571429
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10      |
| Iteration     | 12       |
| MaximumReturn | -0.0893  |
| MinimumReturn | -45.4    |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01860039122402668
Validation loss = 0.017000984400510788
Validation loss = 0.01576152816414833
Validation loss = 0.015618189238011837
Validation loss = 0.015713077038526535
Validation loss = 0.015261310152709484
Validation loss = 0.01588420383632183
Validation loss = 0.016445286571979523
Validation loss = 0.017243564128875732
Validation loss = 0.015222045592963696
Validation loss = 0.01572323590517044
Validation loss = 0.017156235873699188
Validation loss = 0.01619770936667919
Validation loss = 0.01380530558526516
Validation loss = 0.014462052844464779
Validation loss = 0.015835721045732498
Validation loss = 0.016118351370096207
Validation loss = 0.017449690029025078
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018897855654358864
Validation loss = 0.016340775415301323
Validation loss = 0.01557906810194254
Validation loss = 0.015682948753237724
Validation loss = 0.016468264162540436
Validation loss = 0.016790203750133514
Validation loss = 0.01608470268547535
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01684228889644146
Validation loss = 0.01407981850206852
Validation loss = 0.0140339070931077
Validation loss = 0.015685807913541794
Validation loss = 0.015286976471543312
Validation loss = 0.015903685241937637
Validation loss = 0.01581653021275997
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016045687720179558
Validation loss = 0.016550332307815552
Validation loss = 0.01752801425755024
Validation loss = 0.015148857608437538
Validation loss = 0.01658535934984684
Validation loss = 0.017470236867666245
Validation loss = 0.016414087265729904
Validation loss = 0.014294764026999474
Validation loss = 0.015224250964820385
Validation loss = 0.01722610555589199
Validation loss = 0.016473742201924324
Validation loss = 0.015624291263520718
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01902172714471817
Validation loss = 0.01586584933102131
Validation loss = 0.016503913328051567
Validation loss = 0.015146389603614807
Validation loss = 0.0167541466653347
Validation loss = 0.01863553188741207
Validation loss = 0.013978416100144386
Validation loss = 0.015608212910592556
Validation loss = 0.014563092030584812
Validation loss = 0.015432753600180149
Validation loss = 0.01543083693832159
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011396011396011397
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011363636363636364
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0113314447592068
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011299435028248588
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011267605633802818
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011235955056179775
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011204481792717087
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0111731843575419
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011142061281337047
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011111111111111112
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0110803324099723
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011049723756906077
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011019283746556474
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01098901098901099
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010958904109589041
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01092896174863388
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010899182561307902
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010869565217391304
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01084010840108401
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010810810810810811
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01078167115902965
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010752688172043012
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010723860589812333
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0106951871657754
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010666666666666666
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0596  |
| Iteration     | 13       |
| MaximumReturn | -0.0271  |
| MinimumReturn | -0.322   |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016475416719913483
Validation loss = 0.01671474613249302
Validation loss = 0.013960045762360096
Validation loss = 0.015128730796277523
Validation loss = 0.016014339402318
Validation loss = 0.01737201400101185
Validation loss = 0.016347570344805717
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01831793785095215
Validation loss = 0.015952218323946
Validation loss = 0.015361972153186798
Validation loss = 0.016566045582294464
Validation loss = 0.016650129109621048
Validation loss = 0.015482188202440739
Validation loss = 0.014797859825193882
Validation loss = 0.014754940755665302
Validation loss = 0.017259791493415833
Validation loss = 0.0144423209130764
Validation loss = 0.017933005467057228
Validation loss = 0.016409417614340782
Validation loss = 0.01671663485467434
Validation loss = 0.016723085194826126
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01704096607863903
Validation loss = 0.015116934664547443
Validation loss = 0.015817154198884964
Validation loss = 0.014665160328149796
Validation loss = 0.017153838649392128
Validation loss = 0.015116973780095577
Validation loss = 0.01557882409542799
Validation loss = 0.014288827776908875
Validation loss = 0.017397133633494377
Validation loss = 0.015649879351258278
Validation loss = 0.015339124947786331
Validation loss = 0.015090559609234333
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015818510204553604
Validation loss = 0.02024983800947666
Validation loss = 0.01740703545510769
Validation loss = 0.013432654552161694
Validation loss = 0.015078439377248287
Validation loss = 0.014670468866825104
Validation loss = 0.014850925654172897
Validation loss = 0.020268863067030907
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01623176597058773
Validation loss = 0.01551635842770338
Validation loss = 0.01627378724515438
Validation loss = 0.01444921549409628
Validation loss = 0.01698284037411213
Validation loss = 0.016147352755069733
Validation loss = 0.016082968562841415
Validation loss = 0.015743592754006386
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010638297872340425
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010610079575596816
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010582010582010581
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010554089709762533
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010526315789473684
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010498687664041995
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010471204188481676
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010443864229765013
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010416666666666666
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01038961038961039
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010362694300518135
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0103359173126615
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010309278350515464
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.012853470437017995
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01282051282051282
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01278772378516624
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012755102040816327
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01272264631043257
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012690355329949238
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012658227848101266
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012626262626262626
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012594458438287154
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01256281407035176
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012531328320802004
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0125
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.27    |
| Iteration     | 14       |
| MaximumReturn | -0.044   |
| MinimumReturn | -14.8    |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020058613270521164
Validation loss = 0.016399869695305824
Validation loss = 0.015757879242300987
Validation loss = 0.013267640955746174
Validation loss = 0.016740085557103157
Validation loss = 0.014636244624853134
Validation loss = 0.0174007136374712
Validation loss = 0.017193323001265526
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01692100055515766
Validation loss = 0.014437999576330185
Validation loss = 0.016582179814577103
Validation loss = 0.015268752351403236
Validation loss = 0.013690455816686153
Validation loss = 0.014297335408627987
Validation loss = 0.015463492833077908
Validation loss = 0.014231680892407894
Validation loss = 0.015080994926393032
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0162914227694273
Validation loss = 0.013856271281838417
Validation loss = 0.013935910537838936
Validation loss = 0.015042254701256752
Validation loss = 0.01507812924683094
Validation loss = 0.016153188422322273
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015935096889734268
Validation loss = 0.014386875554919243
Validation loss = 0.015126756392419338
Validation loss = 0.013786990195512772
Validation loss = 0.01702210120856762
Validation loss = 0.013976221904158592
Validation loss = 0.014409173280000687
Validation loss = 0.016384173184633255
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013770529069006443
Validation loss = 0.015679825097322464
Validation loss = 0.015193881466984749
Validation loss = 0.01751871034502983
Validation loss = 0.014829973690211773
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012468827930174564
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012437810945273632
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01240694789081886
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012376237623762377
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012345679012345678
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012315270935960592
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012285012285012284
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012254901960784314
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012224938875305624
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012195121951219513
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012165450121654502
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012135922330097087
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012106537530266344
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012077294685990338
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012048192771084338
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01201923076923077
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011990407673860911
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011961722488038277
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011933174224343675
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011904761904761904
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011876484560570071
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011848341232227487
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01182033096926714
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01179245283018868
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011764705882352941
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00244 |
| Iteration     | 15       |
| MaximumReturn | -0.00185 |
| MinimumReturn | -0.00308 |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017806401476264
Validation loss = 0.017424849793314934
Validation loss = 0.015862414613366127
Validation loss = 0.015189905650913715
Validation loss = 0.014149806462228298
Validation loss = 0.014295965433120728
Validation loss = 0.016305748373270035
Validation loss = 0.013916565105319023
Validation loss = 0.013642571866512299
Validation loss = 0.014213817194104195
Validation loss = 0.014952125959098339
Validation loss = 0.01569337211549282
Validation loss = 0.01873299852013588
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015386137180030346
Validation loss = 0.016252826899290085
Validation loss = 0.01601473055779934
Validation loss = 0.01596391201019287
Validation loss = 0.015351906418800354
Validation loss = 0.014642642810940742
Validation loss = 0.01650821976363659
Validation loss = 0.01431871484965086
Validation loss = 0.015750283375382423
Validation loss = 0.013931800611317158
Validation loss = 0.016162464395165443
Validation loss = 0.014208373613655567
Validation loss = 0.015436294488608837
Validation loss = 0.017800746485590935
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02085007354617119
Validation loss = 0.015143290162086487
Validation loss = 0.014608004130423069
Validation loss = 0.0195591039955616
Validation loss = 0.01656617969274521
Validation loss = 0.014406384900212288
Validation loss = 0.014759423211216927
Validation loss = 0.017130451276898384
Validation loss = 0.015137078240513802
Validation loss = 0.016699926927685738
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016288748010993004
Validation loss = 0.01789623312652111
Validation loss = 0.015184223651885986
Validation loss = 0.016119543462991714
Validation loss = 0.013918706215918064
Validation loss = 0.01462299469858408
Validation loss = 0.015447086654603481
Validation loss = 0.014359315857291222
Validation loss = 0.015430894680321217
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015812423080205917
Validation loss = 0.016451817005872726
Validation loss = 0.014950771816074848
Validation loss = 0.016112932935357094
Validation loss = 0.01571042649447918
Validation loss = 0.014506409876048565
Validation loss = 0.013998404145240784
Validation loss = 0.01502937264740467
Validation loss = 0.0162420105189085
Validation loss = 0.014645854011178017
Validation loss = 0.0149428341537714
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011737089201877934
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0117096018735363
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011682242990654205
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011655011655011656
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011627906976744186
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01160092807424594
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011574074074074073
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011547344110854504
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01152073732718894
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011494252873563218
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011467889908256881
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011441647597254004
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01141552511415525
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011389521640091117
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011363636363636364
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011337868480725623
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011312217194570135
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011286681715575621
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01126126126126126
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011235955056179775
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011210762331838564
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011185682326621925
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011160714285714286
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011135857461024499
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011111111111111112
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0162  |
| Iteration     | 16       |
| MaximumReturn | -0.00949 |
| MinimumReturn | -0.0249  |
| TotalSamples  | 29988    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015023167245090008
Validation loss = 0.014742816798388958
Validation loss = 0.013928516767919064
Validation loss = 0.015452531166374683
Validation loss = 0.014702926389873028
Validation loss = 0.014515736140310764
Validation loss = 0.015621599741280079
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013964495621621609
Validation loss = 0.015524319373071194
Validation loss = 0.013583741150796413
Validation loss = 0.014826339669525623
Validation loss = 0.01483443845063448
Validation loss = 0.015005229972302914
Validation loss = 0.014294477179646492
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015076362527906895
Validation loss = 0.014490720815956593
Validation loss = 0.014100884087383747
Validation loss = 0.013657884672284126
Validation loss = 0.01608717069029808
Validation loss = 0.01396775059401989
Validation loss = 0.01479525025933981
Validation loss = 0.013894434086978436
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015525441616773605
Validation loss = 0.020076537504792213
Validation loss = 0.01712670549750328
Validation loss = 0.0137745700776577
Validation loss = 0.013567293994128704
Validation loss = 0.016053996980190277
Validation loss = 0.015388410538434982
Validation loss = 0.017150966450572014
Validation loss = 0.014681875705718994
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014967800118029118
Validation loss = 0.014421435073018074
Validation loss = 0.01488177478313446
Validation loss = 0.014770319685339928
Validation loss = 0.014174152165651321
Validation loss = 0.014421355910599232
Validation loss = 0.013867565430700779
Validation loss = 0.01552819088101387
Validation loss = 0.013573913834989071
Validation loss = 0.013688779436051846
Validation loss = 0.01505843922495842
Validation loss = 0.015036375261843204
Validation loss = 0.014875492081046104
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.013303769401330377
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.015486725663716814
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.017660044150110375
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.019823788546255508
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.02197802197802198
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.02412280701754386
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.030634573304157548
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.03275109170305677
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.034858387799564274
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.041304347826086954
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.04338394793926247
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.045454545454545456
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.047516198704103674
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.04956896551724138
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.05161290322580645
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.0536480686695279
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.055674518201284794
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.057692307692307696
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.05970149253731343
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.06170212765957447
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.06369426751592357
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.06567796610169492
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.06765327695560254
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.06962025316455696
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.07368421052631578
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -38.9    |
| Iteration     | 17       |
| MaximumReturn | -7.52    |
| MinimumReturn | -61.7    |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017008252441883087
Validation loss = 0.01390829961746931
Validation loss = 0.013750056736171246
Validation loss = 0.013220380060374737
Validation loss = 0.012907447293400764
Validation loss = 0.014314672909677029
Validation loss = 0.01364240050315857
Validation loss = 0.014547864906489849
Validation loss = 0.012864037416875362
Validation loss = 0.013301813043653965
Validation loss = 0.014015094377100468
Validation loss = 0.014099177904427052
Validation loss = 0.01424794178456068
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01438998244702816
Validation loss = 0.015064284205436707
Validation loss = 0.014299539849162102
Validation loss = 0.013451633974909782
Validation loss = 0.012868679128587246
Validation loss = 0.012540267780423164
Validation loss = 0.012606450356543064
Validation loss = 0.012961729429662228
Validation loss = 0.01219620369374752
Validation loss = 0.012651849538087845
Validation loss = 0.013912121765315533
Validation loss = 0.014166912995278835
Validation loss = 0.012348026037216187
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01856144145131111
Validation loss = 0.01271203812211752
Validation loss = 0.013100211508572102
Validation loss = 0.012720182538032532
Validation loss = 0.012898537330329418
Validation loss = 0.011698161251842976
Validation loss = 0.0157255157828331
Validation loss = 0.012147357687354088
Validation loss = 0.01348084770143032
Validation loss = 0.013941912911832333
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014292423613369465
Validation loss = 0.013194054365158081
Validation loss = 0.013133619911968708
Validation loss = 0.012107745744287968
Validation loss = 0.012353774160146713
Validation loss = 0.013710425235331059
Validation loss = 0.013439335860311985
Validation loss = 0.014418701641261578
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016147367656230927
Validation loss = 0.0147083830088377
Validation loss = 0.012739155441522598
Validation loss = 0.012751639820635319
Validation loss = 0.013189100660383701
Validation loss = 0.012499851174652576
Validation loss = 0.012869684025645256
Validation loss = 0.012278447858989239
Validation loss = 0.012747143395245075
Validation loss = 0.013883951120078564
Validation loss = 0.013954772613942623
Validation loss = 0.012977059930562973
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07352941176470588
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07337526205450734
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07322175732217573
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07306889352818371
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07291666666666667
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07276507276507277
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07261410788381743
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07246376811594203
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07231404958677685
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07216494845360824
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0720164609053498
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07186858316221766
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07172131147540983
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07157464212678936
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07142857142857142
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07128309572301425
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07113821138211382
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07099391480730223
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0708502024291498
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0707070707070707
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07056451612903226
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07042253521126761
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07028112449799197
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07014028056112225
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.043   |
| Iteration     | 18       |
| MaximumReturn | -0.0234  |
| MinimumReturn | -0.0672  |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.019202586263418198
Validation loss = 0.019299637526273727
Validation loss = 0.019744135439395905
Validation loss = 0.021263916045427322
Validation loss = 0.017476743087172508
Validation loss = 0.018717825412750244
Validation loss = 0.018579110503196716
Validation loss = 0.01754063554108143
Validation loss = 0.020230606198310852
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017696216702461243
Validation loss = 0.01930161565542221
Validation loss = 0.01705329306423664
Validation loss = 0.017899297177791595
Validation loss = 0.017480533570051193
Validation loss = 0.022191520780324936
Validation loss = 0.017886081710457802
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017104968428611755
Validation loss = 0.017501991242170334
Validation loss = 0.016880089417099953
Validation loss = 0.019520046189427376
Validation loss = 0.016924280673265457
Validation loss = 0.01734491065144539
Validation loss = 0.0196346677839756
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018480606377124786
Validation loss = 0.017531562596559525
Validation loss = 0.017196951434016228
Validation loss = 0.01888274773955345
Validation loss = 0.018864229321479797
Validation loss = 0.018251048400998116
Validation loss = 0.01843263767659664
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01850096508860588
Validation loss = 0.016781264916062355
Validation loss = 0.01823093555867672
Validation loss = 0.020511571317911148
Validation loss = 0.021986015141010284
Validation loss = 0.017267972230911255
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06986027944111776
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0697211155378486
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06958250497017893
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06944444444444445
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06930693069306931
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0691699604743083
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.07100591715976332
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.07677165354330709
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07662082514734773
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.0784313725490196
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08023483365949119
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08203125
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08187134502923976
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.08560311284046693
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0854368932038835
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08527131782945736
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0851063829787234
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08687258687258688
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.0905587668593449
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09230769230769231
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09404990403071017
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09578544061302682
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09560229445506692
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09541984732824428
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09523809523809523
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.76    |
| Iteration     | 19       |
| MaximumReturn | -0.0217  |
| MinimumReturn | -12.1    |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020797083154320717
Validation loss = 0.019508671015501022
Validation loss = 0.025183772668242455
Validation loss = 0.01967127062380314
Validation loss = 0.02207978069782257
Validation loss = 0.01938534341752529
Validation loss = 0.0206734761595726
Validation loss = 0.020622437819838524
Validation loss = 0.019487395882606506
Validation loss = 0.019018519669771194
Validation loss = 0.02092788927257061
Validation loss = 0.02125278115272522
Validation loss = 0.021559827029705048
Validation loss = 0.019473494961857796
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018735075369477272
Validation loss = 0.01947406865656376
Validation loss = 0.018255827948451042
Validation loss = 0.017607158049941063
Validation loss = 0.01898154616355896
Validation loss = 0.017980314791202545
Validation loss = 0.01701250858604908
Validation loss = 0.01865999400615692
Validation loss = 0.018453719094395638
Validation loss = 0.019382823258638382
Validation loss = 0.019713953137397766
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017226072028279305
Validation loss = 0.017880208790302277
Validation loss = 0.017169926315546036
Validation loss = 0.017706824466586113
Validation loss = 0.017377126961946487
Validation loss = 0.019777141511440277
Validation loss = 0.017699990421533585
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020081892609596252
Validation loss = 0.01733800582587719
Validation loss = 0.019133977591991425
Validation loss = 0.019877701997756958
Validation loss = 0.019244002178311348
Validation loss = 0.01795853301882744
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01855742558836937
Validation loss = 0.016840925440192223
Validation loss = 0.017470989376306534
Validation loss = 0.019315816462039948
Validation loss = 0.016290875151753426
Validation loss = 0.019367147237062454
Validation loss = 0.019063998013734818
Validation loss = 0.01741461269557476
Validation loss = 0.018958646804094315
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09505703422053231
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09487666034155598
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0946969696969697
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0945179584120983
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09433962264150944
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09416195856873823
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09398496240601503
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09380863039399624
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09363295880149813
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09345794392523364
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09328358208955224
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0931098696461825
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09293680297397769
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09276437847866419
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09259259259259259
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09242144177449169
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09225092250922509
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09208103130755065
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09191176470588236
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09174311926605505
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09157509157509157
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09140767824497258
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09124087591240876
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09107468123861566
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09090909090909091
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.062   |
| Iteration     | 20       |
| MaximumReturn | -0.0327  |
| MinimumReturn | -0.108   |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02165529690682888
Validation loss = 0.02221926487982273
Validation loss = 0.0207867082208395
Validation loss = 0.020813679322600365
Validation loss = 0.025273140519857407
Validation loss = 0.020244305953383446
Validation loss = 0.021091504022479057
Validation loss = 0.021232692524790764
Validation loss = 0.021437428891658783
Validation loss = 0.020359016954898834
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019243035465478897
Validation loss = 0.020292900502681732
Validation loss = 0.020898990333080292
Validation loss = 0.020725294947624207
Validation loss = 0.021042250096797943
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022038940340280533
Validation loss = 0.02277015894651413
Validation loss = 0.020698536187410355
Validation loss = 0.020265348255634308
Validation loss = 0.019489338621497154
Validation loss = 0.020051853731274605
Validation loss = 0.019454538822174072
Validation loss = 0.017981911078095436
Validation loss = 0.020505372434854507
Validation loss = 0.01958543248474598
Validation loss = 0.020156405866146088
Validation loss = 0.020131155848503113
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01951579377055168
Validation loss = 0.020063240081071854
Validation loss = 0.020489225164055824
Validation loss = 0.019410528242588043
Validation loss = 0.020671270787715912
Validation loss = 0.01950116641819477
Validation loss = 0.019384704530239105
Validation loss = 0.01889350265264511
Validation loss = 0.019689317792654037
Validation loss = 0.01765303872525692
Validation loss = 0.021936656907200813
Validation loss = 0.019270984455943108
Validation loss = 0.018176613375544548
Validation loss = 0.0212379302829504
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02038826420903206
Validation loss = 0.02014407329261303
Validation loss = 0.021118707954883575
Validation loss = 0.023196693509817123
Validation loss = 0.02042071893811226
Validation loss = 0.020134031772613525
Validation loss = 0.02026338316500187
Validation loss = 0.020085617899894714
Validation loss = 0.018804343417286873
Validation loss = 0.02017747424542904
Validation loss = 0.01896405965089798
Validation loss = 0.02110850065946579
Validation loss = 0.021685056388378143
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09074410163339383
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09057971014492754
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09041591320072333
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09025270758122744
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09009009009009009
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08992805755395683
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08976660682226212
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08960573476702509
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08944543828264759
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08928571428571429
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08912655971479501
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08896797153024912
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08880994671403197
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08865248226950355
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08849557522123894
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08833922261484099
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08818342151675485
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0880281690140845
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08787346221441125
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08771929824561403
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08756567425569177
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08741258741258741
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08726003490401396
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08710801393728224
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08695652173913043
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00233 |
| Iteration     | 21       |
| MaximumReturn | -0.00179 |
| MinimumReturn | -0.00318 |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021184459328651428
Validation loss = 0.020950185135006905
Validation loss = 0.020354347303509712
Validation loss = 0.02102695405483246
Validation loss = 0.02129012905061245
Validation loss = 0.023142103105783463
Validation loss = 0.020776791498064995
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0217779241502285
Validation loss = 0.02099074050784111
Validation loss = 0.01912493072450161
Validation loss = 0.02018097974359989
Validation loss = 0.020832134410738945
Validation loss = 0.02091510407626629
Validation loss = 0.02168447896838188
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01927088014781475
Validation loss = 0.021037345752120018
Validation loss = 0.01961713843047619
Validation loss = 0.019784249365329742
Validation loss = 0.02215435914695263
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020668234676122665
Validation loss = 0.01946338079869747
Validation loss = 0.020695675164461136
Validation loss = 0.020213603973388672
Validation loss = 0.021434517577290535
Validation loss = 0.021256329491734505
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022298868745565414
Validation loss = 0.02063002996146679
Validation loss = 0.020145198330283165
Validation loss = 0.020302532240748405
Validation loss = 0.019642652943730354
Validation loss = 0.01988319680094719
Validation loss = 0.02014940418303013
Validation loss = 0.02184184454381466
Validation loss = 0.020366095006465912
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08680555555555555
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08665511265164645
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08650519031141868
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08635578583765112
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08620689655172414
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08605851979345955
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0859106529209622
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08576329331046312
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08561643835616438
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08547008547008547
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08532423208191127
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08517887563884156
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08503401360544217
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08488964346349745
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0847457627118644
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08460236886632826
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08445945945945946
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08431703204047218
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08417508417508418
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08403361344537816
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08389261744966443
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08375209380234507
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08361204013377926
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08347245409015025
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08333333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00113  |
| Iteration     | 22        |
| MaximumReturn | -0.000787 |
| MinimumReturn | -0.00164  |
| TotalSamples  | 39984     |
-----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02279275469481945
Validation loss = 0.023215359076857567
Validation loss = 0.021249156445264816
Validation loss = 0.020591866225004196
Validation loss = 0.021821748465299606
Validation loss = 0.02105921134352684
Validation loss = 0.022347334772348404
Validation loss = 0.021900637075304985
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02081167697906494
Validation loss = 0.020922627300024033
Validation loss = 0.021749354898929596
Validation loss = 0.01976003870368004
Validation loss = 0.020839501172304153
Validation loss = 0.01943253166973591
Validation loss = 0.02006903663277626
Validation loss = 0.01929529942572117
Validation loss = 0.01953990012407303
Validation loss = 0.019805485382676125
Validation loss = 0.020554864779114723
Validation loss = 0.021339289844036102
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02025396004319191
Validation loss = 0.021407751366496086
Validation loss = 0.021055473014712334
Validation loss = 0.019959036260843277
Validation loss = 0.019934920594096184
Validation loss = 0.019986722618341446
Validation loss = 0.023089028894901276
Validation loss = 0.020985348150134087
Validation loss = 0.01993442140519619
Validation loss = 0.020387254655361176
Validation loss = 0.020206494256854057
Validation loss = 0.01994841732084751
Validation loss = 0.021205490455031395
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023463038727641106
Validation loss = 0.018850285559892654
Validation loss = 0.020851407200098038
Validation loss = 0.01957903616130352
Validation loss = 0.01895051822066307
Validation loss = 0.019279269501566887
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020065516233444214
Validation loss = 0.019793737679719925
Validation loss = 0.019393686205148697
Validation loss = 0.02259078249335289
Validation loss = 0.021562788635492325
Validation loss = 0.02109169214963913
Validation loss = 0.020192768424749374
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08319467554076539
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08305647840531562
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08291873963515754
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08278145695364239
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08264462809917356
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08250825082508251
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08237232289950576
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08223684210526316
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08210180623973727
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08196721311475409
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08183306055646482
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08169934640522876
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08156606851549755
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08143322475570032
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08130081300813008
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08116883116883117
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08103727714748785
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08090614886731391
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08077544426494346
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08064516129032258
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08051529790660225
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08038585209003216
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08025682182985554
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08012820512820513
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000996 |
| Iteration     | 23        |
| MaximumReturn | -0.000762 |
| MinimumReturn | -0.00157  |
| TotalSamples  | 41650     |
-----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021686609834432602
Validation loss = 0.022224020212888718
Validation loss = 0.02040269784629345
Validation loss = 0.020819375291466713
Validation loss = 0.02257988415658474
Validation loss = 0.02258545160293579
Validation loss = 0.02117534913122654
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021798189729452133
Validation loss = 0.021885624155402184
Validation loss = 0.023505674675107002
Validation loss = 0.020303117111325264
Validation loss = 0.02127535082399845
Validation loss = 0.021385230123996735
Validation loss = 0.02337811328470707
Validation loss = 0.020963894203305244
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0252956785261631
Validation loss = 0.019446197897195816
Validation loss = 0.02116246521472931
Validation loss = 0.021411055698990822
Validation loss = 0.020977001637220383
Validation loss = 0.021951353177428246
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02057642862200737
Validation loss = 0.022582732141017914
Validation loss = 0.02021845057606697
Validation loss = 0.019421089440584183
Validation loss = 0.020208118483424187
Validation loss = 0.020852485671639442
Validation loss = 0.02078690007328987
Validation loss = 0.021050360053777695
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02178141102194786
Validation loss = 0.02112269587814808
Validation loss = 0.02082465961575508
Validation loss = 0.02152334526181221
Validation loss = 0.020980577915906906
Validation loss = 0.020371731370687485
Validation loss = 0.022545887157320976
Validation loss = 0.021343421190977097
Validation loss = 0.019955215975642204
Validation loss = 0.02096541039645672
Validation loss = 0.0224794689565897
Validation loss = 0.021659240126609802
Validation loss = 0.023445991799235344
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07987220447284345
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07974481658692185
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07961783439490445
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0794912559618442
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07936507936507936
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07923930269413629
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07911392405063292
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07898894154818326
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07886435331230283
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07874015748031496
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07861635220125786
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07849293563579278
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07836990595611286
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0782472613458529
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.078125
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.078003120124805
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0778816199376947
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07776049766718507
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07763975155279502
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07751937984496124
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07739938080495357
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07727975270479134
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07716049382716049
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07704160246533127
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07692307692307693
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00363 |
| Iteration     | 24       |
| MaximumReturn | -0.00236 |
| MinimumReturn | -0.00502 |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022263897582888603
Validation loss = 0.02140624262392521
Validation loss = 0.022508740425109863
Validation loss = 0.021397458389401436
Validation loss = 0.01998244784772396
Validation loss = 0.020841412246227264
Validation loss = 0.021416664123535156
Validation loss = 0.020990977063775063
Validation loss = 0.020252572372555733
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022973153740167618
Validation loss = 0.019709985703229904
Validation loss = 0.021908601745963097
Validation loss = 0.02488080970942974
Validation loss = 0.020921900868415833
Validation loss = 0.02087215706706047
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02194470912218094
Validation loss = 0.021956024691462517
Validation loss = 0.021161800250411034
Validation loss = 0.020787397399544716
Validation loss = 0.021451851353049278
Validation loss = 0.023989032953977585
Validation loss = 0.0218990296125412
Validation loss = 0.022242186591029167
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020222866907715797
Validation loss = 0.02065635286271572
Validation loss = 0.020633820444345474
Validation loss = 0.021100418642163277
Validation loss = 0.02050996758043766
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022939415648579597
Validation loss = 0.021208127960562706
Validation loss = 0.021877748891711235
Validation loss = 0.019639350473880768
Validation loss = 0.021751955151557922
Validation loss = 0.022508759051561356
Validation loss = 0.020911430940032005
Validation loss = 0.020490774884819984
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07680491551459294
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07668711656441718
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07656967840735068
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0764525993883792
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07633587786259542
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07621951219512195
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.076103500761035
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07598784194528875
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07587253414264036
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07575757575757576
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07564296520423601
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0755287009063444
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07541478129713423
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07530120481927711
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07518796992481203
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07507507507507508
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07496251874062969
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0748502994011976
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07473841554559044
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07462686567164178
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07451564828614009
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0744047619047619
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07429420505200594
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07418397626112759
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07407407407407407
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00709 |
| Iteration     | 25       |
| MaximumReturn | -0.00476 |
| MinimumReturn | -0.0119  |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021475952118635178
Validation loss = 0.021671786904335022
Validation loss = 0.02046031504869461
Validation loss = 0.022392094135284424
Validation loss = 0.02053135260939598
Validation loss = 0.021339846774935722
Validation loss = 0.02284715697169304
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01987309195101261
Validation loss = 0.02057952806353569
Validation loss = 0.019744006916880608
Validation loss = 0.02137732319533825
Validation loss = 0.020704282447695732
Validation loss = 0.021092601120471954
Validation loss = 0.021637611091136932
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02284793183207512
Validation loss = 0.0200416911393404
Validation loss = 0.01988491788506508
Validation loss = 0.01967039704322815
Validation loss = 0.021568985655903816
Validation loss = 0.021155444905161858
Validation loss = 0.02226998098194599
Validation loss = 0.02329757995903492
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020685851573944092
Validation loss = 0.021792899817228317
Validation loss = 0.020991485565900803
Validation loss = 0.019827691838145256
Validation loss = 0.01907048374414444
Validation loss = 0.020793691277503967
Validation loss = 0.021499739959836006
Validation loss = 0.01972571387887001
Validation loss = 0.01956241764128208
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021090146154165268
Validation loss = 0.019699621945619583
Validation loss = 0.02052440494298935
Validation loss = 0.021533073857426643
Validation loss = 0.022013619542121887
Validation loss = 0.020374244078993797
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07396449704142012
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07385524372230429
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07374631268436578
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07363770250368189
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07352941176470588
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07342143906020558
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07331378299120235
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07320644216691069
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07309941520467836
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.072992700729927
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0728862973760933
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07278020378457059
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07267441860465117
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07256894049346879
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07246376811594203
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0723589001447178
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07225433526011561
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07215007215007214
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07204610951008646
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07194244604316546
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07183908045977011
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07173601147776183
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07163323782234957
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0715307582260372
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07142857142857142
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000952 |
| Iteration     | 26        |
| MaximumReturn | -0.000709 |
| MinimumReturn | -0.00131  |
| TotalSamples  | 46648     |
-----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020582877099514008
Validation loss = 0.02132473513484001
Validation loss = 0.019942542538046837
Validation loss = 0.020855139940977097
Validation loss = 0.019799279049038887
Validation loss = 0.02286343462765217
Validation loss = 0.02117723599076271
Validation loss = 0.019528580829501152
Validation loss = 0.019998915493488312
Validation loss = 0.021633565425872803
Validation loss = 0.02160906419157982
Validation loss = 0.019605806097388268
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020008333027362823
Validation loss = 0.02049478329718113
Validation loss = 0.021110104396939278
Validation loss = 0.020234184339642525
Validation loss = 0.02013208158314228
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022564323619008064
Validation loss = 0.020759031176567078
Validation loss = 0.02131061442196369
Validation loss = 0.021720871329307556
Validation loss = 0.020642893388867378
Validation loss = 0.020589781925082207
Validation loss = 0.022529270499944687
Validation loss = 0.021045448258519173
Validation loss = 0.02041592262685299
Validation loss = 0.020197562873363495
Validation loss = 0.022266363725066185
Validation loss = 0.02396223321557045
Validation loss = 0.021961020305752754
Validation loss = 0.02036452852189541
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019492555409669876
Validation loss = 0.020988810807466507
Validation loss = 0.020947083830833435
Validation loss = 0.01934453286230564
Validation loss = 0.019597122445702553
Validation loss = 0.021936967968940735
Validation loss = 0.02133619785308838
Validation loss = 0.01914539560675621
Validation loss = 0.020236903801560402
Validation loss = 0.020678387954831123
Validation loss = 0.020597204566001892
Validation loss = 0.019787123426795006
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020942093804478645
Validation loss = 0.020004939287900925
Validation loss = 0.02169984206557274
Validation loss = 0.020211465656757355
Validation loss = 0.021254878491163254
Validation loss = 0.018989283591508865
Validation loss = 0.022165190428495407
Validation loss = 0.020070407539606094
Validation loss = 0.01986001431941986
Validation loss = 0.023110587149858475
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07132667617689016
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07122507122507123
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07112375533428165
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07102272727272728
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07092198581560284
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0708215297450425
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07072135785007072
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07062146892655367
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07052186177715092
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07042253521126761
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07032348804500703
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0702247191011236
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07012622720897616
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0700280112044818
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06993006993006994
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06983240223463687
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0697350069735007
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06963788300835655
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06954102920723226
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06944444444444445
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06934812760055478
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06925207756232687
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06915629322268327
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06906077348066299
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06896551724137931
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0186  |
| Iteration     | 27       |
| MaximumReturn | -0.00497 |
| MinimumReturn | -0.265   |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02153145521879196
Validation loss = 0.019969560205936432
Validation loss = 0.02095409296452999
Validation loss = 0.021585384383797646
Validation loss = 0.020719977095723152
Validation loss = 0.02074042521417141
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02119927667081356
Validation loss = 0.0222161915153265
Validation loss = 0.021779514849185944
Validation loss = 0.020167918875813484
Validation loss = 0.021780090406537056
Validation loss = 0.020597126334905624
Validation loss = 0.021036619320511818
Validation loss = 0.021987808868288994
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022517956793308258
Validation loss = 0.022707924246788025
Validation loss = 0.020690130069851875
Validation loss = 0.02200981229543686
Validation loss = 0.021986111998558044
Validation loss = 0.021193740889430046
Validation loss = 0.02174564264714718
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02190735936164856
Validation loss = 0.01979885995388031
Validation loss = 0.021479224786162376
Validation loss = 0.02271178923547268
Validation loss = 0.022484270855784416
Validation loss = 0.020634347572922707
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022078776732087135
Validation loss = 0.02064051479101181
Validation loss = 0.020623261108994484
Validation loss = 0.020223891362547874
Validation loss = 0.020537257194519043
Validation loss = 0.02134079486131668
Validation loss = 0.02079252526164055
Validation loss = 0.021864483132958412
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06887052341597796
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0687757909215956
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06868131868131869
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06858710562414266
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0684931506849315
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06839945280437756
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06830601092896176
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06821282401091405
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0681198910081744
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06802721088435375
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06793478260869565
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06784260515603799
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06775067750677506
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06765899864682003
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06756756756756757
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06747638326585695
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0673854447439353
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06729475100942127
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06720430107526881
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06711409395973154
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06702412868632708
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06693440428380187
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06684491978609626
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06675567423230974
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06666666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00151 |
| Iteration     | 28       |
| MaximumReturn | -0.00102 |
| MinimumReturn | -0.00252 |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021125366911292076
Validation loss = 0.023394258692860603
Validation loss = 0.022373318672180176
Validation loss = 0.022514455020427704
Validation loss = 0.022340629249811172
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021700043231248856
Validation loss = 0.020709840580821037
Validation loss = 0.022483009845018387
Validation loss = 0.020307747647166252
Validation loss = 0.021714840084314346
Validation loss = 0.020795101299881935
Validation loss = 0.020282592624425888
Validation loss = 0.021152693778276443
Validation loss = 0.021277016028761864
Validation loss = 0.021231168881058693
Validation loss = 0.022553274407982826
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021181995049118996
Validation loss = 0.022815294563770294
Validation loss = 0.021307922899723053
Validation loss = 0.021624362096190453
Validation loss = 0.02345045655965805
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022594723850488663
Validation loss = 0.020512476563453674
Validation loss = 0.022819427773356438
Validation loss = 0.021795419976115227
Validation loss = 0.01990697905421257
Validation loss = 0.020614303648471832
Validation loss = 0.02109622210264206
Validation loss = 0.020230844616889954
Validation loss = 0.02230251580476761
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02269461192190647
Validation loss = 0.021026305854320526
Validation loss = 0.0210576131939888
Validation loss = 0.021481327712535858
Validation loss = 0.026832105591893196
Validation loss = 0.020696165040135384
Validation loss = 0.020363319665193558
Validation loss = 0.022203827276825905
Validation loss = 0.021089494228363037
Validation loss = 0.020544936880469322
Validation loss = 0.02360508032143116
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.06790945406125166
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.06914893617021277
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.0703851261620186
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.07161803713527852
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07152317880794702
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.07539682539682539
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07529722589167767
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.07651715039577836
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.077733860342556
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07763157894736843
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07752956636005257
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07742782152230972
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07732634338138926
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.07853403141361257
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0784313725490196
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0783289817232376
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07822685788787484
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.078125
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07802340702210664
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07792207792207792
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07782101167315175
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07772020725388601
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07761966364812418
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.07881136950904392
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07870967741935483
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.45    |
| Iteration     | 29       |
| MaximumReturn | -0.0442  |
| MinimumReturn | -87.1    |
| TotalSamples  | 51646    |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021191803738474846
Validation loss = 0.019786858931183815
Validation loss = 0.019153719767928123
Validation loss = 0.019377201795578003
Validation loss = 0.021150216460227966
Validation loss = 0.02034715563058853
Validation loss = 0.02020052634179592
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02231035754084587
Validation loss = 0.021675972267985344
Validation loss = 0.01830282248556614
Validation loss = 0.02023618295788765
Validation loss = 0.020878350362181664
Validation loss = 0.021895160898566246
Validation loss = 0.021970603615045547
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020830530673265457
Validation loss = 0.021852916106581688
Validation loss = 0.022817401215434074
Validation loss = 0.020367179065942764
Validation loss = 0.020199129357933998
Validation loss = 0.020252099260687828
Validation loss = 0.0204814150929451
Validation loss = 0.02049904130399227
Validation loss = 0.021788807585835457
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019836964085698128
Validation loss = 0.02080393023788929
Validation loss = 0.02047080732882023
Validation loss = 0.02075999788939953
Validation loss = 0.02339310199022293
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02017842046916485
Validation loss = 0.02054748311638832
Validation loss = 0.02146902121603489
Validation loss = 0.018966753035783768
Validation loss = 0.020260527729988098
Validation loss = 0.01834973320364952
Validation loss = 0.020680369809269905
Validation loss = 0.020342223346233368
Validation loss = 0.020490624010562897
Validation loss = 0.01961834914982319
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07860824742268041
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07850707850707851
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07840616966580977
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07830551989730423
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0782051282051282
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07810499359795134
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07800511508951406
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07790549169859515
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0778061224489796
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07770700636942675
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07760814249363868
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07750952986022872
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07741116751269035
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07731305449936629
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07721518987341772
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07711757269279393
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07702020202020202
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07692307692307693
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07682619647355164
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07672955974842767
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07663316582914573
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07653701380175659
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07644110275689223
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07634543178973717
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07625
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00341 |
| Iteration     | 30       |
| MaximumReturn | -0.00256 |
| MinimumReturn | -0.00475 |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024237560108304024
Validation loss = 0.024714600294828415
Validation loss = 0.022853033617138863
Validation loss = 0.023582516238093376
Validation loss = 0.025864986702799797
Validation loss = 0.02575603313744068
Validation loss = 0.024142952635884285
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024308236315846443
Validation loss = 0.024070443585515022
Validation loss = 0.02501758001744747
Validation loss = 0.023324666544795036
Validation loss = 0.024201180785894394
Validation loss = 0.02393750287592411
Validation loss = 0.025197379291057587
Validation loss = 0.0248426366597414
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.026711655780673027
Validation loss = 0.02350938320159912
Validation loss = 0.024134192615747452
Validation loss = 0.02615770511329174
Validation loss = 0.024194011464715004
Validation loss = 0.022823680192232132
Validation loss = 0.025787996128201485
Validation loss = 0.023421775549650192
Validation loss = 0.02485460788011551
Validation loss = 0.024027226492762566
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024874983355402946
Validation loss = 0.021834781393408775
Validation loss = 0.024771878495812416
Validation loss = 0.024168360978364944
Validation loss = 0.024669550359249115
Validation loss = 0.02392805553972721
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024922925978899002
Validation loss = 0.023893466219305992
Validation loss = 0.02308332920074463
Validation loss = 0.025212544947862625
Validation loss = 0.022693833336234093
Validation loss = 0.022879641503095627
Validation loss = 0.02395201101899147
Validation loss = 0.023872513324022293
Validation loss = 0.023610560223460197
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07615480649188515
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07605985037406483
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0759651307596513
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07587064676616916
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07577639751552795
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07568238213399504
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07558859975216853
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07549504950495049
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0754017305315204
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07530864197530865
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0752157829839704
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07512315270935961
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07503075030750307
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07493857493857493
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07484662576687116
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07475490196078431
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07466340269277846
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0745721271393643
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07448107448107448
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07439024390243902
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07429963459196103
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07420924574209246
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07411907654921021
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07402912621359223
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07393939393939394
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00105  |
| Iteration     | 31        |
| MaximumReturn | -0.000777 |
| MinimumReturn | -0.00141  |
| TotalSamples  | 54978     |
-----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02438320219516754
Validation loss = 0.02577860839664936
Validation loss = 0.024413809180259705
Validation loss = 0.024477489292621613
Validation loss = 0.024918071925640106
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.026658328250050545
Validation loss = 0.02287350967526436
Validation loss = 0.023299584165215492
Validation loss = 0.024635136127471924
Validation loss = 0.022849729284644127
Validation loss = 0.024245737120509148
Validation loss = 0.024014582857489586
Validation loss = 0.024477891623973846
Validation loss = 0.02370726317167282
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02435009367763996
Validation loss = 0.0259618628770113
Validation loss = 0.02405654452741146
Validation loss = 0.02364005707204342
Validation loss = 0.02815295197069645
Validation loss = 0.025827178731560707
Validation loss = 0.02396484650671482
Validation loss = 0.024242762476205826
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025675497949123383
Validation loss = 0.023215332999825478
Validation loss = 0.023409677669405937
Validation loss = 0.022155705839395523
Validation loss = 0.02341303601861
Validation loss = 0.023794099688529968
Validation loss = 0.023529812693595886
Validation loss = 0.022987885400652885
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02403021790087223
Validation loss = 0.023550329729914665
Validation loss = 0.025174329057335854
Validation loss = 0.023253723978996277
Validation loss = 0.023402191698551178
Validation loss = 0.022367525845766068
Validation loss = 0.02331092767417431
Validation loss = 0.02354367822408676
Validation loss = 0.023891635239124298
Validation loss = 0.023772601038217545
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07384987893462469
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07376058041112454
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07367149758454106
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0735826296743064
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07349397590361446
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07340553549939831
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0733173076923077
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07322929171668667
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07314148681055156
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07305389221556886
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0729665071770335
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07287933094384708
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07279236276849642
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07270560190703218
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07261904761904762
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07253269916765755
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07244655581947744
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07236061684460261
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07227488151658767
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07218934911242604
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07210401891252956
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07201889020070838
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07193396226415094
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.071849234393404
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07176470588235294
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00106  |
| Iteration     | 32        |
| MaximumReturn | -0.000729 |
| MinimumReturn | -0.00171  |
| TotalSamples  | 56644     |
-----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.027333183214068413
Validation loss = 0.024745984002947807
Validation loss = 0.024134596809744835
Validation loss = 0.02527521923184395
Validation loss = 0.024983644485473633
Validation loss = 0.024927273392677307
Validation loss = 0.02481132186949253
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02441285364329815
Validation loss = 0.025817707180976868
Validation loss = 0.02996155619621277
Validation loss = 0.02335323952138424
Validation loss = 0.024097148329019547
Validation loss = 0.02413828670978546
Validation loss = 0.02308671735227108
Validation loss = 0.023341679945588112
Validation loss = 0.025157960131764412
Validation loss = 0.023511691018939018
Validation loss = 0.023019341751933098
Validation loss = 0.02334831841289997
Validation loss = 0.02327031083405018
Validation loss = 0.026208583265542984
Validation loss = 0.02536691725254059
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025332601740956306
Validation loss = 0.024575363844633102
Validation loss = 0.02591785602271557
Validation loss = 0.024737436324357986
Validation loss = 0.024794865399599075
Validation loss = 0.023704910650849342
Validation loss = 0.02371910773217678
Validation loss = 0.024678876623511314
Validation loss = 0.023583142086863518
Validation loss = 0.024141713976860046
Validation loss = 0.02296685241162777
Validation loss = 0.024133305996656418
Validation loss = 0.02509872429072857
Validation loss = 0.02524845115840435
Validation loss = 0.022957060486078262
Validation loss = 0.02543552592396736
Validation loss = 0.02232096716761589
Validation loss = 0.024324510246515274
Validation loss = 0.026879925280809402
Validation loss = 0.02443113550543785
Validation loss = 0.02491704747080803
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.026148874312639236
Validation loss = 0.022684162482619286
Validation loss = 0.025152534246444702
Validation loss = 0.022744327783584595
Validation loss = 0.023843416944146156
Validation loss = 0.024343106895685196
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025920310989022255
Validation loss = 0.02439863421022892
Validation loss = 0.022924428805708885
Validation loss = 0.023045344278216362
Validation loss = 0.024193208664655685
Validation loss = 0.02374066226184368
Validation loss = 0.022907840088009834
Validation loss = 0.025511011481285095
Validation loss = 0.023018570616841316
Validation loss = 0.02397422306239605
Validation loss = 0.024313930422067642
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07168037602820211
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0715962441314554
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07151230949589683
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07142857142857142
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07134502923976609
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07126168224299065
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07117852975495916
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07109557109557109
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0710128055878929
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07093023255813953
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07084785133565621
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07076566125290024
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07068366164542295
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07060185185185185
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07052023121387284
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07043879907621248
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07035755478662054
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07027649769585254
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07019562715765247
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07011494252873564
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07003444316877153
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06995412844036697
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06987399770904926
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06979405034324943
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06971428571428571
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00364 |
| Iteration     | 33       |
| MaximumReturn | -0.00252 |
| MinimumReturn | -0.00525 |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023396046832203865
Validation loss = 0.027127942070364952
Validation loss = 0.024044668301939964
Validation loss = 0.023838846012949944
Validation loss = 0.025506246834993362
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.025101250037550926
Validation loss = 0.023348437622189522
Validation loss = 0.023193661123514175
Validation loss = 0.025064291432499886
Validation loss = 0.02381761744618416
Validation loss = 0.023585086688399315
Validation loss = 0.023803487420082092
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02589559182524681
Validation loss = 0.02493952587246895
Validation loss = 0.02502600848674774
Validation loss = 0.02505531534552574
Validation loss = 0.024077946320176125
Validation loss = 0.024359866976737976
Validation loss = 0.02762114629149437
Validation loss = 0.02547368034720421
Validation loss = 0.025191957131028175
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02399938926100731
Validation loss = 0.02246328443288803
Validation loss = 0.023110145702958107
Validation loss = 0.023095859214663506
Validation loss = 0.02499629743397236
Validation loss = 0.023106016218662262
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02282240055501461
Validation loss = 0.023536918684840202
Validation loss = 0.023969970643520355
Validation loss = 0.025383900851011276
Validation loss = 0.02541229873895645
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06963470319634703
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06955530216647662
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06947608200455581
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06939704209328783
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06931818181818182
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06923950056753689
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0691609977324263
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06908267270668177
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06900452488687783
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06892655367231638
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06884875846501129
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06877113866967305
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0686936936936937
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0686164229471316
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06853932584269663
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06846240179573512
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06838565022421525
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0683090705487122
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06823266219239374
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06815642458100558
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06808035714285714
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06800445930880714
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06792873051224944
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.067853170189099
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06777777777777778
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00108  |
| Iteration     | 34        |
| MaximumReturn | -0.000728 |
| MinimumReturn | -0.00152  |
| TotalSamples  | 59976     |
-----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023015087470412254
Validation loss = 0.02387220598757267
Validation loss = 0.02398657239973545
Validation loss = 0.02251681126654148
Validation loss = 0.023631446063518524
Validation loss = 0.026044823229312897
Validation loss = 0.02350633032619953
Validation loss = 0.024737676605582237
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02374395914375782
Validation loss = 0.02289494127035141
Validation loss = 0.025450801476836205
Validation loss = 0.023170769214630127
Validation loss = 0.02301795594394207
Validation loss = 0.02324126474559307
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023402314633131027
Validation loss = 0.024129174649715424
Validation loss = 0.023322569206357002
Validation loss = 0.02497497946023941
Validation loss = 0.024086060002446175
Validation loss = 0.022005848586559296
Validation loss = 0.022277498617768288
Validation loss = 0.022804267704486847
Validation loss = 0.023259157314896584
Validation loss = 0.025688597932457924
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023250354453921318
Validation loss = 0.023045258596539497
Validation loss = 0.022290904074907303
Validation loss = 0.022165704518556595
Validation loss = 0.02240707352757454
Validation loss = 0.023110756650567055
Validation loss = 0.023054474964737892
Validation loss = 0.02177256904542446
Validation loss = 0.022271310910582542
Validation loss = 0.023186150938272476
Validation loss = 0.026348376646637917
Validation loss = 0.02293029986321926
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023937884718179703
Validation loss = 0.023530960083007812
Validation loss = 0.02442268840968609
Validation loss = 0.023653652518987656
Validation loss = 0.024529822170734406
Validation loss = 0.024175653234124184
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06770255271920089
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06762749445676275
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06755260243632337
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06747787610619468
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06740331491712707
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0673289183222958
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06725468577728776
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0671806167400881
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0671067106710671
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06703296703296703
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06695938529088913
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0668859649122807
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06681270536692223
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06673960612691467
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06666666666666667
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0665938864628821
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06652126499454744
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0664488017429194
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06637649619151251
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06630434782608696
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06623235613463627
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06616052060737528
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06608884073672806
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06601731601731602
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06594594594594595
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00101 |
| Iteration     | 35       |
| MaximumReturn | -0.00067 |
| MinimumReturn | -0.00141 |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02310933731496334
Validation loss = 0.025374578312039375
Validation loss = 0.024446070194244385
Validation loss = 0.02431117184460163
Validation loss = 0.026269936934113503
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021370980888605118
Validation loss = 0.02493966929614544
Validation loss = 0.024921506643295288
Validation loss = 0.02345634438097477
Validation loss = 0.024380674585700035
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02687549591064453
Validation loss = 0.02408677153289318
Validation loss = 0.024592209607362747
Validation loss = 0.02486417442560196
Validation loss = 0.02576087974011898
Validation loss = 0.024533886462450027
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024151895195245743
Validation loss = 0.026120325550436974
Validation loss = 0.02382354810833931
Validation loss = 0.021827882155776024
Validation loss = 0.026096973568201065
Validation loss = 0.02386154979467392
Validation loss = 0.02416636422276497
Validation loss = 0.022935716435313225
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024685867130756378
Validation loss = 0.02517799660563469
Validation loss = 0.023216545581817627
Validation loss = 0.025086751207709312
Validation loss = 0.0247579924762249
Validation loss = 0.024188704788684845
Validation loss = 0.022818563506007195
Validation loss = 0.02341853268444538
Validation loss = 0.022598041221499443
Validation loss = 0.02457340620458126
Validation loss = 0.023874904960393906
Validation loss = 0.02314450591802597
Validation loss = 0.02542921155691147
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06587473002159827
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06580366774541532
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06573275862068965
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06566200215285253
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06559139784946237
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06552094522019335
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06545064377682404
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06538049303322616
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06531049250535331
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06524064171122995
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06517094017094018
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06510138740661686
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0650319829424307
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06496272630457935
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06489361702127659
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06482465462274177
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06475583864118896
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06468716861081654
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0646186440677966
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06455026455026455
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06448202959830866
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06441393875395987
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06434599156118144
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0642781875658588
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06421052631578947
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00315 |
| Iteration     | 36       |
| MaximumReturn | -0.0024  |
| MinimumReturn | -0.00464 |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02480865828692913
Validation loss = 0.02581661380827427
Validation loss = 0.025024814531207085
Validation loss = 0.024417607113718987
Validation loss = 0.023759108036756516
Validation loss = 0.02329372428357601
Validation loss = 0.026098964735865593
Validation loss = 0.024813704192638397
Validation loss = 0.02316427417099476
Validation loss = 0.023493463173508644
Validation loss = 0.02420646883547306
Validation loss = 0.023120472207665443
Validation loss = 0.02369512803852558
Validation loss = 0.02296333946287632
Validation loss = 0.02312934398651123
Validation loss = 0.02447049506008625
Validation loss = 0.022838246077299118
Validation loss = 0.02616090700030327
Validation loss = 0.024081511422991753
Validation loss = 0.023398861289024353
Validation loss = 0.024218060076236725
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023575833067297935
Validation loss = 0.024335000663995743
Validation loss = 0.023982780054211617
Validation loss = 0.02464667521417141
Validation loss = 0.022622596472501755
Validation loss = 0.022546473890542984
Validation loss = 0.023785321041941643
Validation loss = 0.024060146883130074
Validation loss = 0.02446802705526352
Validation loss = 0.021970583125948906
Validation loss = 0.023381307721138
Validation loss = 0.022749541327357292
Validation loss = 0.023098871111869812
Validation loss = 0.02514387108385563
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024750689044594765
Validation loss = 0.023979084566235542
Validation loss = 0.022179879248142242
Validation loss = 0.030025599524378777
Validation loss = 0.02359667979180813
Validation loss = 0.02569797821342945
Validation loss = 0.025058556348085403
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0229507889598608
Validation loss = 0.024057095870375633
Validation loss = 0.022907357662916183
Validation loss = 0.023857982829213142
Validation loss = 0.023943468928337097
Validation loss = 0.024795927107334137
Validation loss = 0.022810284048318863
Validation loss = 0.02373419888317585
Validation loss = 0.023491082713007927
Validation loss = 0.024202421307563782
Validation loss = 0.025091132149100304
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02280702069401741
Validation loss = 0.02382366918027401
Validation loss = 0.028070786967873573
Validation loss = 0.02461102232336998
Validation loss = 0.023304879665374756
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.06624605678233439
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0661764705882353
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06610703043022036
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.06813417190775681
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.06910994764397906
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06903765690376569
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06896551724137931
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.06993736951983298
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06986444212721585
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06979166666666667
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.07284079084287201
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07276507276507277
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07268951194184839
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07261410788381743
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.07357512953367876
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07349896480331262
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0734229576008273
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07334710743801653
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07327141382868937
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0731958762886598
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.07415036045314109
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07407407407407407
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07399794450154162
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07392197125256673
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07384615384615385
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.9     |
| Iteration     | 37       |
| MaximumReturn | -0.0317  |
| MinimumReturn | -62.5    |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0221259668469429
Validation loss = 0.024179372936487198
Validation loss = 0.024486836045980453
Validation loss = 0.02216978371143341
Validation loss = 0.022684291005134583
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023298978805541992
Validation loss = 0.02222478948533535
Validation loss = 0.022124527022242546
Validation loss = 0.02315264195203781
Validation loss = 0.02497587725520134
Validation loss = 0.022383159026503563
Validation loss = 0.022779155522584915
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025459928438067436
Validation loss = 0.024007262662053108
Validation loss = 0.026431767269968987
Validation loss = 0.02398953214287758
Validation loss = 0.021981243044137955
Validation loss = 0.023597046732902527
Validation loss = 0.023345019668340683
Validation loss = 0.022658633068203926
Validation loss = 0.023696262389421463
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021625470370054245
Validation loss = 0.022146906703710556
Validation loss = 0.023224014788866043
Validation loss = 0.02703377790749073
Validation loss = 0.022025540471076965
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022301219403743744
Validation loss = 0.022214587777853012
Validation loss = 0.021779056638479233
Validation loss = 0.02387089654803276
Validation loss = 0.021273259073495865
Validation loss = 0.02366768941283226
Validation loss = 0.022895388305187225
Validation loss = 0.02325543761253357
Validation loss = 0.022244902327656746
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07377049180327869
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0736949846468782
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0736196319018405
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0735444330949949
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07346938775510205
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07339449541284404
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07331975560081466
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07324516785350967
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07317073170731707
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07309644670050762
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07302231237322515
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0729483282674772
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0728744939271255
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07280080889787664
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07272727272727272
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07265388496468214
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07258064516129033
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07250755287009064
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07243460764587525
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07236180904522613
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07228915662650602
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07221664994984955
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07214428857715431
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07207207207207207
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.072
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.041   |
| Iteration     | 38       |
| MaximumReturn | -0.0211  |
| MinimumReturn | -0.0723  |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023506123572587967
Validation loss = 0.023791538551449776
Validation loss = 0.022527694702148438
Validation loss = 0.02485877275466919
Validation loss = 0.02370787225663662
Validation loss = 0.023987025022506714
Validation loss = 0.023785561323165894
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024481909349560738
Validation loss = 0.02377079613506794
Validation loss = 0.024100640788674355
Validation loss = 0.026211638003587723
Validation loss = 0.022429082542657852
Validation loss = 0.023157885298132896
Validation loss = 0.023946786299347878
Validation loss = 0.02385495789349079
Validation loss = 0.024752240628004074
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024881700053811073
Validation loss = 0.02529415488243103
Validation loss = 0.02397744171321392
Validation loss = 0.02426491305232048
Validation loss = 0.026668310165405273
Validation loss = 0.023898379877209663
Validation loss = 0.02387162297964096
Validation loss = 0.02518092840909958
Validation loss = 0.0239665936678648
Validation loss = 0.024625264108181
Validation loss = 0.024910252541303635
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02419828251004219
Validation loss = 0.024600859731435776
Validation loss = 0.025178151205182076
Validation loss = 0.022456267848610878
Validation loss = 0.02452877163887024
Validation loss = 0.02423625998198986
Validation loss = 0.024007398635149002
Validation loss = 0.023961739614605904
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023658253252506256
Validation loss = 0.025874465703964233
Validation loss = 0.025655610486865044
Validation loss = 0.023229924961924553
Validation loss = 0.02400122582912445
Validation loss = 0.024305539205670357
Validation loss = 0.02357874996960163
Validation loss = 0.024187825620174408
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07192807192807193
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0718562874251497
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07178464606181456
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07171314741035857
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07164179104477612
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07157057654075547
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07149950347567031
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07142857142857142
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07135777998017839
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07128712871287128
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0712166172106825
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07114624505928854
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07107601184600197
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07100591715976332
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07093596059113301
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07086614173228346
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07079646017699115
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07072691552062868
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07065750736015702
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07058823529411765
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07051909892262488
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07045009784735812
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07038123167155426
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0703125
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0702439024390244
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00235 |
| Iteration     | 39       |
| MaximumReturn | -0.00163 |
| MinimumReturn | -0.00343 |
| TotalSamples  | 68306    |
----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024072466418147087
Validation loss = 0.022889837622642517
Validation loss = 0.024423232302069664
Validation loss = 0.02368440292775631
Validation loss = 0.023558316752314568
Validation loss = 0.024019908159971237
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023490499705076218
Validation loss = 0.021947944536805153
Validation loss = 0.023385269567370415
Validation loss = 0.02485967054963112
Validation loss = 0.023768464103341103
Validation loss = 0.022293493151664734
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023109549656510353
Validation loss = 0.023658687248826027
Validation loss = 0.02414785325527191
Validation loss = 0.02591612935066223
Validation loss = 0.023377075791358948
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025700710713863373
Validation loss = 0.02272455394268036
Validation loss = 0.024301810190081596
Validation loss = 0.023116586729884148
Validation loss = 0.023046359419822693
Validation loss = 0.023711347952485085
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02280273288488388
Validation loss = 0.023562822490930557
Validation loss = 0.023166103288531303
Validation loss = 0.024537738412618637
Validation loss = 0.02261531352996826
Validation loss = 0.022967500612139702
Validation loss = 0.023605749011039734
Validation loss = 0.023877296596765518
Validation loss = 0.023671701550483704
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07017543859649122
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07010710808179163
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07003891050583658
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06997084548104957
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06990291262135923
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06983511154219205
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06976744186046512
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0696999031945789
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06963249516441006
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06956521739130435
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0694980694980695
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06943105110896818
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06936416184971098
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06929740134744947
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06923076923076923
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.069164265129683
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0690978886756238
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06903163950143816
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06896551724137931
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06889952153110047
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06883365200764818
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06876790830945559
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06870229007633588
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06863679694947569
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06857142857142857
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00108  |
| Iteration     | 40        |
| MaximumReturn | -0.000782 |
| MinimumReturn | -0.00147  |
| TotalSamples  | 69972     |
-----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02368302457034588
Validation loss = 0.024223437532782555
Validation loss = 0.02486611343920231
Validation loss = 0.023008352145552635
Validation loss = 0.023364130407571793
Validation loss = 0.023485690355300903
Validation loss = 0.02580312080681324
Validation loss = 0.023824196308851242
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0233357772231102
Validation loss = 0.02484877035021782
Validation loss = 0.02278365194797516
Validation loss = 0.024757226929068565
Validation loss = 0.0245137307792902
Validation loss = 0.02320975996553898
Validation loss = 0.02278352715075016
Validation loss = 0.02489789016544819
Validation loss = 0.02530410885810852
Validation loss = 0.023023033514618874
Validation loss = 0.021811049431562424
Validation loss = 0.0244805496186018
Validation loss = 0.022316521033644676
Validation loss = 0.023094551637768745
Validation loss = 0.02212049625813961
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.026181980967521667
Validation loss = 0.026266489177942276
Validation loss = 0.02454119361937046
Validation loss = 0.025762004777789116
Validation loss = 0.023378843441605568
Validation loss = 0.026647863909602165
Validation loss = 0.02519354782998562
Validation loss = 0.02536497823894024
Validation loss = 0.023079873993992805
Validation loss = 0.024291252717375755
Validation loss = 0.023956075310707092
Validation loss = 0.023603923618793488
Validation loss = 0.023676380515098572
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024132203310728073
Validation loss = 0.024926872923970222
Validation loss = 0.02363433688879013
Validation loss = 0.02569577284157276
Validation loss = 0.023582400754094124
Validation loss = 0.02298402413725853
Validation loss = 0.02441331557929516
Validation loss = 0.023266088217496872
Validation loss = 0.02234037034213543
Validation loss = 0.023621780797839165
Validation loss = 0.023925332352519035
Validation loss = 0.02431255206465721
Validation loss = 0.023793039843440056
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02350638061761856
Validation loss = 0.02442602813243866
Validation loss = 0.023223698139190674
Validation loss = 0.023565316572785378
Validation loss = 0.022137394174933434
Validation loss = 0.02188901975750923
Validation loss = 0.023644747212529182
Validation loss = 0.021578136831521988
Validation loss = 0.02432299591600895
Validation loss = 0.023102033883333206
Validation loss = 0.024107856675982475
Validation loss = 0.02162851393222809
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06850618458610847
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06844106463878327
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06837606837606838
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0683111954459203
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06824644549763033
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06818181818181818
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06811731315042574
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06805293005671077
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0679886685552408
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06792452830188679
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06786050895381715
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06779661016949153
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06773283160865474
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06766917293233082
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0676056338028169
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0675422138836773
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06747891283973759
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06741573033707865
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06735266604303088
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06728971962616823
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06722689075630252
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06716417910447761
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06710158434296365
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0670391061452514
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06697674418604652
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00104  |
| Iteration     | 41        |
| MaximumReturn | -0.000708 |
| MinimumReturn | -0.00143  |
| TotalSamples  | 71638     |
-----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02366628497838974
Validation loss = 0.02362091839313507
Validation loss = 0.023500047624111176
Validation loss = 0.024925662204623222
Validation loss = 0.02389335259795189
Validation loss = 0.02437666244804859
Validation loss = 0.025512779131531715
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02348799630999565
Validation loss = 0.023382237181067467
Validation loss = 0.025270795449614525
Validation loss = 0.026273220777511597
Validation loss = 0.022596867755055428
Validation loss = 0.02296224609017372
Validation loss = 0.02363470010459423
Validation loss = 0.023183664306998253
Validation loss = 0.024711085483431816
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025069881230592728
Validation loss = 0.023154111579060555
Validation loss = 0.025670606642961502
Validation loss = 0.024236785247921944
Validation loss = 0.02370716817677021
Validation loss = 0.025616182014346123
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02417646162211895
Validation loss = 0.024517247453331947
Validation loss = 0.02409457415342331
Validation loss = 0.02523753046989441
Validation loss = 0.0236526969820261
Validation loss = 0.025096306577324867
Validation loss = 0.025888068601489067
Validation loss = 0.02427002228796482
Validation loss = 0.023459045216441154
Validation loss = 0.0256668534129858
Validation loss = 0.025347832590341568
Validation loss = 0.022789817303419113
Validation loss = 0.023958731442689896
Validation loss = 0.023228028789162636
Validation loss = 0.023314611986279488
Validation loss = 0.024752270430326462
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02449735254049301
Validation loss = 0.025306815281510353
Validation loss = 0.02411549724638462
Validation loss = 0.024042900651693344
Validation loss = 0.023078041151165962
Validation loss = 0.025053657591342926
Validation loss = 0.02516895905137062
Validation loss = 0.023958301171660423
Validation loss = 0.02278873324394226
Validation loss = 0.025206148624420166
Validation loss = 0.02432973124086857
Validation loss = 0.023053739219903946
Validation loss = 0.024174895137548447
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06691449814126393
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06685236768802229
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06679035250463822
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06672845227062095
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06666666666666667
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0666049953746531
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.066543438077634
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0664819944598338
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06642066420664207
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0663594470046083
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06629834254143646
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06623735050597976
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0661764705882353
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06611570247933884
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06605504587155964
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06599450045829514
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06593406593406594
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06587374199451052
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06581352833638025
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06575342465753424
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06569343065693431
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06563354603463993
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06557377049180328
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06551410373066424
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06545454545454546
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000991 |
| Iteration     | 42        |
| MaximumReturn | -0.000805 |
| MinimumReturn | -0.00132  |
| TotalSamples  | 73304     |
-----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024324942380189896
Validation loss = 0.023350931704044342
Validation loss = 0.022777138277888298
Validation loss = 0.024845384061336517
Validation loss = 0.024360641837120056
Validation loss = 0.024142693728208542
Validation loss = 0.024806883186101913
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024143259972333908
Validation loss = 0.022543780505657196
Validation loss = 0.025078555569052696
Validation loss = 0.022518906742334366
Validation loss = 0.023017380386590958
Validation loss = 0.023523220792412758
Validation loss = 0.02316775731742382
Validation loss = 0.023990361019968987
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02406175434589386
Validation loss = 0.027032971382141113
Validation loss = 0.024227121844887733
Validation loss = 0.02466801181435585
Validation loss = 0.024718495085835457
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022896744310855865
Validation loss = 0.025359055027365685
Validation loss = 0.023186195641756058
Validation loss = 0.026403946802020073
Validation loss = 0.024544991552829742
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024871883913874626
Validation loss = 0.026176713407039642
Validation loss = 0.024892102926969528
Validation loss = 0.024208175018429756
Validation loss = 0.023764265701174736
Validation loss = 0.023128004744648933
Validation loss = 0.023128783330321312
Validation loss = 0.024670545011758804
Validation loss = 0.02363092079758644
Validation loss = 0.023109640926122665
Validation loss = 0.024276893585920334
Validation loss = 0.02378627471625805
Validation loss = 0.02623528428375721
Validation loss = 0.0223704781383276
Validation loss = 0.022516904398798943
Validation loss = 0.02425830438733101
Validation loss = 0.02478793077170849
Validation loss = 0.022762717679142952
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0653950953678474
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06533575317604355
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06527651858567543
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06521739130434782
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06515837104072399
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0650994575045208
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06504065040650407
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06498194945848375
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06492335437330929
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06486486486486487
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06480648064806481
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06474820143884892
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0646900269541779
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06463195691202872
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06457399103139014
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06451612903225806
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06445837063563116
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06440071556350627
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.064343163538874
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06428571428571428
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06422836752899197
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06417112299465241
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06411398040961709
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06405693950177936
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.064
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00102  |
| Iteration     | 43        |
| MaximumReturn | -0.000662 |
| MinimumReturn | -0.00167  |
| TotalSamples  | 74970     |
-----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02396363951265812
Validation loss = 0.024370629340410233
Validation loss = 0.02531537041068077
Validation loss = 0.02660006284713745
Validation loss = 0.024569453671574593
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023980161175131798
Validation loss = 0.02288343571126461
Validation loss = 0.022938968613743782
Validation loss = 0.022136110812425613
Validation loss = 0.024307627230882645
Validation loss = 0.022073786705732346
Validation loss = 0.023104041814804077
Validation loss = 0.023542184382677078
Validation loss = 0.023254726082086563
Validation loss = 0.02248726226389408
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025256622582674026
Validation loss = 0.024503929540514946
Validation loss = 0.024300236254930496
Validation loss = 0.024992946535348892
Validation loss = 0.024593690410256386
Validation loss = 0.027291810140013695
Validation loss = 0.02572011575102806
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024757422506809235
Validation loss = 0.02453971654176712
Validation loss = 0.02321571670472622
Validation loss = 0.024233436211943626
Validation loss = 0.023946326225996017
Validation loss = 0.024750730022788048
Validation loss = 0.02592734433710575
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02319738268852234
Validation loss = 0.025286750867962837
Validation loss = 0.023669511079788208
Validation loss = 0.023902012035250664
Validation loss = 0.022843949496746063
Validation loss = 0.022889507934451103
Validation loss = 0.024667102843523026
Validation loss = 0.023507723584771156
Validation loss = 0.02301645651459694
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06394316163410302
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06388642413487133
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06382978723404255
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0637732506643047
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06371681415929203
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0636604774535809
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0636042402826855
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06354810238305383
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06349206349206349
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06343612334801763
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06338028169014084
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0633245382585752
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0632688927943761
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06321334503950835
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06315789473684211
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06310254163014899
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06304728546409807
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06299212598425197
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06293706293706294
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.062882096069869
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06282722513089005
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06277244986922406
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0627177700348432
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06266318537859007
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06260869565217392
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00112 |
| Iteration     | 44       |
| MaximumReturn | -0.00077 |
| MinimumReturn | -0.00135 |
| TotalSamples  | 76636    |
----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024133985862135887
Validation loss = 0.02434580959379673
Validation loss = 0.02298225834965706
Validation loss = 0.024079997092485428
Validation loss = 0.02410631626844406
Validation loss = 0.02431543916463852
Validation loss = 0.024319477379322052
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022910643368959427
Validation loss = 0.02340012416243553
Validation loss = 0.02354245074093342
Validation loss = 0.022243214771151543
Validation loss = 0.02204941213130951
Validation loss = 0.022715305909514427
Validation loss = 0.02207476831972599
Validation loss = 0.022270744666457176
Validation loss = 0.023000359535217285
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02434772625565529
Validation loss = 0.027453701943159103
Validation loss = 0.024142058566212654
Validation loss = 0.023979807272553444
Validation loss = 0.025503136217594147
Validation loss = 0.025758063420653343
Validation loss = 0.024289129301905632
Validation loss = 0.025498660281300545
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02306666411459446
Validation loss = 0.023078342899680138
Validation loss = 0.022334828972816467
Validation loss = 0.02270449697971344
Validation loss = 0.02285831607878208
Validation loss = 0.02209760807454586
Validation loss = 0.024209758266806602
Validation loss = 0.02316448651254177
Validation loss = 0.024498049169778824
Validation loss = 0.022563552483916283
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024271860718727112
Validation loss = 0.02309296280145645
Validation loss = 0.02193645015358925
Validation loss = 0.022849954664707184
Validation loss = 0.02145451493561268
Validation loss = 0.023866886273026466
Validation loss = 0.022842148318886757
Validation loss = 0.02244403026998043
Validation loss = 0.022858018055558205
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06255430060816682
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0625
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.062445793581960105
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06239168110918544
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06233766233766234
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06228373702422145
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06222990492653414
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06217616580310881
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.062122519413287315
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06206896551724138
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06201550387596899
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06196213425129088
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061908856405846945
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061855670103092786
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061802575107296136
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06174957118353345
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061696658097686374
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06164383561643835
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06159110350727117
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06153846153846154
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06148590947907771
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06143344709897611
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061381074168797956
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06132879045996593
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06127659574468085
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00104  |
| Iteration     | 45        |
| MaximumReturn | -0.000653 |
| MinimumReturn | -0.00164  |
| TotalSamples  | 78302     |
-----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024258514866232872
Validation loss = 0.022828957065939903
Validation loss = 0.0241335928440094
Validation loss = 0.025336936116218567
Validation loss = 0.02335701510310173
Validation loss = 0.025067847222089767
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02303399331867695
Validation loss = 0.025244569405913353
Validation loss = 0.02204900234937668
Validation loss = 0.023194342851638794
Validation loss = 0.02261948212981224
Validation loss = 0.022511083632707596
Validation loss = 0.022348176687955856
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02457963116466999
Validation loss = 0.02637484110891819
Validation loss = 0.025193385779857635
Validation loss = 0.026335835456848145
Validation loss = 0.024232938885688782
Validation loss = 0.024463554844260216
Validation loss = 0.026348944753408432
Validation loss = 0.024696949869394302
Validation loss = 0.02493089623749256
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02430175058543682
Validation loss = 0.02554829977452755
Validation loss = 0.022695673629641533
Validation loss = 0.023023663088679314
Validation loss = 0.02237420156598091
Validation loss = 0.024843579158186913
Validation loss = 0.023787574842572212
Validation loss = 0.02513272874057293
Validation loss = 0.02214723639190197
Validation loss = 0.024747945368289948
Validation loss = 0.024474255740642548
Validation loss = 0.023822853341698647
Validation loss = 0.022789524868130684
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02387227676808834
Validation loss = 0.021723413839936256
Validation loss = 0.023371800780296326
Validation loss = 0.022358708083629608
Validation loss = 0.02291438914835453
Validation loss = 0.023757189512252808
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061224489795918366
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06117247238742566
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06112054329371817
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061068702290076333
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061016949152542375
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06096528365791702
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06091370558375635
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060862214708368556
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060810810810810814
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060759493670886074
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06070826306913996
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060657118786857624
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06060606060606061
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060555088309503784
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06050420168067227
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060453400503778336
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06040268456375839
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06035205364626991
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06030150753768844
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0602510460251046
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06020066889632107
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06015037593984962
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06010016694490818
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060050041701417846
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00104 |
| Iteration     | 46       |
| MaximumReturn | -0.00073 |
| MinimumReturn | -0.00132 |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023430168628692627
Validation loss = 0.023591334000229836
Validation loss = 0.024387173354625702
Validation loss = 0.022121326997876167
Validation loss = 0.023936677724123
Validation loss = 0.025906648486852646
Validation loss = 0.023533029481768608
Validation loss = 0.024117566645145416
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023503299802541733
Validation loss = 0.024463608860969543
Validation loss = 0.021432023495435715
Validation loss = 0.022533053532242775
Validation loss = 0.020841825753450394
Validation loss = 0.02308497205376625
Validation loss = 0.02481919527053833
Validation loss = 0.023304486647248268
Validation loss = 0.022137129679322243
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02374194748699665
Validation loss = 0.02476547285914421
Validation loss = 0.02382836863398552
Validation loss = 0.023686548694968224
Validation loss = 0.025171050801873207
Validation loss = 0.022994786500930786
Validation loss = 0.024608571082353592
Validation loss = 0.023875907063484192
Validation loss = 0.02736200951039791
Validation loss = 0.02477739192545414
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023474713787436485
Validation loss = 0.0229586623609066
Validation loss = 0.022512774914503098
Validation loss = 0.02281726896762848
Validation loss = 0.023355109617114067
Validation loss = 0.02222771756350994
Validation loss = 0.023705780506134033
Validation loss = 0.022610673680901527
Validation loss = 0.02350379340350628
Validation loss = 0.023782402276992798
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02250825986266136
Validation loss = 0.023184705525636673
Validation loss = 0.022361835464835167
Validation loss = 0.021766414865851402
Validation loss = 0.022575728595256805
Validation loss = 0.023666441440582275
Validation loss = 0.02327163890004158
Validation loss = 0.02167682535946369
Validation loss = 0.02311873249709606
Validation loss = 0.022813830524683
Validation loss = 0.025441721081733704
Validation loss = 0.02206415869295597
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05995004163197336
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.059900166389351084
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.059850374064837904
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.059800664451827246
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05975103734439834
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05970149253731343
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05965202982601491
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.059602649006622516
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05955334987593052
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05950413223140496
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05945499587118084
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0594059405940594
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05935696619950536
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05930807248764415
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05925925925925926
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05921052631578947
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05916187345932621
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.059113300492610835
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05906480721903199
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05901639344262295
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05896805896805897
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.058919803600654665
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.058871627146361405
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.058823529411764705
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05877551020408163
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00111  |
| Iteration     | 47        |
| MaximumReturn | -0.000831 |
| MinimumReturn | -0.00167  |
| TotalSamples  | 81634     |
-----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023975716903805733
Validation loss = 0.024560928344726562
Validation loss = 0.024058973416686058
Validation loss = 0.02593134343624115
Validation loss = 0.023062165826559067
Validation loss = 0.023408520966768265
Validation loss = 0.02398616261780262
Validation loss = 0.02575240470468998
Validation loss = 0.023078233003616333
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022641684859991074
Validation loss = 0.022345494478940964
Validation loss = 0.022527778521180153
Validation loss = 0.02324632927775383
Validation loss = 0.022573648020625114
Validation loss = 0.0223250575363636
Validation loss = 0.022444959729909897
Validation loss = 0.02246437408030033
Validation loss = 0.024108314886689186
Validation loss = 0.02301739901304245
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02473493479192257
Validation loss = 0.026342030614614487
Validation loss = 0.02545655146241188
Validation loss = 0.026097256690263748
Validation loss = 0.023024728521704674
Validation loss = 0.025212008506059647
Validation loss = 0.025985825806856155
Validation loss = 0.026030147448182106
Validation loss = 0.02790592610836029
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025814026594161987
Validation loss = 0.02207571640610695
Validation loss = 0.0240012314170599
Validation loss = 0.02202441915869713
Validation loss = 0.02410488948225975
Validation loss = 0.022874193266034126
Validation loss = 0.022382738068699837
Validation loss = 0.023652803152799606
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022939404472708702
Validation loss = 0.022413551807403564
Validation loss = 0.024902522563934326
Validation loss = 0.02406996116042137
Validation loss = 0.02192840538918972
Validation loss = 0.02213433012366295
Validation loss = 0.023580318316817284
Validation loss = 0.022764593362808228
Validation loss = 0.02388342097401619
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05872756933115824
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05867970660146699
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05863192182410423
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05858421480878763
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05853658536585366
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05848903330625508
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05844155844155844
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.058394160583941604
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05834683954619125
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.058299595141700404
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05825242718446602
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0582053354890865
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05815831987075929
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05811138014527845
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05806451612903226
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05801772763900081
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057971014492753624
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057924376508447305
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05787781350482315
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05783132530120482
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05778491171749599
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057738572574178026
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057692307692307696
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057646116893514815
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0576
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00102  |
| Iteration     | 48        |
| MaximumReturn | -0.000695 |
| MinimumReturn | -0.0015   |
| TotalSamples  | 83300     |
-----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024650434032082558
Validation loss = 0.024067847058176994
Validation loss = 0.023912549018859863
Validation loss = 0.023474007844924927
Validation loss = 0.023440761491656303
Validation loss = 0.022896775975823402
Validation loss = 0.0230487622320652
Validation loss = 0.023898012936115265
Validation loss = 0.02358230948448181
Validation loss = 0.024488722905516624
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022205661982297897
Validation loss = 0.02418159507215023
Validation loss = 0.024156929925084114
Validation loss = 0.02199574187397957
Validation loss = 0.02273554541170597
Validation loss = 0.022438615560531616
Validation loss = 0.022597599774599075
Validation loss = 0.02272612787783146
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02506384626030922
Validation loss = 0.0239928737282753
Validation loss = 0.023926254361867905
Validation loss = 0.024541186168789864
Validation loss = 0.025926264002919197
Validation loss = 0.023784207180142403
Validation loss = 0.02396792732179165
Validation loss = 0.023113271221518517
Validation loss = 0.02398216351866722
Validation loss = 0.024661298841238022
Validation loss = 0.02399229258298874
Validation loss = 0.023042533546686172
Validation loss = 0.02628948539495468
Validation loss = 0.023160945624113083
Validation loss = 0.023503968492150307
Validation loss = 0.02522347681224346
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023978425189852715
Validation loss = 0.022918453440070152
Validation loss = 0.022660400718450546
Validation loss = 0.024284038692712784
Validation loss = 0.024584734812378883
Validation loss = 0.023100541904568672
Validation loss = 0.023324541747570038
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023457113653421402
Validation loss = 0.02331528253853321
Validation loss = 0.0229788925498724
Validation loss = 0.02337552048265934
Validation loss = 0.023594778031110764
Validation loss = 0.024855783209204674
Validation loss = 0.023366181179881096
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05755395683453238
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05750798722044728
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05746209098164405
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05741626794258373
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05737051792828685
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05732484076433121
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057279236276849645
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057233704292527825
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057188244638602066
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05714285714285714
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057097541633624106
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05705229793977813
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057007125890736345
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056962025316455694
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05691699604743083
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05687203791469194
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056827150749802685
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056782334384858045
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05673758865248227
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05669291338582677
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05664830841856806
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05660377358490566
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056559308719560095
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0565149136577708
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05647058823529412
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00408 |
| Iteration     | 49       |
| MaximumReturn | -0.00291 |
| MinimumReturn | -0.00589 |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02470555156469345
Validation loss = 0.02294747158885002
Validation loss = 0.02413744106888771
Validation loss = 0.02266252227127552
Validation loss = 0.022776661440730095
Validation loss = 0.024559220299124718
Validation loss = 0.0227897260338068
Validation loss = 0.024756019935011864
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023185651749372482
Validation loss = 0.022095467895269394
Validation loss = 0.022149087861180305
Validation loss = 0.022106505930423737
Validation loss = 0.02174854464828968
Validation loss = 0.02225703001022339
Validation loss = 0.022755911573767662
Validation loss = 0.025302084162831306
Validation loss = 0.021479565650224686
Validation loss = 0.022646469995379448
Validation loss = 0.02242610789835453
Validation loss = 0.02341345325112343
Validation loss = 0.021575946360826492
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023753777146339417
Validation loss = 0.022411378100514412
Validation loss = 0.027090206742286682
Validation loss = 0.023698857054114342
Validation loss = 0.02488628402352333
Validation loss = 0.024573706090450287
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022582372650504112
Validation loss = 0.020857375115156174
Validation loss = 0.023052366450428963
Validation loss = 0.022722652181982994
Validation loss = 0.02394784614443779
Validation loss = 0.023478666320443153
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023438554257154465
Validation loss = 0.02187276817858219
Validation loss = 0.022064611315727234
Validation loss = 0.023114945739507675
Validation loss = 0.022142786532640457
Validation loss = 0.022265147417783737
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05642633228840126
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056382145653876274
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056338028169014086
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05629397967161845
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05625
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05620608899297424
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056162246489859596
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05611847233047545
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056074766355140186
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05603112840466926
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05598755832037325
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055944055944055944
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055900621118012424
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055857253685027156
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05581395348837209
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055770720371804805
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05572755417956656
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05568445475638051
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05564142194744977
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055598455598455596
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05555555555555555
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05551272166538165
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05546995377503852
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05542725173210162
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055384615384615386
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00125  |
| Iteration     | 50        |
| MaximumReturn | -0.000838 |
| MinimumReturn | -0.00174  |
| TotalSamples  | 86632     |
-----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022149143740534782
Validation loss = 0.022441327571868896
Validation loss = 0.021945420652627945
Validation loss = 0.02477942779660225
Validation loss = 0.022884422913193703
Validation loss = 0.022877691313624382
Validation loss = 0.023042676970362663
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022138142958283424
Validation loss = 0.022809438407421112
Validation loss = 0.02170599065721035
Validation loss = 0.024668347090482712
Validation loss = 0.0221742894500494
Validation loss = 0.02231532521545887
Validation loss = 0.02285938151180744
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023538021370768547
Validation loss = 0.023242758587002754
Validation loss = 0.023460937663912773
Validation loss = 0.02453727275133133
Validation loss = 0.022984640672802925
Validation loss = 0.024886824190616608
Validation loss = 0.02483244054019451
Validation loss = 0.023501096293330193
Validation loss = 0.023630382493138313
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024739621207118034
Validation loss = 0.02285638451576233
Validation loss = 0.023634325712919235
Validation loss = 0.02276693470776081
Validation loss = 0.023135030642151833
Validation loss = 0.021801311522722244
Validation loss = 0.0227128267288208
Validation loss = 0.02309139259159565
Validation loss = 0.02232055366039276
Validation loss = 0.023110397160053253
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022171907126903534
Validation loss = 0.022822881117463112
Validation loss = 0.022388871759176254
Validation loss = 0.022533362731337547
Validation loss = 0.022394709289073944
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05534204458109147
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055299539170506916
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05525709900230238
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05521472392638037
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05517241379310345
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055130168453292494
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05508798775822494
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05504587155963303
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05500381970970206
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0549618320610687
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05491990846681922
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.054878048780487805
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05483625285605483
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0547945205479452
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05475285171102662
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0547112462006079
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05466970387243736
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.054628224582701064
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05458680818802123
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05454545454545454
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05450416351249054
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05446293494704992
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05442176870748299
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.054380664652567974
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05433962264150943
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00787 |
| Iteration     | 51       |
| MaximumReturn | -0.00561 |
| MinimumReturn | -0.0104  |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02231435663998127
Validation loss = 0.024728648364543915
Validation loss = 0.02303360588848591
Validation loss = 0.022514620795845985
Validation loss = 0.02273241989314556
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024063115939497948
Validation loss = 0.021862974390387535
Validation loss = 0.022194815799593925
Validation loss = 0.021824689581990242
Validation loss = 0.02234237641096115
Validation loss = 0.022644514217972755
Validation loss = 0.02315761335194111
Validation loss = 0.023126846179366112
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02417670004069805
Validation loss = 0.023859452456235886
Validation loss = 0.023859037086367607
Validation loss = 0.022838955745100975
Validation loss = 0.025559259578585625
Validation loss = 0.022915691137313843
Validation loss = 0.02437274344265461
Validation loss = 0.024037452414631844
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02254704013466835
Validation loss = 0.02346104383468628
Validation loss = 0.022274412214756012
Validation loss = 0.02178630232810974
Validation loss = 0.021133003756403923
Validation loss = 0.02349941059947014
Validation loss = 0.02239188738167286
Validation loss = 0.022586865350604057
Validation loss = 0.023027677088975906
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02086823247373104
Validation loss = 0.023201726377010345
Validation loss = 0.021505307406187057
Validation loss = 0.021978335455060005
Validation loss = 0.02151414565742016
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05429864253393665
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05425772418990203
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05421686746987952
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05417607223476298
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05413533834586466
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.054094665664913597
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05405405405405406
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05401350337584396
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053973013493253376
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05393258426966292
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05389221556886228
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05385190725504862
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053811659192825115
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0537714712471994
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05373134328358209
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053691275167785234
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05365126676602087
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05361131794489948
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05357142857142857
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053531598513011154
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05349182763744428
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053452115812917596
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05341246290801187
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05337286879169755
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05333333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0011   |
| Iteration     | 52        |
| MaximumReturn | -0.000762 |
| MinimumReturn | -0.00138  |
| TotalSamples  | 89964     |
-----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022195754572749138
Validation loss = 0.021506521850824356
Validation loss = 0.022306816652417183
Validation loss = 0.022908726707100868
Validation loss = 0.02227705530822277
Validation loss = 0.023459233343601227
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021178344264626503
Validation loss = 0.022602500393986702
Validation loss = 0.020772958174347878
Validation loss = 0.02179591730237007
Validation loss = 0.02213927172124386
Validation loss = 0.022015368565917015
Validation loss = 0.02191922999918461
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023726023733615875
Validation loss = 0.024625007063150406
Validation loss = 0.02250673994421959
Validation loss = 0.02311224862933159
Validation loss = 0.021880103275179863
Validation loss = 0.02309579774737358
Validation loss = 0.024649323895573616
Validation loss = 0.023772498592734337
Validation loss = 0.024880291894078255
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02130744792521
Validation loss = 0.02316426672041416
Validation loss = 0.022351723164319992
Validation loss = 0.022207098081707954
Validation loss = 0.02240530587732792
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02146124094724655
Validation loss = 0.023751867935061455
Validation loss = 0.023371325805783272
Validation loss = 0.022045442834496498
Validation loss = 0.020914210006594658
Validation loss = 0.022136272862553596
Validation loss = 0.025283431634306908
Validation loss = 0.022024575620889664
Validation loss = 0.022743983194231987
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053293856402664694
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05325443786982249
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05321507760532151
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053175775480059084
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053136531365313655
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05309734513274336
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05305821665438467
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053019145802650956
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052980132450331126
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052941176470588235
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05290227773695812
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05286343612334802
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05282465150403522
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05278592375366569
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05274725274725275
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0527086383601757
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05267008046817849
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05263157894736842
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052593133674214754
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052554744525547446
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0525164113785558
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052478134110787174
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05243991260014567
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05240174672489083
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05236363636363636
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00103  |
| Iteration     | 53        |
| MaximumReturn | -0.000857 |
| MinimumReturn | -0.00148  |
| TotalSamples  | 91630     |
-----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0232758317142725
Validation loss = 0.022842036560177803
Validation loss = 0.024572178721427917
Validation loss = 0.021539218723773956
Validation loss = 0.02333008125424385
Validation loss = 0.02289668656885624
Validation loss = 0.021913722157478333
Validation loss = 0.023417659103870392
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021849263459444046
Validation loss = 0.021887093782424927
Validation loss = 0.023464975878596306
Validation loss = 0.02185266651213169
Validation loss = 0.02173398621380329
Validation loss = 0.022665029391646385
Validation loss = 0.022408610209822655
Validation loss = 0.02273900806903839
Validation loss = 0.02202732488512993
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02277335897088051
Validation loss = 0.02396262437105179
Validation loss = 0.023521874099969864
Validation loss = 0.024275433272123337
Validation loss = 0.023890603333711624
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022770117968320847
Validation loss = 0.024147484451532364
Validation loss = 0.022339584305882454
Validation loss = 0.0224614217877388
Validation loss = 0.023804960772395134
Validation loss = 0.023152051493525505
Validation loss = 0.023279944434762
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021763993427157402
Validation loss = 0.021423015743494034
Validation loss = 0.022867906838655472
Validation loss = 0.02236444130539894
Validation loss = 0.025051826611161232
Validation loss = 0.02183837629854679
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05232558139534884
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05228758169934641
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05224963715529753
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05221174764321972
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05217391304347826
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05213613323678494
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05209840810419682
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052060737527114966
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05202312138728324
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05198555956678701
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05194805194805195
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05191059841384282
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05187319884726225
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05183585313174946
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.051798561151079135
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05176132278936017
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05172413793103448
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05168700646087581
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05164992826398852
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05161290322580645
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05157593123209169
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05153901216893343
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05150214592274678
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05146533238027162
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05142857142857143
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00106  |
| Iteration     | 54        |
| MaximumReturn | -0.000761 |
| MinimumReturn | -0.00142  |
| TotalSamples  | 93296     |
-----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022383317351341248
Validation loss = 0.02304432913661003
Validation loss = 0.02333156205713749
Validation loss = 0.022511467337608337
Validation loss = 0.027370989322662354
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02204996533691883
Validation loss = 0.021915443241596222
Validation loss = 0.0229830089956522
Validation loss = 0.022906847298145294
Validation loss = 0.021897731348872185
Validation loss = 0.023193420842289925
Validation loss = 0.022351521998643875
Validation loss = 0.022067241370677948
Validation loss = 0.02401312254369259
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022086184471845627
Validation loss = 0.023786187171936035
Validation loss = 0.024105025455355644
Validation loss = 0.02301650121808052
Validation loss = 0.02286723628640175
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021831559017300606
Validation loss = 0.02297552488744259
Validation loss = 0.022270238026976585
Validation loss = 0.023814037442207336
Validation loss = 0.023942353203892708
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02320869080722332
Validation loss = 0.022199194878339767
Validation loss = 0.02159484662115574
Validation loss = 0.02153157815337181
Validation loss = 0.022174939513206482
Validation loss = 0.02139144204556942
Validation loss = 0.022603536024689674
Validation loss = 0.023684337735176086
Validation loss = 0.021646687760949135
Validation loss = 0.02045106701552868
Validation loss = 0.02281789481639862
Validation loss = 0.02164558507502079
Validation loss = 0.02060241438448429
Validation loss = 0.022672561928629875
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05139186295503212
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05135520684736091
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05131860299358518
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05128205128205128
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05124555160142349
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.051209103840682786
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0511727078891258
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05113636363636364
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0511000709723208
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05106382978723404
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05102763997165131
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05099150141643059
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.050955414012738856
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05091937765205092
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05088339222614841
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05084745762711865
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05081157374735357
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05077574047954866
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0507399577167019
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05070422535211268
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.050668543279380716
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05063291139240506
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.050597329585382995
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05056179775280899
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05052631578947368
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00109  |
| Iteration     | 55        |
| MaximumReturn | -0.000874 |
| MinimumReturn | -0.00141  |
| TotalSamples  | 94962     |
-----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023079263046383858
Validation loss = 0.022809239104390144
Validation loss = 0.02321919985115528
Validation loss = 0.02274099364876747
Validation loss = 0.023496275767683983
Validation loss = 0.022657528519630432
Validation loss = 0.021805735304951668
Validation loss = 0.024137316271662712
Validation loss = 0.02288356050848961
Validation loss = 0.023249460384249687
Validation loss = 0.02234610915184021
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02163180708885193
Validation loss = 0.02134663797914982
Validation loss = 0.022071411833167076
Validation loss = 0.023567531257867813
Validation loss = 0.02141534723341465
Validation loss = 0.023039937019348145
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023711465299129486
Validation loss = 0.022672077640891075
Validation loss = 0.022947242483496666
Validation loss = 0.0234676506370306
Validation loss = 0.02218995988368988
Validation loss = 0.02314375899732113
Validation loss = 0.023409409448504448
Validation loss = 0.022735705599188805
Validation loss = 0.023199405521154404
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023503359407186508
Validation loss = 0.022292951121926308
Validation loss = 0.023396126925945282
Validation loss = 0.02291363663971424
Validation loss = 0.024722808972001076
Validation loss = 0.022027764469385147
Validation loss = 0.022386515513062477
Validation loss = 0.021938569843769073
Validation loss = 0.02254963107407093
Validation loss = 0.021604342386126518
Validation loss = 0.022413652390241623
Validation loss = 0.023931805044412613
Validation loss = 0.02196701616048813
Validation loss = 0.021929515525698662
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022689105942845345
Validation loss = 0.02110082469880581
Validation loss = 0.02104373648762703
Validation loss = 0.02097426727414131
Validation loss = 0.021516524255275726
Validation loss = 0.024153461679816246
Validation loss = 0.020994530990719795
Validation loss = 0.021770622581243515
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05049088359046283
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.050455501051156273
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05042016806722689
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05038488453463961
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05034965034965035
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.050314465408805034
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05027932960893855
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05024424284717376
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0502092050209205
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.050174216027874564
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05013927576601671
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05010438413361169
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05006954102920723
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05003474635163308
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04996530187369882
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.049930651872399444
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0498960498960499
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04986149584487535
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04982698961937716
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04979253112033195
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0497581202487906
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.049723756906077346
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.049689440993788817
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0496551724137931
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000986 |
| Iteration     | 56        |
| MaximumReturn | -0.000716 |
| MinimumReturn | -0.00136  |
| TotalSamples  | 96628     |
-----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02233661152422428
Validation loss = 0.022679263725876808
Validation loss = 0.02243218757212162
Validation loss = 0.024238502606749535
Validation loss = 0.022209467366337776
Validation loss = 0.022996671497821808
Validation loss = 0.022238431498408318
Validation loss = 0.02269425243139267
Validation loss = 0.02300199307501316
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02325890026986599
Validation loss = 0.021859237924218178
Validation loss = 0.024177521467208862
Validation loss = 0.02153203822672367
Validation loss = 0.02117299847304821
Validation loss = 0.02402939461171627
Validation loss = 0.0249054953455925
Validation loss = 0.0213596373796463
Validation loss = 0.02242322824895382
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023690253496170044
Validation loss = 0.024947762489318848
Validation loss = 0.02299671433866024
Validation loss = 0.022731157019734383
Validation loss = 0.023059437051415443
Validation loss = 0.025401892140507698
Validation loss = 0.022159777581691742
Validation loss = 0.023006191477179527
Validation loss = 0.023332267999649048
Validation loss = 0.0232074111700058
Validation loss = 0.022527145221829414
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022887786850333214
Validation loss = 0.02268681675195694
Validation loss = 0.022195421159267426
Validation loss = 0.022549694404006004
Validation loss = 0.022225739434361458
Validation loss = 0.022505363449454308
Validation loss = 0.02277381531894207
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023305492475628853
Validation loss = 0.022344091907143593
Validation loss = 0.020850278437137604
Validation loss = 0.020817818120121956
Validation loss = 0.021333778277039528
Validation loss = 0.02472318522632122
Validation loss = 0.021420879289507866
Validation loss = 0.02284896932542324
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04962095106822881
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.049586776859504134
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04955264969029594
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04951856946354883
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.049484536082474224
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04945054945054945
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04941660947151681
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04938271604938271
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04934886908841672
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.049315068493150684
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.049281314168377825
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.049247606019151846
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04921394395078606
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04918032786885246
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.049146757679180884
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04911323328785812
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.049079754601226995
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04904632152588556
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04901293396868618
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04897959183673469
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04894629503738953
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04891304347826087
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.048879837067209775
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.048846675712347354
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0488135593220339
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00201 |
| Iteration     | 57       |
| MaximumReturn | -0.00104 |
| MinimumReturn | -0.00278 |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02468673139810562
Validation loss = 0.022211750969290733
Validation loss = 0.02192239835858345
Validation loss = 0.023075252771377563
Validation loss = 0.025125693529844284
Validation loss = 0.021290786564350128
Validation loss = 0.022026726976037025
Validation loss = 0.022069856524467468
Validation loss = 0.02192435972392559
Validation loss = 0.02649092860519886
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022739775478839874
Validation loss = 0.021759921684861183
Validation loss = 0.02199532464146614
Validation loss = 0.02207455039024353
Validation loss = 0.02297099493443966
Validation loss = 0.0200840774923563
Validation loss = 0.021068911999464035
Validation loss = 0.02084917202591896
Validation loss = 0.021311186254024506
Validation loss = 0.02197462134063244
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02381161041557789
Validation loss = 0.022841673344373703
Validation loss = 0.024025335907936096
Validation loss = 0.023149807006120682
Validation loss = 0.023304719477891922
Validation loss = 0.02220366895198822
Validation loss = 0.022786520421504974
Validation loss = 0.023432614281773567
Validation loss = 0.024757280945777893
Validation loss = 0.023805472999811172
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023923100903630257
Validation loss = 0.022938434034585953
Validation loss = 0.02271432802081108
Validation loss = 0.022198041900992393
Validation loss = 0.02216663956642151
Validation loss = 0.026825513690710068
Validation loss = 0.022413231432437897
Validation loss = 0.02181336283683777
Validation loss = 0.023354019969701767
Validation loss = 0.021985840052366257
Validation loss = 0.0228513665497303
Validation loss = 0.02201754041016102
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021891476586461067
Validation loss = 0.021190578117966652
Validation loss = 0.022214801982045174
Validation loss = 0.022227367386221886
Validation loss = 0.022317735478281975
Validation loss = 0.021797452121973038
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04878048780487805
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04874746106973595
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04871447902571042
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0486815415821501
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04864864864864865
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.048615800135043886
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.048582995951417005
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.048550236008091704
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04851752021563342
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.048484848484848485
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04845222072678331
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0484196368527236
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04838709677419355
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.048354600402955
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04832214765100671
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0482897384305835
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04825737265415549
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04822505023442733
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04819277108433735
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.048160535117056855
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0481283422459893
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04809619238476954
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04806408544726302
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04803202134756504
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.048
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00163  |
| Iteration     | 58        |
| MaximumReturn | -0.000875 |
| MinimumReturn | -0.00227  |
| TotalSamples  | 99960     |
-----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02452028915286064
Validation loss = 0.021969439461827278
Validation loss = 0.021922018378973007
Validation loss = 0.021835898980498314
Validation loss = 0.023335842415690422
Validation loss = 0.02238408476114273
Validation loss = 0.021239101886749268
Validation loss = 0.022164449095726013
Validation loss = 0.022222936153411865
Validation loss = 0.022950273007154465
Validation loss = 0.023156292736530304
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02170700393617153
Validation loss = 0.02068602479994297
Validation loss = 0.02196579985320568
Validation loss = 0.021188614889979362
Validation loss = 0.02364920824766159
Validation loss = 0.02189476042985916
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02298176847398281
Validation loss = 0.023315083235502243
Validation loss = 0.021874478086829185
Validation loss = 0.022189294919371605
Validation loss = 0.02206510491669178
Validation loss = 0.02416263520717621
Validation loss = 0.023110151290893555
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021801292896270752
Validation loss = 0.02221379429101944
Validation loss = 0.0235788282006979
Validation loss = 0.022408289834856987
Validation loss = 0.022409850731492043
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020310893654823303
Validation loss = 0.02229871042072773
Validation loss = 0.02205609530210495
Validation loss = 0.02113812416791916
Validation loss = 0.021004823967814445
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047968021319120584
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047936085219707054
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04790419161676647
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047872340425531915
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04784053156146179
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04780876494023904
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047777040477770406
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04774535809018567
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04771371769383698
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04768211920529802
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04765056254136334
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047619047619047616
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04758757435558493
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047556142668428
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047524752475247525
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047493403693931395
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047462096242584045
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04743083003952569
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04739960500329164
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04736842105263158
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047337278106508875
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04730617608409987
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04727511490479317
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047244094488188976
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04721311475409836
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00104  |
| Iteration     | 59        |
| MaximumReturn | -0.000728 |
| MinimumReturn | -0.0014   |
| TotalSamples  | 101626    |
-----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023055223748087883
Validation loss = 0.022023212164640427
Validation loss = 0.02324068732559681
Validation loss = 0.02350766584277153
Validation loss = 0.022363873198628426
Validation loss = 0.022964129224419594
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021812748163938522
Validation loss = 0.021409492939710617
Validation loss = 0.021244201809167862
Validation loss = 0.023268168792128563
Validation loss = 0.022174205631017685
Validation loss = 0.021129943430423737
Validation loss = 0.02342374250292778
Validation loss = 0.02241489104926586
Validation loss = 0.021832959726452827
Validation loss = 0.02144685946404934
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023555507883429527
Validation loss = 0.02404171973466873
Validation loss = 0.022591421380639076
Validation loss = 0.02384425327181816
Validation loss = 0.02320370450615883
Validation loss = 0.023063700646162033
Validation loss = 0.02244340442121029
Validation loss = 0.024049151688814163
Validation loss = 0.022412722930312157
Validation loss = 0.022851796820759773
Validation loss = 0.024408794939517975
Validation loss = 0.0219151321798563
Validation loss = 0.024060942232608795
Validation loss = 0.0237995907664299
Validation loss = 0.02330402284860611
Validation loss = 0.023245854303240776
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024311363697052002
Validation loss = 0.02105514332652092
Validation loss = 0.023071235045790672
Validation loss = 0.0230022594332695
Validation loss = 0.021630609408020973
Validation loss = 0.023099076002836227
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021819666028022766
Validation loss = 0.021207846701145172
Validation loss = 0.022979898378252983
Validation loss = 0.02142144739627838
Validation loss = 0.021396389231085777
Validation loss = 0.021140987053513527
Validation loss = 0.021289873868227005
Validation loss = 0.022030455991625786
Validation loss = 0.021816127002239227
Validation loss = 0.021081041544675827
Validation loss = 0.021915936842560768
Validation loss = 0.025419006124138832
Validation loss = 0.022741787135601044
Validation loss = 0.020948035642504692
Validation loss = 0.021341735497117043
Validation loss = 0.021478040143847466
Validation loss = 0.021337080746889114
Validation loss = 0.021830271929502487
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047182175622542594
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047151277013752456
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04712041884816754
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04708960104643558
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047058823529411764
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047028086218158065
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04699738903394256
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046966731898238745
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0469361147327249
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046905537459283386
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046875
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0468445022771633
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04681404421326398
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04678362573099415
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046753246753246755
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04672290720311486
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04669260700389105
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046662346079066754
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046632124352331605
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04660194174757282
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04657179818887452
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04654169360051713
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046511627906976744
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046481601032924466
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04645161290322581
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00106  |
| Iteration     | 60        |
| MaximumReturn | -0.000777 |
| MinimumReturn | -0.00154  |
| TotalSamples  | 103292    |
-----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021420646458864212
Validation loss = 0.0227831844240427
Validation loss = 0.02217152714729309
Validation loss = 0.023335188627243042
Validation loss = 0.022898901253938675
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02210165560245514
Validation loss = 0.02182607725262642
Validation loss = 0.021944666281342506
Validation loss = 0.02231820859014988
Validation loss = 0.025110997259616852
Validation loss = 0.02120530605316162
Validation loss = 0.02263292856514454
Validation loss = 0.02196340635418892
Validation loss = 0.021603353321552277
Validation loss = 0.02198653668165207
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022606298327445984
Validation loss = 0.022771691903471947
Validation loss = 0.022577378898859024
Validation loss = 0.023053938522934914
Validation loss = 0.021695226430892944
Validation loss = 0.022458715364336967
Validation loss = 0.021992063149809837
Validation loss = 0.024048758670687675
Validation loss = 0.02321145497262478
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02206937037408352
Validation loss = 0.023419125005602837
Validation loss = 0.022703884169459343
Validation loss = 0.023004639893770218
Validation loss = 0.02178952470421791
Validation loss = 0.02222703956067562
Validation loss = 0.022079432383179665
Validation loss = 0.024157507345080376
Validation loss = 0.023968065157532692
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020609840750694275
Validation loss = 0.02179623767733574
Validation loss = 0.02098883129656315
Validation loss = 0.021201903000473976
Validation loss = 0.02462773397564888
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04642166344294004
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04639175257731959
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0463618802318094
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04633204633204633
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04630225080385852
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04627249357326478
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046242774566473986
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04621309370988447
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046183450930083386
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046153846153846156
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04612427930813581
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046094750320102434
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046065259117082535
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04603580562659847
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04600638977635783
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04597701149425287
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04594767070835992
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04591836734693878
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045889101338432124
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045859872611464965
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04583068109484405
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04580152671755725
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04577240940877304
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045743329097839895
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045714285714285714
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000994 |
| Iteration     | 61        |
| MaximumReturn | -0.000678 |
| MinimumReturn | -0.00151  |
| TotalSamples  | 104958    |
-----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02251521125435829
Validation loss = 0.022705405950546265
Validation loss = 0.02308678813278675
Validation loss = 0.022395577281713486
Validation loss = 0.022402741014957428
Validation loss = 0.023762473836541176
Validation loss = 0.02222028188407421
Validation loss = 0.026122869923710823
Validation loss = 0.022321190685033798
Validation loss = 0.022004643455147743
Validation loss = 0.022735485807061195
Validation loss = 0.021887335926294327
Validation loss = 0.023333339020609856
Validation loss = 0.022443372756242752
Validation loss = 0.022003114223480225
Validation loss = 0.02355559915304184
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02149232290685177
Validation loss = 0.021244514733552933
Validation loss = 0.020495565608143806
Validation loss = 0.020944586023688316
Validation loss = 0.02193669229745865
Validation loss = 0.022823909297585487
Validation loss = 0.021678568795323372
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02307918854057789
Validation loss = 0.023542530834674835
Validation loss = 0.021160941570997238
Validation loss = 0.02177918702363968
Validation loss = 0.021779876202344894
Validation loss = 0.023858457803726196
Validation loss = 0.022320589050650597
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023480573669075966
Validation loss = 0.022263895720243454
Validation loss = 0.02368283085525036
Validation loss = 0.02281801961362362
Validation loss = 0.02274000644683838
Validation loss = 0.02330389991402626
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021158425137400627
Validation loss = 0.020492814481258392
Validation loss = 0.0201871395111084
Validation loss = 0.0233339574187994
Validation loss = 0.02157657966017723
Validation loss = 0.02117554098367691
Validation loss = 0.020896252244710922
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04568527918781726
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045656309448319596
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045627376425855515
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045598480050664976
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04556962025316456
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04554079696394687
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045512010113780026
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045483259633607075
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045454545454545456
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045425867507886436
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04539722572509458
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045368620037807186
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04534005037783375
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045311516677155446
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045283018867924525
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04525455688246386
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04522613065326633
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04519774011299435
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0451693851944793
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045141065830721
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045112781954887216
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045084533500313086
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04505632040050062
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0450281425891182
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00127  |
| Iteration     | 62        |
| MaximumReturn | -0.000778 |
| MinimumReturn | -0.0018   |
| TotalSamples  | 106624    |
-----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02199660614132881
Validation loss = 0.021424327045679092
Validation loss = 0.023133626207709312
Validation loss = 0.022390160709619522
Validation loss = 0.021853646263480186
Validation loss = 0.021700622513890266
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02229846455156803
Validation loss = 0.02124546654522419
Validation loss = 0.024442585185170174
Validation loss = 0.0219587329775095
Validation loss = 0.02132614329457283
Validation loss = 0.02185944840312004
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02194366231560707
Validation loss = 0.02258267253637314
Validation loss = 0.02234790287911892
Validation loss = 0.021945014595985413
Validation loss = 0.023047255352139473
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022050902247428894
Validation loss = 0.02285240776836872
Validation loss = 0.02254863642156124
Validation loss = 0.020737161859869957
Validation loss = 0.02312333881855011
Validation loss = 0.020896676927804947
Validation loss = 0.021908197551965714
Validation loss = 0.02134918048977852
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021357998251914978
Validation loss = 0.023597512394189835
Validation loss = 0.021403564140200615
Validation loss = 0.021979449316859245
Validation loss = 0.021640941500663757
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04497189256714553
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0449438202247191
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.044915782907049284
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04488778054862843
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.044859813084112146
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0448318804483188
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.044803982576229
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04477611940298507
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.044748290863890615
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04472049689440994
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0446927374301676
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04466501240694789
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.044637321760694355
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04460966542750929
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04458204334365325
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04455445544554455
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04452690166975881
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04449938195302843
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.044471896232242125
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.044444444444444446
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04441702652683529
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04438964241676942
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04436229205175601
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04433497536945813
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.044307692307692305
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0011  |
| Iteration     | 63       |
| MaximumReturn | -0.00073 |
| MinimumReturn | -0.00155 |
| TotalSamples  | 108290   |
----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021371258422732353
Validation loss = 0.022200167179107666
Validation loss = 0.023313453420996666
Validation loss = 0.023124055936932564
Validation loss = 0.023133443668484688
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02172691375017166
Validation loss = 0.022280976176261902
Validation loss = 0.02106555551290512
Validation loss = 0.021235985681414604
Validation loss = 0.02112502045929432
Validation loss = 0.021839866414666176
Validation loss = 0.021719396114349365
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02188471518456936
Validation loss = 0.022028284147381783
Validation loss = 0.02179316245019436
Validation loss = 0.022864772006869316
Validation loss = 0.022541118785738945
Validation loss = 0.023450128734111786
Validation loss = 0.021792827174067497
Validation loss = 0.023870646953582764
Validation loss = 0.02163454331457615
Validation loss = 0.023824362084269524
Validation loss = 0.0230022631585598
Validation loss = 0.022690972313284874
Validation loss = 0.021574782207608223
Validation loss = 0.023237766698002815
Validation loss = 0.025032564997673035
Validation loss = 0.02192814275622368
Validation loss = 0.02222881093621254
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022870054468512535
Validation loss = 0.022046200931072235
Validation loss = 0.022852981463074684
Validation loss = 0.02286151424050331
Validation loss = 0.021974919363856316
Validation loss = 0.02243848703801632
Validation loss = 0.02177160233259201
Validation loss = 0.022886062040925026
Validation loss = 0.022076131775975227
Validation loss = 0.02293730527162552
Validation loss = 0.021842578426003456
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020847048610448837
Validation loss = 0.02181004360318184
Validation loss = 0.021091096103191376
Validation loss = 0.021499577909708023
Validation loss = 0.02068392001092434
Validation loss = 0.020315181463956833
Validation loss = 0.02200835570693016
Validation loss = 0.021251311525702477
Validation loss = 0.022374717518687248
Validation loss = 0.022025391459465027
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04428044280442804
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.044253226797787336
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.044226044226044224
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04419889502762431
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.044171779141104296
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04414469650521153
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04411764705882353
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04409063074096754
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.044063647490820076
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.044036697247706424
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.044009779951100246
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04398289554062309
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04395604395604396
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04392922513727883
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04390243902439024
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.043875685557586835
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0438489646772229
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04382227632379793
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.043795620437956206
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04376899696048632
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04374240583232078
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04371584699453552
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.043689320388349516
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.043662825955124315
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04363636363636364
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00118  |
| Iteration     | 64        |
| MaximumReturn | -0.000882 |
| MinimumReturn | -0.00154  |
| TotalSamples  | 109956    |
-----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021977148950099945
Validation loss = 0.021302713081240654
Validation loss = 0.02165699563920498
Validation loss = 0.0219192486256361
Validation loss = 0.021364644169807434
Validation loss = 0.021616391837596893
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021994635462760925
Validation loss = 0.02156549133360386
Validation loss = 0.020911844447255135
Validation loss = 0.02130402997136116
Validation loss = 0.02628602273762226
Validation loss = 0.02160649560391903
Validation loss = 0.020643044263124466
Validation loss = 0.021205054596066475
Validation loss = 0.024176839739084244
Validation loss = 0.021227668970823288
Validation loss = 0.021894071251153946
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02259644865989685
Validation loss = 0.02314331941306591
Validation loss = 0.022697903215885162
Validation loss = 0.021701158955693245
Validation loss = 0.02159244939684868
Validation loss = 0.02233581617474556
Validation loss = 0.02311108075082302
Validation loss = 0.022490132600069046
Validation loss = 0.02138269506394863
Validation loss = 0.0227043516933918
Validation loss = 0.022648178040981293
Validation loss = 0.022129910066723824
Validation loss = 0.021873624995350838
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021627068519592285
Validation loss = 0.020861024037003517
Validation loss = 0.02287503145635128
Validation loss = 0.021934209391474724
Validation loss = 0.02185862325131893
Validation loss = 0.02144838497042656
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022411007434129715
Validation loss = 0.02054816670715809
Validation loss = 0.02092142589390278
Validation loss = 0.021620038896799088
Validation loss = 0.021421747282147408
Validation loss = 0.020895754918456078
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.043609933373712904
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.043583535108958835
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.043557168784029036
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04353083434099154
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04350453172205438
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.043478260869565216
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04345202172601086
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04342581423401689
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0433996383363472
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.043373493975903614
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04334738109572547
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04332129963898917
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04329524954900782
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04326923076923077
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.043243243243243246
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04321728691476591
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04319136172765447
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04316546762589928
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04313960455362492
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04311377245508982
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04308797127468582
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0430622009569378
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04303646144650329
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.043010752688172046
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04298507462686567
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00118  |
| Iteration     | 65        |
| MaximumReturn | -0.000928 |
| MinimumReturn | -0.00215  |
| TotalSamples  | 111622    |
-----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022053085267543793
Validation loss = 0.022201409563422203
Validation loss = 0.023079143837094307
Validation loss = 0.02156512252986431
Validation loss = 0.021734129637479782
Validation loss = 0.02449903078377247
Validation loss = 0.02358626015484333
Validation loss = 0.02253708988428116
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02299884706735611
Validation loss = 0.022299285978078842
Validation loss = 0.02134818583726883
Validation loss = 0.0224019605666399
Validation loss = 0.022197682410478592
Validation loss = 0.021790459752082825
Validation loss = 0.022693196311593056
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0219270046800375
Validation loss = 0.02212190069258213
Validation loss = 0.024842506274580956
Validation loss = 0.022732960060238838
Validation loss = 0.02555449679493904
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021408280357718468
Validation loss = 0.02066013030707836
Validation loss = 0.021678393706679344
Validation loss = 0.02269505150616169
Validation loss = 0.02160036563873291
Validation loss = 0.02148761972784996
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021190224215388298
Validation loss = 0.02136426977813244
Validation loss = 0.02014790289103985
Validation loss = 0.02094290405511856
Validation loss = 0.02303149551153183
Validation loss = 0.02187400683760643
Validation loss = 0.02097775787115097
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04295942720763723
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04293381037567084
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04290822407628129
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04288266825491364
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04285714285714286
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04283164782867341
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04280618311533888
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0427807486631016
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04275534441805225
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.042729970326409496
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.042704626334519574
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04267931238885596
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04265402843601896
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04262877442273535
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04260355029585799
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04257835600236547
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0425531914893617
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.042528056704075605
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04250295159386069
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04247787610619469
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04245283018867924
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04242781378903948
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04240282685512368
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04237786933490288
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.042352941176470586
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00113 |
| Iteration     | 66       |
| MaximumReturn | -0.00085 |
| MinimumReturn | -0.00149 |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022743284702301025
Validation loss = 0.022610733285546303
Validation loss = 0.022923918440937996
Validation loss = 0.023581326007843018
Validation loss = 0.022728655487298965
Validation loss = 0.02201726660132408
Validation loss = 0.02227899432182312
Validation loss = 0.021765414625406265
Validation loss = 0.0220028068870306
Validation loss = 0.022861700505018234
Validation loss = 0.0240121278911829
Validation loss = 0.02173515222966671
Validation loss = 0.022430626675486565
Validation loss = 0.022686343640089035
Validation loss = 0.023402197286486626
Validation loss = 0.023245900869369507
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022279659286141396
Validation loss = 0.022633612155914307
Validation loss = 0.022512836381793022
Validation loss = 0.02284209616482258
Validation loss = 0.02096615359187126
Validation loss = 0.02156561240553856
Validation loss = 0.022440088912844658
Validation loss = 0.022948162630200386
Validation loss = 0.022028734907507896
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022989505901932716
Validation loss = 0.0232942346483469
Validation loss = 0.023368170484900475
Validation loss = 0.021174145862460136
Validation loss = 0.02286805398762226
Validation loss = 0.020898113027215004
Validation loss = 0.024606356397271156
Validation loss = 0.0231019277125597
Validation loss = 0.02271088771522045
Validation loss = 0.023855341598391533
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02286926470696926
Validation loss = 0.023752901703119278
Validation loss = 0.02280651219189167
Validation loss = 0.02217121236026287
Validation loss = 0.021899018436670303
Validation loss = 0.023160066455602646
Validation loss = 0.023042265325784683
Validation loss = 0.023382043465971947
Validation loss = 0.021431194618344307
Validation loss = 0.02351982332766056
Validation loss = 0.022837281227111816
Validation loss = 0.02157224528491497
Validation loss = 0.022251678630709648
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022343233227729797
Validation loss = 0.022428857162594795
Validation loss = 0.021141480654478073
Validation loss = 0.02233240194618702
Validation loss = 0.02078065648674965
Validation loss = 0.021486755460500717
Validation loss = 0.021236276254057884
Validation loss = 0.02084961347281933
Validation loss = 0.02199731208384037
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.042328042328042326
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04230317273795535
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.042278332354668234
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04225352112676056
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04222873900293255
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04220398593200469
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0421792618629174
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04215456674473068
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.042129900526623756
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.042105263157894736
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04208065458796026
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04205607476635514
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04203152364273205
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.042007001166861145
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04198250728862974
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04195804195804196
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.041933605125218404
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04190919674039581
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.041884816753926704
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04186046511627907
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.041836141778036025
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.041811846689895474
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04178757980266976
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04176334106728538
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04173913043478261
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00111  |
| Iteration     | 67        |
| MaximumReturn | -0.000774 |
| MinimumReturn | -0.00153  |
| TotalSamples  | 114954    |
-----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022257061675190926
Validation loss = 0.022765442728996277
Validation loss = 0.021486014127731323
Validation loss = 0.022459620609879494
Validation loss = 0.02196904458105564
Validation loss = 0.02240375429391861
Validation loss = 0.02156158536672592
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02290245331823826
Validation loss = 0.02202652208507061
Validation loss = 0.0226368959993124
Validation loss = 0.02140117809176445
Validation loss = 0.021521640941500664
Validation loss = 0.022437626495957375
Validation loss = 0.02169189602136612
Validation loss = 0.02219383418560028
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023076394572854042
Validation loss = 0.023202838376164436
Validation loss = 0.021025771275162697
Validation loss = 0.023710789158940315
Validation loss = 0.022197125479578972
Validation loss = 0.02175077050924301
Validation loss = 0.022974412888288498
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02195003442466259
Validation loss = 0.0228824932128191
Validation loss = 0.02150091528892517
Validation loss = 0.0223196130245924
Validation loss = 0.02205204777419567
Validation loss = 0.022170957177877426
Validation loss = 0.022093728184700012
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021472815424203873
Validation loss = 0.02130659855902195
Validation loss = 0.021897122263908386
Validation loss = 0.022612499073147774
Validation loss = 0.02110731229186058
Validation loss = 0.020233018323779106
Validation loss = 0.022312594577670097
Validation loss = 0.021377913653850555
Validation loss = 0.020891400054097176
Validation loss = 0.02075895294547081
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04171494785631518
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04169079328314997
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.041666666666666664
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04164256795835743
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04161849710982659
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0415944540727903
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04157043879907621
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0415464512406232
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04152249134948097
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0414985590778098
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.041474654377880185
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04145077720207254
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04142692750287687
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.041403105232892465
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.041379310344827586
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04135554279149914
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04133180252583238
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04130808950086059
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04128440366972477
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04126074498567335
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.041237113402061855
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0412135088723526
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.041189931350114416
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0411663807890223
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04114285714285714
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0102  |
| Iteration     | 68       |
| MaximumReturn | -0.00587 |
| MinimumReturn | -0.0142  |
| TotalSamples  | 116620   |
----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021453535184264183
Validation loss = 0.023776067420840263
Validation loss = 0.020962508395314217
Validation loss = 0.021968480199575424
Validation loss = 0.022530842572450638
Validation loss = 0.022790992632508278
Validation loss = 0.022114573046565056
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021536601707339287
Validation loss = 0.021373389288783073
Validation loss = 0.021423911675810814
Validation loss = 0.021762410178780556
Validation loss = 0.0219154953956604
Validation loss = 0.021165236830711365
Validation loss = 0.02210666425526142
Validation loss = 0.02297731302678585
Validation loss = 0.021130455657839775
Validation loss = 0.020597675815224648
Validation loss = 0.02327197790145874
Validation loss = 0.021266508847475052
Validation loss = 0.022022489458322525
Validation loss = 0.02155284769833088
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022698499262332916
Validation loss = 0.022233404219150543
Validation loss = 0.02311900444328785
Validation loss = 0.02214028500020504
Validation loss = 0.021693989634513855
Validation loss = 0.02131478115916252
Validation loss = 0.022229237481951714
Validation loss = 0.022180013358592987
Validation loss = 0.021811462938785553
Validation loss = 0.021021287888288498
Validation loss = 0.02252647653222084
Validation loss = 0.02198079787194729
Validation loss = 0.022501254454255104
Validation loss = 0.022108370438218117
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021660957485437393
Validation loss = 0.022754408419132233
Validation loss = 0.022592950612306595
Validation loss = 0.022432679310441017
Validation loss = 0.020668093115091324
Validation loss = 0.02175747975707054
Validation loss = 0.02270604856312275
Validation loss = 0.022216815501451492
Validation loss = 0.022394154220819473
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02055245079100132
Validation loss = 0.02066715620458126
Validation loss = 0.024028832092881203
Validation loss = 0.020733634009957314
Validation loss = 0.022671815007925034
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.041119360365505425
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0410958904109589
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04107244723331432
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04104903078677309
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.041025641025641026
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04100227790432802
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.040978941377347755
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.040955631399317405
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04093234792495736
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04090909090909091
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04088586030664395
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04086265607264472
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04083947816222348
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04081632653061224
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.040793201133144476
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04077010192525481
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04074702886247878
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04072398190045249
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04070096099491238
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04067796610169491
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.040654997176736304
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.040632054176072234
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04060913705583756
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.040586245772266064
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04056338028169014
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00105  |
| Iteration     | 69        |
| MaximumReturn | -0.000823 |
| MinimumReturn | -0.00139  |
| TotalSamples  | 118286    |
-----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021931270137429237
Validation loss = 0.021505286917090416
Validation loss = 0.02243395522236824
Validation loss = 0.02114926092326641
Validation loss = 0.021812360733747482
Validation loss = 0.02246866375207901
Validation loss = 0.021553367376327515
Validation loss = 0.02209479920566082
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021840224042534828
Validation loss = 0.0212368443608284
Validation loss = 0.02206871099770069
Validation loss = 0.0213785320520401
Validation loss = 0.021166881546378136
Validation loss = 0.022600866854190826
Validation loss = 0.022221559658646584
Validation loss = 0.021810147911310196
Validation loss = 0.02107183076441288
Validation loss = 0.02441653236746788
Validation loss = 0.02109019085764885
Validation loss = 0.021146386861801147
Validation loss = 0.021411733701825142
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023068184033036232
Validation loss = 0.02198738045990467
Validation loss = 0.02135002799332142
Validation loss = 0.023425502702593803
Validation loss = 0.02296949364244938
Validation loss = 0.02231437899172306
Validation loss = 0.020554743707180023
Validation loss = 0.021962830796837807
Validation loss = 0.023311786353588104
Validation loss = 0.02178051322698593
Validation loss = 0.022137098014354706
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02183847688138485
Validation loss = 0.024671511724591255
Validation loss = 0.02173006162047386
Validation loss = 0.021206269040703773
Validation loss = 0.021607132628560066
Validation loss = 0.02167290635406971
Validation loss = 0.02278926968574524
Validation loss = 0.021696407347917557
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02103986032307148
Validation loss = 0.021332604810595512
Validation loss = 0.021236151456832886
Validation loss = 0.021199731156229973
Validation loss = 0.02129100263118744
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04054054054054054
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04051772650534609
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04049493813273341
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04047217537942664
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04044943820224719
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04042672655811342
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04040404040404041
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.040381379697139654
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04035874439461883
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.040336134453781515
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.040313549832026875
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04029099048684947
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.040268456375838924
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04024594745667971
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04022346368715084
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04020100502512563
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04017857142857143
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.040156162855549356
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04013377926421405
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04011142061281337
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0400890868596882
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04006677796327212
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04004449388209121
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.040022234574763754
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00107  |
| Iteration     | 70        |
| MaximumReturn | -0.000772 |
| MinimumReturn | -0.0015   |
| TotalSamples  | 119952    |
-----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021826276555657387
Validation loss = 0.02041538618505001
Validation loss = 0.022819126024842262
Validation loss = 0.021411508321762085
Validation loss = 0.022348683327436447
Validation loss = 0.021802250295877457
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02159832790493965
Validation loss = 0.021532854065299034
Validation loss = 0.020974844694137573
Validation loss = 0.022907409816980362
Validation loss = 0.021387597545981407
Validation loss = 0.021204350516200066
Validation loss = 0.021465379744768143
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020764950662851334
Validation loss = 0.022052574902772903
Validation loss = 0.022913163527846336
Validation loss = 0.0220013540238142
Validation loss = 0.02183353714644909
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022420981898903847
Validation loss = 0.021702680736780167
Validation loss = 0.023267429322004318
Validation loss = 0.02271547168493271
Validation loss = 0.021435966715216637
Validation loss = 0.02201107144355774
Validation loss = 0.02235662192106247
Validation loss = 0.021476643159985542
Validation loss = 0.021714478731155396
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02172524482011795
Validation loss = 0.02134840562939644
Validation loss = 0.02006276324391365
Validation loss = 0.02169906347990036
Validation loss = 0.020029982551932335
Validation loss = 0.0207049623131752
Validation loss = 0.021402765065431595
Validation loss = 0.02066028118133545
Validation loss = 0.02031710185110569
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03997779011660189
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03995560488346282
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03993344425956739
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03991130820399113
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.039889196675900275
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03986710963455149
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03984504703929164
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03982300884955752
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03980099502487562
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.039779005524861875
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03975704030922143
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.039735099337748346
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.039713182570325425
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03969128996692393
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03966942148760331
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.039647577092511016
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03962575674188222
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.039603960396039604
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03958218801539307
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03956043956043956
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.039538714991762765
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03951701427003293
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.039495337356006584
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.039473684210526314
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03945205479452055
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000997 |
| Iteration     | 71        |
| MaximumReturn | -0.000645 |
| MinimumReturn | -0.00129  |
| TotalSamples  | 121618    |
-----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021548448130488396
Validation loss = 0.023795682936906815
Validation loss = 0.020580260083079338
Validation loss = 0.021588796749711037
Validation loss = 0.02130918949842453
Validation loss = 0.022696157917380333
Validation loss = 0.02109053172171116
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02144838310778141
Validation loss = 0.021689398214221
Validation loss = 0.023076314479112625
Validation loss = 0.02152765542268753
Validation loss = 0.021740753203630447
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02246924862265587
Validation loss = 0.02202988974750042
Validation loss = 0.022144872695207596
Validation loss = 0.022404346615076065
Validation loss = 0.022886628285050392
Validation loss = 0.02238042838871479
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02140105329453945
Validation loss = 0.022336650639772415
Validation loss = 0.02252262830734253
Validation loss = 0.02228706330060959
Validation loss = 0.021235797554254532
Validation loss = 0.021971506997942924
Validation loss = 0.021232621744275093
Validation loss = 0.021620633080601692
Validation loss = 0.022143593057990074
Validation loss = 0.022409824654459953
Validation loss = 0.02185603603720665
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021440180018544197
Validation loss = 0.022695448249578476
Validation loss = 0.02080465480685234
Validation loss = 0.021212900057435036
Validation loss = 0.02164355292916298
Validation loss = 0.020366089418530464
Validation loss = 0.020523468032479286
Validation loss = 0.02206537313759327
Validation loss = 0.020681725814938545
Validation loss = 0.021023143082857132
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03943044906900329
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03940886699507389
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03938730853391685
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03936577364680153
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03934426229508197
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03932277444019661
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.039301310043668124
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03927986906710311
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03925845147219193
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03923705722070845
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0392156862745098
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0391943385955362
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03917301414581066
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03915171288743882
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0391304347826087
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03910917979359044
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03908794788273615
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03906673901247965
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.039045553145336226
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03902439024390244
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0390032502708559
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03898213318895506
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03896103896103896
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.038939967550027044
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03891891891891892
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000991 |
| Iteration     | 72        |
| MaximumReturn | -0.000736 |
| MinimumReturn | -0.00149  |
| TotalSamples  | 123284    |
-----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021517999470233917
Validation loss = 0.023035146296024323
Validation loss = 0.021521372720599174
Validation loss = 0.022736813873052597
Validation loss = 0.021703753620386124
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021704964339733124
Validation loss = 0.021679269149899483
Validation loss = 0.022512510418891907
Validation loss = 0.022781215608119965
Validation loss = 0.02113611251115799
Validation loss = 0.02136755920946598
Validation loss = 0.022397834807634354
Validation loss = 0.021815301850438118
Validation loss = 0.021749207749962807
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02174036204814911
Validation loss = 0.022139085456728935
Validation loss = 0.02117469161748886
Validation loss = 0.021828239783644676
Validation loss = 0.022091666236519814
Validation loss = 0.02340754121541977
Validation loss = 0.02292582392692566
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02129971981048584
Validation loss = 0.021808499470353127
Validation loss = 0.022912802174687386
Validation loss = 0.022230692207813263
Validation loss = 0.021665534004569054
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021990099921822548
Validation loss = 0.01996747776865959
Validation loss = 0.020615434274077415
Validation loss = 0.021224884316325188
Validation loss = 0.02126779966056347
Validation loss = 0.022645240649580956
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03889789303079417
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.038876889848812095
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.038855909336211546
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.038834951456310676
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03881401617250674
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03879310344827586
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03877221324717286
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.038751345532831
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.038730500268961805
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03870967741935484
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03868887694787748
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03866809881847476
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03864734299516908
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03862660944206009
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0386058981233244
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03858520900321544
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0385645420460632
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03854389721627409
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.038523274478330656
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.038502673796791446
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03848209513629075
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.038461538461538464
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03844100373731981
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0384204909284952
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0384
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.001   |
| Iteration     | 73       |
| MaximumReturn | -0.00083 |
| MinimumReturn | -0.00127 |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022280337288975716
Validation loss = 0.020740559324622154
Validation loss = 0.021174082532525063
Validation loss = 0.022902153432369232
Validation loss = 0.022744586691260338
Validation loss = 0.021948441863059998
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02142934314906597
Validation loss = 0.021874524652957916
Validation loss = 0.023114638403058052
Validation loss = 0.02148079313337803
Validation loss = 0.021409891545772552
Validation loss = 0.022809084504842758
Validation loss = 0.022047042846679688
Validation loss = 0.020802007988095284
Validation loss = 0.020918242633342743
Validation loss = 0.02110930159687996
Validation loss = 0.02183087170124054
Validation loss = 0.021518202498555183
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02294476516544819
Validation loss = 0.021865740418434143
Validation loss = 0.02181602641940117
Validation loss = 0.022895244881510735
Validation loss = 0.02236192300915718
Validation loss = 0.021643944084644318
Validation loss = 0.02163507416844368
Validation loss = 0.022221561521291733
Validation loss = 0.02259954623878002
Validation loss = 0.02235812321305275
Validation loss = 0.02273906208574772
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021125229075551033
Validation loss = 0.021542755886912346
Validation loss = 0.022167064249515533
Validation loss = 0.021288393065333366
Validation loss = 0.021222500130534172
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020403707399964333
Validation loss = 0.020858069881796837
Validation loss = 0.020771753042936325
Validation loss = 0.021274497732520103
Validation loss = 0.02090652845799923
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03837953091684435
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.038359083644112946
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.038338658146964855
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03831825439063331
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03829787234042553
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03827751196172249
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03825717321997875
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.038236856080722255
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03821656050955414
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03819628647214854
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03817603393425239
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03815580286168521
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.038135593220338986
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03811540497617787
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0380952380952381
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03807509254362771
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03805496828752643
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03803486529318542
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03801478352692714
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03799472295514512
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0379746835443038
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037954665260938325
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037934668071654375
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037914691943127965
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037894736842105266
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00117 |
| Iteration     | 74       |
| MaximumReturn | -0.0008  |
| MinimumReturn | -0.00159 |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021248359233140945
Validation loss = 0.022281786426901817
Validation loss = 0.021423373371362686
Validation loss = 0.021385980769991875
Validation loss = 0.021215051412582397
Validation loss = 0.02314864657819271
Validation loss = 0.02054636739194393
Validation loss = 0.023851890116930008
Validation loss = 0.022369056940078735
Validation loss = 0.022142484784126282
Validation loss = 0.020801398903131485
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022114327177405357
Validation loss = 0.021204732358455658
Validation loss = 0.02121121436357498
Validation loss = 0.022100796923041344
Validation loss = 0.022407494485378265
Validation loss = 0.021544981747865677
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02271737903356552
Validation loss = 0.024768630042672157
Validation loss = 0.02184445410966873
Validation loss = 0.02279754914343357
Validation loss = 0.02207627147436142
Validation loss = 0.023368755355477333
Validation loss = 0.021794578060507774
Validation loss = 0.022540980949997902
Validation loss = 0.02303086221218109
Validation loss = 0.02194276638329029
Validation loss = 0.021678000688552856
Validation loss = 0.02252020500600338
Validation loss = 0.021531498059630394
Validation loss = 0.02467518113553524
Validation loss = 0.02169579081237316
Validation loss = 0.023379797115921974
Validation loss = 0.02190529927611351
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020847667008638382
Validation loss = 0.022363778203725815
Validation loss = 0.022455429658293724
Validation loss = 0.021132631227374077
Validation loss = 0.021502986550331116
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02209979109466076
Validation loss = 0.020345155149698257
Validation loss = 0.0209328755736351
Validation loss = 0.020842481404542923
Validation loss = 0.022267404943704605
Validation loss = 0.01996251940727234
Validation loss = 0.021428516134619713
Validation loss = 0.019976019859313965
Validation loss = 0.021375473588705063
Validation loss = 0.02030499093234539
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03787480273540242
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03785488958990536
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03783499737256963
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037815126050420166
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03779527559055118
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03777544596012592
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03775563712637651
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03773584905660377
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03771608171817706
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03769633507853403
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03767660910518053
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03765690376569038
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03763721902770518
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03761755485893417
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03759791122715405
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037578288100208766
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03755868544600939
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03753910323253389
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03751954142782699
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0375
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03748047891723061
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037460978147762745
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0374414976599064
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037422037422037424
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0374025974025974
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00116  |
| Iteration     | 75        |
| MaximumReturn | -0.000652 |
| MinimumReturn | -0.00162  |
| TotalSamples  | 128282    |
-----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020659741014242172
Validation loss = 0.02230449952185154
Validation loss = 0.021711744368076324
Validation loss = 0.022595807909965515
Validation loss = 0.022015880793333054
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020704546943306923
Validation loss = 0.022000357508659363
Validation loss = 0.021024681627750397
Validation loss = 0.02195275016129017
Validation loss = 0.021693594753742218
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022952472791075706
Validation loss = 0.022536396980285645
Validation loss = 0.021918555721640587
Validation loss = 0.02166580781340599
Validation loss = 0.021385693922638893
Validation loss = 0.022425591945648193
Validation loss = 0.022098273038864136
Validation loss = 0.020688608288764954
Validation loss = 0.023389242589473724
Validation loss = 0.02111075073480606
Validation loss = 0.021503612399101257
Validation loss = 0.021665852516889572
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02147548645734787
Validation loss = 0.025204064324498177
Validation loss = 0.020959358662366867
Validation loss = 0.021421387791633606
Validation loss = 0.022357691079378128
Validation loss = 0.02075059711933136
Validation loss = 0.022212030366063118
Validation loss = 0.02078339457511902
Validation loss = 0.021208561956882477
Validation loss = 0.020906701683998108
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02022486925125122
Validation loss = 0.021969206631183624
Validation loss = 0.02119646966457367
Validation loss = 0.021234635263681412
Validation loss = 0.020679455250501633
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037383177570093455
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03736377789309808
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03734439834024896
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03732503888024884
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03730569948186528
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03728638011393061
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037267080745341616
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03724780134505949
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03722854188210962
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037209302325581395
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0371900826446281
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0371708828084667
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03715170278637771
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037132542547705004
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03711340206185567
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03709428129829984
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037075180226570546
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03705609881626351
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037037037037037035
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037017994858611826
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03699897225077081
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03697996918335902
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03696098562628337
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03694202154951257
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036923076923076927
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.001    |
| Iteration     | 76        |
| MaximumReturn | -0.000743 |
| MinimumReturn | -0.00155  |
| TotalSamples  | 129948    |
-----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02259158156812191
Validation loss = 0.02142958901822567
Validation loss = 0.020738614723086357
Validation loss = 0.021645449101924896
Validation loss = 0.02270643413066864
Validation loss = 0.021551931276917458
Validation loss = 0.0217069573700428
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021094776690006256
Validation loss = 0.022245202213525772
Validation loss = 0.021864330396056175
Validation loss = 0.023950010538101196
Validation loss = 0.02186521887779236
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02147441729903221
Validation loss = 0.021227728575468063
Validation loss = 0.02229510061442852
Validation loss = 0.022399121895432472
Validation loss = 0.02271904982626438
Validation loss = 0.021166136488318443
Validation loss = 0.021735168993473053
Validation loss = 0.02145633101463318
Validation loss = 0.021716002374887466
Validation loss = 0.021513666957616806
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021341679617762566
Validation loss = 0.021585529670119286
Validation loss = 0.020966771990060806
Validation loss = 0.021333511918783188
Validation loss = 0.02242697961628437
Validation loss = 0.02211768366396427
Validation loss = 0.021682491526007652
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02007666602730751
Validation loss = 0.020032551139593124
Validation loss = 0.02026134543120861
Validation loss = 0.02089649997651577
Validation loss = 0.021284503862261772
Validation loss = 0.020183464512228966
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03690415171706817
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036885245901639344
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03686635944700461
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0368474923234391
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03682864450127877
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03680981595092025
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036791006642820645
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03677221654749745
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036753445635528334
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036734693877551024
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03671596124426313
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03669724770642202
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036678553234844626
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03665987780040733
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0366412213740458
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03662258392675483
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03660396542958821
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036585365853658534
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036566785170137124
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03654822335025381
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0365296803652968
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036511156186612576
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03649265078560568
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0364741641337386
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03645569620253165
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00101  |
| Iteration     | 77        |
| MaximumReturn | -0.000771 |
| MinimumReturn | -0.00141  |
| TotalSamples  | 131614    |
-----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021924691274762154
Validation loss = 0.022277213633060455
Validation loss = 0.021200306713581085
Validation loss = 0.02161164954304695
Validation loss = 0.021890180185437202
Validation loss = 0.022474152967333794
Validation loss = 0.02136593870818615
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0220340508967638
Validation loss = 0.02236204780638218
Validation loss = 0.021114159375429153
Validation loss = 0.022264033555984497
Validation loss = 0.022123053669929504
Validation loss = 0.022555306553840637
Validation loss = 0.021462734788656235
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02131574973464012
Validation loss = 0.023617051541805267
Validation loss = 0.021957192569971085
Validation loss = 0.021871866658329964
Validation loss = 0.021594438701868057
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02203940786421299
Validation loss = 0.021598195657134056
Validation loss = 0.023452667519450188
Validation loss = 0.022574786096811295
Validation loss = 0.021567530930042267
Validation loss = 0.0218034740537405
Validation loss = 0.022723466157913208
Validation loss = 0.02271268703043461
Validation loss = 0.021535472944378853
Validation loss = 0.021969282999634743
Validation loss = 0.0216218288987875
Validation loss = 0.021531065925955772
Validation loss = 0.024034710600972176
Validation loss = 0.022631894797086716
Validation loss = 0.021124832332134247
Validation loss = 0.022180793806910515
Validation loss = 0.022761259227991104
Validation loss = 0.02117253467440605
Validation loss = 0.021115317940711975
Validation loss = 0.022221343591809273
Validation loss = 0.021938445046544075
Validation loss = 0.022179558873176575
Validation loss = 0.02272900752723217
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02007174864411354
Validation loss = 0.02228688821196556
Validation loss = 0.021108167245984077
Validation loss = 0.019902555271983147
Validation loss = 0.022399412468075752
Validation loss = 0.021252602338790894
Validation loss = 0.020203782245516777
Validation loss = 0.020624667406082153
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03643724696356275
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036418816388467376
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03640040444893832
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03638201111672562
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03636363636363636
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03634528016153458
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03632694248234107
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036308623298033284
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036290322580645164
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036272040302267
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03625377643504532
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03623553095118269
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03621730382293763
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03619909502262444
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036180904522613064
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03616273229532898
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03614457831325301
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03612644254892122
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03610832497492478
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03609022556390978
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036072144288577156
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03605408112168253
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036036036036036036
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03601800900450225
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.036
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00104  |
| Iteration     | 78        |
| MaximumReturn | -0.000798 |
| MinimumReturn | -0.00133  |
| TotalSamples  | 133280    |
-----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02094876579940319
Validation loss = 0.021749960258603096
Validation loss = 0.022984394803643227
Validation loss = 0.021456047892570496
Validation loss = 0.020557846873998642
Validation loss = 0.02195313759148121
Validation loss = 0.021398775279521942
Validation loss = 0.02197284996509552
Validation loss = 0.02103840932250023
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02213832549750805
Validation loss = 0.022767890244722366
Validation loss = 0.02133633755147457
Validation loss = 0.021466834470629692
Validation loss = 0.02175566740334034
Validation loss = 0.0216482225805521
Validation loss = 0.022559598088264465
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021606571972370148
Validation loss = 0.023897936567664146
Validation loss = 0.022302472963929176
Validation loss = 0.022428840398788452
Validation loss = 0.022027552127838135
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022516073659062386
Validation loss = 0.022499948740005493
Validation loss = 0.022968556731939316
Validation loss = 0.02180889993906021
Validation loss = 0.021573057398200035
Validation loss = 0.023541472852230072
Validation loss = 0.021188488230109215
Validation loss = 0.020540693774819374
Validation loss = 0.022161144763231277
Validation loss = 0.02162264846265316
Validation loss = 0.022653954103589058
Validation loss = 0.02190636470913887
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020826108753681183
Validation loss = 0.024430684745311737
Validation loss = 0.021005714312195778
Validation loss = 0.02053642086684704
Validation loss = 0.022712046280503273
Validation loss = 0.02091803401708603
Validation loss = 0.021376430988311768
Validation loss = 0.022037891671061516
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.035982008995502246
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03596403596403597
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03594608087868198
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03592814371257485
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.035910224438902745
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03589232303090728
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03587443946188341
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.035856573705179286
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.035838725734196115
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03582089552238806
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03580308304326206
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03578528827037773
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03576751117734724
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03574975173783516
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03573200992555831
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03571428571428571
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.035696579077838374
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.035678889990089196
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03566121842496285
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03564356435643564
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.035625927758535375
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03560830860534125
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03559070687098369
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03557312252964427
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.035555555555555556
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00105  |
| Iteration     | 79        |
| MaximumReturn | -0.000753 |
| MinimumReturn | -0.00164  |
| TotalSamples  | 134946    |
-----------------------------
