Logging to experiments/invertedPendulum/nov2/IPO01w350e3_seed2531
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.8352323770523071
Validation loss = 0.7099893093109131
Validation loss = 0.6794323921203613
Validation loss = 0.6517210006713867
Validation loss = 0.6326264142990112
Validation loss = 0.6149477362632751
Validation loss = 0.581207811832428
Validation loss = 0.5898076891899109
Validation loss = 0.5812941789627075
Validation loss = 0.5627806782722473
Validation loss = 0.5498237013816833
Validation loss = 0.5501602292060852
Validation loss = 0.5429388880729675
Validation loss = 0.5281501412391663
Validation loss = 0.5332472324371338
Validation loss = 0.531324565410614
Validation loss = 0.5314766764640808
Validation loss = 0.5120012164115906
Validation loss = 0.5282620787620544
Validation loss = 0.5176258087158203
Validation loss = 0.5053710341453552
Validation loss = 0.5018035769462585
Validation loss = 0.502449631690979
Validation loss = 0.5020684599876404
Validation loss = 0.5015616416931152
Validation loss = 0.5104643702507019
Validation loss = 0.5065901279449463
Validation loss = 0.5064527988433838
Validation loss = 0.5072799921035767
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.839790940284729
Validation loss = 0.7327563762664795
Validation loss = 0.6721954941749573
Validation loss = 0.6571549773216248
Validation loss = 0.6262776255607605
Validation loss = 0.6036093831062317
Validation loss = 0.5892543792724609
Validation loss = 0.5672814846038818
Validation loss = 0.5690567493438721
Validation loss = 0.5611856579780579
Validation loss = 0.5450728535652161
Validation loss = 0.5391286015510559
Validation loss = 0.5303809642791748
Validation loss = 0.5293171405792236
Validation loss = 0.5559982657432556
Validation loss = 0.52403324842453
Validation loss = 0.5295907855033875
Validation loss = 0.5146913528442383
Validation loss = 0.5173017382621765
Validation loss = 0.5244799256324768
Validation loss = 0.5055062770843506
Validation loss = 0.5147191882133484
Validation loss = 0.5073824524879456
Validation loss = 0.5000842809677124
Validation loss = 0.5045166611671448
Validation loss = 0.49626675248146057
Validation loss = 0.5031564235687256
Validation loss = 0.4927828907966614
Validation loss = 0.4948327839374542
Validation loss = 0.47962212562561035
Validation loss = 0.48730579018592834
Validation loss = 0.5249958038330078
Validation loss = 0.5102207660675049
Validation loss = 0.5045230984687805
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.8478103876113892
Validation loss = 0.706967294216156
Validation loss = 0.6618272066116333
Validation loss = 0.6273306012153625
Validation loss = 0.6111351251602173
Validation loss = 0.5972911715507507
Validation loss = 0.5833266973495483
Validation loss = 0.5584144592285156
Validation loss = 0.557981550693512
Validation loss = 0.5459824800491333
Validation loss = 0.5448920726776123
Validation loss = 0.5421172380447388
Validation loss = 0.5324224829673767
Validation loss = 0.5328226089477539
Validation loss = 0.5133230686187744
Validation loss = 0.5121493339538574
Validation loss = 0.5353648066520691
Validation loss = 0.5175141096115112
Validation loss = 0.547379732131958
Validation loss = 0.5205637216567993
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.8407095670700073
Validation loss = 0.7052168846130371
Validation loss = 0.6643528938293457
Validation loss = 0.638400673866272
Validation loss = 0.6151042580604553
Validation loss = 0.6019474267959595
Validation loss = 0.5722415447235107
Validation loss = 0.5659754276275635
Validation loss = 0.5587872862815857
Validation loss = 0.5527999401092529
Validation loss = 0.5438166856765747
Validation loss = 0.5372787117958069
Validation loss = 0.5371120572090149
Validation loss = 0.5413539409637451
Validation loss = 0.5372283458709717
Validation loss = 0.5205593109130859
Validation loss = 0.5268635749816895
Validation loss = 0.5167185664176941
Validation loss = 0.5194643139839172
Validation loss = 0.5061116218566895
Validation loss = 0.5077757239341736
Validation loss = 0.5086113214492798
Validation loss = 0.5003592371940613
Validation loss = 0.4982738196849823
Validation loss = 0.498302161693573
Validation loss = 0.5118635892868042
Validation loss = 0.5112864971160889
Validation loss = 0.4949353337287903
Validation loss = 0.4849129617214203
Validation loss = 0.4888131618499756
Validation loss = 0.4803617596626282
Validation loss = 0.4895411431789398
Validation loss = 0.5077234506607056
Validation loss = 0.4793846309185028
Validation loss = 0.49671611189842224
Validation loss = 0.4966266453266144
Validation loss = 0.4876590371131897
Validation loss = 0.48848316073417664
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.8384278416633606
Validation loss = 0.7072514891624451
Validation loss = 0.6662086844444275
Validation loss = 0.6489165425300598
Validation loss = 0.6239997148513794
Validation loss = 0.6092223525047302
Validation loss = 0.5827839970588684
Validation loss = 0.5866934657096863
Validation loss = 0.57391357421875
Validation loss = 0.5540499091148376
Validation loss = 0.545825719833374
Validation loss = 0.5632504224777222
Validation loss = 0.5444661378860474
Validation loss = 0.5326447486877441
Validation loss = 0.5265927910804749
Validation loss = 0.536952018737793
Validation loss = 0.5143774747848511
Validation loss = 0.507291853427887
Validation loss = 0.5101368427276611
Validation loss = 0.5183932185173035
Validation loss = 0.5207604765892029
Validation loss = 0.4960288107395172
Validation loss = 0.5384052991867065
Validation loss = 0.49878737330436707
Validation loss = 0.5032657980918884
Validation loss = 0.4960847795009613
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.03333333333333333
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03225806451612903
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03125
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.030303030303030304
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.029411764705882353
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02857142857142857
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.027777777777777776
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02702702702702703
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02631578947368421
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02564102564102564
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.025
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.024390243902439025
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.023809523809523808
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.023255813953488372
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022727272727272728
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022222222222222223
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.021739130434782608
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02127659574468085
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.020833333333333332
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02040816326530612
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.38    |
| Iteration     | 0        |
| MaximumReturn | -0.109   |
| MinimumReturn | -48.1    |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5617895722389221
Validation loss = 0.47346195578575134
Validation loss = 0.44872239232063293
Validation loss = 0.4463115930557251
Validation loss = 0.43786314129829407
Validation loss = 0.44207847118377686
Validation loss = 0.4280262887477875
Validation loss = 0.43241432309150696
Validation loss = 0.43178534507751465
Validation loss = 0.4335930347442627
Validation loss = 0.42264124751091003
Validation loss = 0.4320446848869324
Validation loss = 0.4370333254337311
Validation loss = 0.4264111816883087
Validation loss = 0.4541442394256592
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5650295615196228
Validation loss = 0.4725504517555237
Validation loss = 0.4484094977378845
Validation loss = 0.4422323703765869
Validation loss = 0.44053158164024353
Validation loss = 0.44233790040016174
Validation loss = 0.4324660897254944
Validation loss = 0.44036176800727844
Validation loss = 0.441802442073822
Validation loss = 0.4372181296348572
Validation loss = 0.430082768201828
Validation loss = 0.434122234582901
Validation loss = 0.42045679688453674
Validation loss = 0.42603954672813416
Validation loss = 0.42751890420913696
Validation loss = 0.42550358176231384
Validation loss = 0.4160791039466858
Validation loss = 0.4142795205116272
Validation loss = 0.44974231719970703
Validation loss = 0.4258422553539276
Validation loss = 0.43209779262542725
Validation loss = 0.42087021470069885
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4910079538822174
Validation loss = 0.45553380250930786
Validation loss = 0.4426342844963074
Validation loss = 0.4396332800388336
Validation loss = 0.43917950987815857
Validation loss = 0.43699803948402405
Validation loss = 0.43301597237586975
Validation loss = 0.45488330721855164
Validation loss = 0.4367567002773285
Validation loss = 0.4382002353668213
Validation loss = 0.44008010625839233
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6283513307571411
Validation loss = 0.4730074405670166
Validation loss = 0.4496738314628601
Validation loss = 0.44632112979888916
Validation loss = 0.4413667619228363
Validation loss = 0.4336865246295929
Validation loss = 0.43545839190483093
Validation loss = 0.4410831332206726
Validation loss = 0.4352400004863739
Validation loss = 0.4286961853504181
Validation loss = 0.43239453434944153
Validation loss = 0.43449223041534424
Validation loss = 0.4339331090450287
Validation loss = 0.43361082673072815
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.524192214012146
Validation loss = 0.4598064124584198
Validation loss = 0.4424375891685486
Validation loss = 0.4422239363193512
Validation loss = 0.4353411793708801
Validation loss = 0.42837294936180115
Validation loss = 0.43836602568626404
Validation loss = 0.4380950331687927
Validation loss = 0.43238550424575806
Validation loss = 0.4320569932460785
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0196078431372549
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019230769230769232
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018867924528301886
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018518518518518517
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01818181818181818
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017857142857142856
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017543859649122806
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017241379310344827
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01694915254237288
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016666666666666666
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01639344262295082
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016129032258064516
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015873015873015872
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015625
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015384615384615385
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015151515151515152
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014925373134328358
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014705882352941176
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014492753623188406
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014285714285714285
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014084507042253521
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013888888888888888
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0136986301369863
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013513513513513514
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013333333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.129   |
| Iteration     | 1        |
| MaximumReturn | -0.0478  |
| MinimumReturn | -0.34    |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4608020782470703
Validation loss = 0.41254836320877075
Validation loss = 0.41532397270202637
Validation loss = 0.40809911489486694
Validation loss = 0.4174351096153259
Validation loss = 0.41100963950157166
Validation loss = 0.410805881023407
Validation loss = 0.4069666862487793
Validation loss = 0.41992324590682983
Validation loss = 0.4074980616569519
Validation loss = 0.41796812415122986
Validation loss = 0.41277506947517395
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.46383827924728394
Validation loss = 0.41570961475372314
Validation loss = 0.41425010561943054
Validation loss = 0.42537423968315125
Validation loss = 0.41347238421440125
Validation loss = 0.41018831729888916
Validation loss = 0.41735368967056274
Validation loss = 0.417638897895813
Validation loss = 0.41861456632614136
Validation loss = 0.4083103537559509
Validation loss = 0.412863165140152
Validation loss = 0.4227564334869385
Validation loss = 0.4122297763824463
Validation loss = 0.40867456793785095
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4154287576675415
Validation loss = 0.4120491147041321
Validation loss = 0.4154420793056488
Validation loss = 0.4130398631095886
Validation loss = 0.4147712290287018
Validation loss = 0.40502333641052246
Validation loss = 0.42005857825279236
Validation loss = 0.4118432104587555
Validation loss = 0.40336641669273376
Validation loss = 0.404142290353775
Validation loss = 0.40001851320266724
Validation loss = 0.4086498022079468
Validation loss = 0.40196388959884644
Validation loss = 0.40055763721466064
Validation loss = 0.40712687373161316
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.429612934589386
Validation loss = 0.418058842420578
Validation loss = 0.40564388036727905
Validation loss = 0.4173112213611603
Validation loss = 0.4037483334541321
Validation loss = 0.40823858976364136
Validation loss = 0.4168708324432373
Validation loss = 0.4099552631378174
Validation loss = 0.412956565618515
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4125487804412842
Validation loss = 0.4085872769355774
Validation loss = 0.4052653908729553
Validation loss = 0.4145496189594269
Validation loss = 0.39996013045310974
Validation loss = 0.40246087312698364
Validation loss = 0.40331530570983887
Validation loss = 0.41557949781417847
Validation loss = 0.41113537549972534
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013157894736842105
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012987012987012988
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01282051282051282
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012658227848101266
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0125
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012345679012345678
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012195121951219513
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012048192771084338
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011904761904761904
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011764705882352941
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011627906976744186
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011494252873563218
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011363636363636364
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011235955056179775
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011111111111111112
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01098901098901099
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010869565217391304
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010752688172043012
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010638297872340425
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010526315789473684
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010416666666666666
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010309278350515464
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01020408163265306
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010101010101010102
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0557  |
| Iteration     | 2        |
| MaximumReturn | -0.0276  |
| MinimumReturn | -0.0787  |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.40592920780181885
Validation loss = 0.3892587721347809
Validation loss = 0.3942115008831024
Validation loss = 0.3943331241607666
Validation loss = 0.3930012285709381
Validation loss = 0.3982941806316376
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4241732060909271
Validation loss = 0.3974340260028839
Validation loss = 0.3984506130218506
Validation loss = 0.39886125922203064
Validation loss = 0.4010578393936157
Validation loss = 0.40150871872901917
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4253246784210205
Validation loss = 0.4006192982196808
Validation loss = 0.39135491847991943
Validation loss = 0.3900180160999298
Validation loss = 0.397385835647583
Validation loss = 0.3919339179992676
Validation loss = 0.40295350551605225
Validation loss = 0.39403095841407776
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.41377905011177063
Validation loss = 0.3941402733325958
Validation loss = 0.4009883403778076
Validation loss = 0.39933547377586365
Validation loss = 0.3910796642303467
Validation loss = 0.38949620723724365
Validation loss = 0.3893824517726898
Validation loss = 0.3994852304458618
Validation loss = 0.40321746468544006
Validation loss = 0.3946658670902252
Validation loss = 0.4033111035823822
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4030260145664215
Validation loss = 0.39245226979255676
Validation loss = 0.3912997543811798
Validation loss = 0.39490148425102234
Validation loss = 0.3929078280925751
Validation loss = 0.39024344086647034
Validation loss = 0.39103782176971436
Validation loss = 0.3969856798648834
Validation loss = 0.3961556851863861
Validation loss = 0.3920372426509857
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009900990099009901
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00980392156862745
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009708737864077669
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009615384615384616
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.009523809523809525
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.018867924528301886
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018691588785046728
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018518518518518517
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01834862385321101
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01818181818181818
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018018018018018018
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017857142857142856
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017699115044247787
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017543859649122806
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017391304347826087
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017241379310344827
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017094017094017096
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01694915254237288
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01680672268907563
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016666666666666666
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01652892561983471
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01639344262295082
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016260162601626018
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016129032258064516
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.356   |
| Iteration     | 3        |
| MaximumReturn | -0.0129  |
| MinimumReturn | -1.78    |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3962487578392029
Validation loss = 0.3829196095466614
Validation loss = 0.38843491673469543
Validation loss = 0.38746777176856995
Validation loss = 0.3827855885028839
Validation loss = 0.3884272575378418
Validation loss = 0.39859718084335327
Validation loss = 0.38879677653312683
Validation loss = 0.3893918991088867
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.39745432138442993
Validation loss = 0.39012858271598816
Validation loss = 0.39145275950431824
Validation loss = 0.39246630668640137
Validation loss = 0.39391401410102844
Validation loss = 0.38826245069503784
Validation loss = 0.38582494854927063
Validation loss = 0.39043790102005005
Validation loss = 0.398720920085907
Validation loss = 0.3899940848350525
Validation loss = 0.40074050426483154
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.40383589267730713
Validation loss = 0.3823586702346802
Validation loss = 0.3851878046989441
Validation loss = 0.39072880148887634
Validation loss = 0.385108083486557
Validation loss = 0.3802553415298462
Validation loss = 0.3860800266265869
Validation loss = 0.39392679929733276
Validation loss = 0.39875584840774536
Validation loss = 0.390578955411911
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3990233838558197
Validation loss = 0.3795754015445709
Validation loss = 0.38689637184143066
Validation loss = 0.3875170648097992
Validation loss = 0.38403385877609253
Validation loss = 0.3854643702507019
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.38780835270881653
Validation loss = 0.3876153826713562
Validation loss = 0.38193315267562866
Validation loss = 0.3900378942489624
Validation loss = 0.38251417875289917
Validation loss = 0.3836253583431244
Validation loss = 0.38437366485595703
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015873015873015872
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015748031496062992
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015625
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015503875968992248
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015384615384615385
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015267175572519083
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015151515151515152
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015037593984962405
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014925373134328358
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014814814814814815
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014705882352941176
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014598540145985401
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014492753623188406
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014388489208633094
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014285714285714285
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014184397163120567
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014084507042253521
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013986013986013986
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013888888888888888
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013793103448275862
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0136986301369863
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013605442176870748
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013513513513513514
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013422818791946308
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013333333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.246   |
| Iteration     | 4        |
| MaximumReturn | -0.0214  |
| MinimumReturn | -2.5     |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3922509253025055
Validation loss = 0.39059484004974365
Validation loss = 0.3926964998245239
Validation loss = 0.3926861882209778
Validation loss = 0.3863520622253418
Validation loss = 0.3875132203102112
Validation loss = 0.3966883420944214
Validation loss = 0.39187854528427124
Validation loss = 0.3946619927883148
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4044250547885895
Validation loss = 0.3935224115848541
Validation loss = 0.3892582058906555
Validation loss = 0.3956848978996277
Validation loss = 0.4004107415676117
Validation loss = 0.39454129338264465
Validation loss = 0.4067925810813904
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.398610919713974
Validation loss = 0.3931674361228943
Validation loss = 0.3884252905845642
Validation loss = 0.39307934045791626
Validation loss = 0.3970502018928528
Validation loss = 0.39604759216308594
Validation loss = 0.39459022879600525
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3954625129699707
Validation loss = 0.3902106285095215
Validation loss = 0.38621193170547485
Validation loss = 0.38232794404029846
Validation loss = 0.3951844573020935
Validation loss = 0.39056330919265747
Validation loss = 0.38720065355300903
Validation loss = 0.3960798382759094
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3904743492603302
Validation loss = 0.388418972492218
Validation loss = 0.378523051738739
Validation loss = 0.3893149495124817
Validation loss = 0.3836166262626648
Validation loss = 0.38297411799430847
Validation loss = 0.38385796546936035
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013245033112582781
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.019736842105263157
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0196078431372549
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01948051948051948
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01935483870967742
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019230769230769232
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.025477707006369428
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02531645569620253
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.025157232704402517
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.025
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.024844720496894408
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.024691358024691357
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.024539877300613498
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.024390243902439025
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.024242424242424242
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.024096385542168676
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.023952095808383235
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.023809523809523808
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.023668639053254437
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.023529411764705882
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.023391812865497075
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.023255813953488372
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.023121387283236993
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022988505747126436
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022857142857142857
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -46.3    |
| Iteration     | 5        |
| MaximumReturn | -0.917   |
| MinimumReturn | -109     |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3833335041999817
Validation loss = 0.36756888031959534
Validation loss = 0.37777042388916016
Validation loss = 0.3778730034828186
Validation loss = 0.3790397047996521
Validation loss = 0.3892056345939636
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3924255669116974
Validation loss = 0.3764912188053131
Validation loss = 0.37526702880859375
Validation loss = 0.38682278990745544
Validation loss = 0.3746657073497772
Validation loss = 0.37506744265556335
Validation loss = 0.379072368144989
Validation loss = 0.3821546137332916
Validation loss = 0.3878999352455139
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.39009565114974976
Validation loss = 0.37929767370224
Validation loss = 0.3785039484500885
Validation loss = 0.3765667974948883
Validation loss = 0.37622198462486267
Validation loss = 0.3785342872142792
Validation loss = 0.38229072093963623
Validation loss = 0.38791054487228394
Validation loss = 0.3858519494533539
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3919987082481384
Validation loss = 0.37143415212631226
Validation loss = 0.37497758865356445
Validation loss = 0.3772393763065338
Validation loss = 0.3716655373573303
Validation loss = 0.3778257369995117
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3807053565979004
Validation loss = 0.3721855580806732
Validation loss = 0.3682454824447632
Validation loss = 0.3730289041996002
Validation loss = 0.3832206726074219
Validation loss = 0.3723713755607605
Validation loss = 0.3773590922355652
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022727272727272728
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022598870056497175
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02247191011235955
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0223463687150838
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022222222222222223
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022099447513812154
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02197802197802198
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02185792349726776
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.021739130434782608
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.021621621621621623
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.021505376344086023
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0213903743315508
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02127659574468085
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.021164021164021163
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.021052631578947368
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.020942408376963352
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.020833333333333332
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02072538860103627
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.020618556701030927
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.020512820512820513
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02040816326530612
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02030456852791878
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.020202020202020204
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.020100502512562814
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0594  |
| Iteration     | 6        |
| MaximumReturn | -0.0211  |
| MinimumReturn | -0.124   |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.40266475081443787
Validation loss = 0.4042063057422638
Validation loss = 0.39723634719848633
Validation loss = 0.40105319023132324
Validation loss = 0.40319129824638367
Validation loss = 0.40200528502464294
Validation loss = 0.4006819725036621
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.41037318110466003
Validation loss = 0.3958653211593628
Validation loss = 0.4027038812637329
Validation loss = 0.39619314670562744
Validation loss = 0.40303418040275574
Validation loss = 0.40613898634910583
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.40486833453178406
Validation loss = 0.40552225708961487
Validation loss = 0.39815378189086914
Validation loss = 0.4020446240901947
Validation loss = 0.40586307644844055
Validation loss = 0.4023606777191162
Validation loss = 0.4109088182449341
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.40408357977867126
Validation loss = 0.3912869393825531
Validation loss = 0.39395031332969666
Validation loss = 0.3949737548828125
Validation loss = 0.40416499972343445
Validation loss = 0.3970402479171753
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.40925905108451843
Validation loss = 0.39482271671295166
Validation loss = 0.41326090693473816
Validation loss = 0.3934955596923828
Validation loss = 0.3963106870651245
Validation loss = 0.4001607894897461
Validation loss = 0.3922848701477051
Validation loss = 0.402459055185318
Validation loss = 0.3962208926677704
Validation loss = 0.3949071168899536
Validation loss = 0.4101770222187042
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01990049751243781
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019801980198019802
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019704433497536946
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0196078431372549
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01951219512195122
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019417475728155338
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01932367149758454
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019230769230769232
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019138755980861243
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01904761904761905
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018957345971563982
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018867924528301886
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018779342723004695
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018691588785046728
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018604651162790697
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018518518518518517
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018433179723502304
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01834862385321101
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0182648401826484
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01818181818181818
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01809954751131222
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018018018018018018
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017937219730941704
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017857142857142856
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017777777777777778
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.104   |
| Iteration     | 7        |
| MaximumReturn | -0.0197  |
| MinimumReturn | -0.735   |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.39868083596229553
Validation loss = 0.3910263478755951
Validation loss = 0.3940090537071228
Validation loss = 0.40614527463912964
Validation loss = 0.3994170129299164
Validation loss = 0.3997073769569397
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4131341576576233
Validation loss = 0.4045003056526184
Validation loss = 0.39969733357429504
Validation loss = 0.39810869097709656
Validation loss = 0.3967130482196808
Validation loss = 0.4029354155063629
Validation loss = 0.4088745415210724
Validation loss = 0.41852065920829773
Validation loss = 0.4054078757762909
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4042913317680359
Validation loss = 0.41754719614982605
Validation loss = 0.40207791328430176
Validation loss = 0.39985087513923645
Validation loss = 0.41655272245407104
Validation loss = 0.4004130959510803
Validation loss = 0.42011556029319763
Validation loss = 0.40798160433769226
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.40020087361335754
Validation loss = 0.4011213481426239
Validation loss = 0.3948783278465271
Validation loss = 0.39283299446105957
Validation loss = 0.3952844738960266
Validation loss = 0.4022827446460724
Validation loss = 0.3921261727809906
Validation loss = 0.39926162362098694
Validation loss = 0.39265871047973633
Validation loss = 0.3981945514678955
Validation loss = 0.39891162514686584
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.39547452330589294
Validation loss = 0.3965908885002136
Validation loss = 0.4154965579509735
Validation loss = 0.3963560461997986
Validation loss = 0.3997381031513214
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017699115044247787
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01762114537444934
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017543859649122806
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017467248908296942
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017391304347826087
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017316017316017316
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017241379310344827
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017167381974248927
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017094017094017096
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01702127659574468
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01694915254237288
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016877637130801686
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01680672268907563
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016736401673640166
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016666666666666666
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016597510373443983
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01652892561983471
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01646090534979424
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01639344262295082
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0163265306122449
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016260162601626018
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016194331983805668
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016129032258064516
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01606425702811245
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.155   |
| Iteration     | 8        |
| MaximumReturn | -0.054   |
| MinimumReturn | -0.323   |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.40417051315307617
Validation loss = 0.40112051367759705
Validation loss = 0.40397268533706665
Validation loss = 0.4016653299331665
Validation loss = 0.40207889676094055
Validation loss = 0.40532180666923523
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4103270173072815
Validation loss = 0.4152646064758301
Validation loss = 0.40624719858169556
Validation loss = 0.4076111316680908
Validation loss = 0.4079838991165161
Validation loss = 0.4043768048286438
Validation loss = 0.4163002073764801
Validation loss = 0.41796162724494934
Validation loss = 0.4240303039550781
Validation loss = 0.4176453948020935
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4090556502342224
Validation loss = 0.4110720753669739
Validation loss = 0.4141573905944824
Validation loss = 0.4094531536102295
Validation loss = 0.4112257659435272
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.39920806884765625
Validation loss = 0.39582234621047974
Validation loss = 0.39462947845458984
Validation loss = 0.4087975025177002
Validation loss = 0.4105226397514343
Validation loss = 0.403207391500473
Validation loss = 0.404936283826828
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.39794743061065674
Validation loss = 0.3974291682243347
Validation loss = 0.40588146448135376
Validation loss = 0.4057231843471527
Validation loss = 0.40263184905052185
Validation loss = 0.4058314263820648
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01593625498007968
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015873015873015872
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015810276679841896
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015748031496062992
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01568627450980392
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015625
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01556420233463035
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015503875968992248
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015444015444015444
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015384615384615385
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01532567049808429
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015267175572519083
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015209125475285171
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015151515151515152
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01509433962264151
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015037593984962405
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0149812734082397
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014925373134328358
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01486988847583643
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014814814814814815
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014760147601476014
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014705882352941176
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014652014652014652
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014598540145985401
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014545454545454545
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0357  |
| Iteration     | 9        |
| MaximumReturn | -0.0113  |
| MinimumReturn | -0.083   |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.40810689330101013
Validation loss = 0.4051629602909088
Validation loss = 0.4060138165950775
Validation loss = 0.3998373746871948
Validation loss = 0.4055289328098297
Validation loss = 0.4063359797000885
Validation loss = 0.4053117334842682
Validation loss = 0.41296592354774475
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4202461838722229
Validation loss = 0.4218237102031708
Validation loss = 0.41761165857315063
Validation loss = 0.4131641685962677
Validation loss = 0.4193984866142273
Validation loss = 0.42342668771743774
Validation loss = 0.42819371819496155
Validation loss = 0.4164992868900299
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.41774967312812805
Validation loss = 0.41013699769973755
Validation loss = 0.41394153237342834
Validation loss = 0.4096442461013794
Validation loss = 0.41425153613090515
Validation loss = 0.4090089797973633
Validation loss = 0.4169470965862274
Validation loss = 0.4131750762462616
Validation loss = 0.4202479124069214
Validation loss = 0.4136890172958374
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4004462957382202
Validation loss = 0.4031561613082886
Validation loss = 0.39818352460861206
Validation loss = 0.4039705693721771
Validation loss = 0.40599945187568665
Validation loss = 0.4118424952030182
Validation loss = 0.4106336534023285
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4091376066207886
Validation loss = 0.39795464277267456
Validation loss = 0.39799764752388
Validation loss = 0.398710697889328
Validation loss = 0.40156108140945435
Validation loss = 0.4055233299732208
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014492753623188406
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01444043321299639
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.017985611510791366
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017921146953405017
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017857142857142856
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.021352313167259787
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02127659574468085
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02120141342756184
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02112676056338028
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.021052631578947368
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.024475524475524476
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.024390243902439025
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.024305555555555556
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02422145328719723
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02413793103448276
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.027491408934707903
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0273972602739726
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.027303754266211604
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.027210884353741496
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02711864406779661
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02702702702702703
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.026936026936026935
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.026845637583892617
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.026755852842809364
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02666666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.92    |
| Iteration     | 10       |
| MaximumReturn | -0.0182  |
| MinimumReturn | -40.2    |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.41204819083213806
Validation loss = 0.4109422564506531
Validation loss = 0.4098084568977356
Validation loss = 0.4077678322792053
Validation loss = 0.40920305252075195
Validation loss = 0.41132980585098267
Validation loss = 0.4114212989807129
Validation loss = 0.4119061529636383
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.42415088415145874
Validation loss = 0.4212011694908142
Validation loss = 0.41190212965011597
Validation loss = 0.4216165542602539
Validation loss = 0.42317238450050354
Validation loss = 0.4266696870326996
Validation loss = 0.42746391892433167
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4210189878940582
Validation loss = 0.4211854040622711
Validation loss = 0.4229533076286316
Validation loss = 0.4314550757408142
Validation loss = 0.41947096586227417
Validation loss = 0.4267561435699463
Validation loss = 0.43584269285202026
Validation loss = 0.42219147086143494
Validation loss = 0.42808109521865845
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4046103060245514
Validation loss = 0.4129711985588074
Validation loss = 0.40565210580825806
Validation loss = 0.40923434495925903
Validation loss = 0.4093623161315918
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4078262448310852
Validation loss = 0.40160831809043884
Validation loss = 0.40076708793640137
Validation loss = 0.4054130017757416
Validation loss = 0.4114674925804138
Validation loss = 0.4165550768375397
Validation loss = 0.4137255549430847
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.029900332225913623
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.033112582781456956
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.039603960396039604
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.04276315789473684
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.04918032786885246
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.049019607843137254
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.048859934853420196
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.05844155844155844
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05825242718446602
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.06451612903225806
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06430868167202572
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.07692307692307693
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.07987220447284345
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07961783439490445
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08253968253968254
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08544303797468354
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.0914826498422713
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09433962264150944
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09717868338557993
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.103125
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.102803738317757
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.10559006211180125
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.10835913312693499
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1111111111111111
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.11384615384615385
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.39    |
| Iteration     | 11       |
| MaximumReturn | -0.0422  |
| MinimumReturn | -49.7    |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4164491593837738
Validation loss = 0.4189339280128479
Validation loss = 0.41335755586624146
Validation loss = 0.41700249910354614
Validation loss = 0.4260496199131012
Validation loss = 0.42527928948402405
Validation loss = 0.4274561405181885
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.43922024965286255
Validation loss = 0.4269244074821472
Validation loss = 0.4278298318386078
Validation loss = 0.42528557777404785
Validation loss = 0.4420435428619385
Validation loss = 0.43610134720802307
Validation loss = 0.44959744811058044
Validation loss = 0.4454765319824219
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.42610374093055725
Validation loss = 0.428779274225235
Validation loss = 0.4376870095729828
Validation loss = 0.43713098764419556
Validation loss = 0.43938079476356506
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.41094309091567993
Validation loss = 0.4140412211418152
Validation loss = 0.41849374771118164
Validation loss = 0.41044172644615173
Validation loss = 0.42481523752212524
Validation loss = 0.4162168502807617
Validation loss = 0.4288608431816101
Validation loss = 0.4218691289424896
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.41360265016555786
Validation loss = 0.4091729521751404
Validation loss = 0.4148896336555481
Validation loss = 0.4178335666656494
Validation loss = 0.41936802864074707
Validation loss = 0.4204831123352051
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1165644171779141
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1162079510703364
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11585365853658537
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11550151975683891
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11515151515151516
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1148036253776435
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1144578313253012
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11411411411411411
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11377245508982035
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11343283582089553
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1130952380952381
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11275964391691394
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11242603550295859
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11209439528023599
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11176470588235295
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11143695014662756
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1111111111111111
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11078717201166181
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11046511627906977
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11014492753623188
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10982658959537572
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10951008645533142
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10919540229885058
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10888252148997135
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10857142857142857
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -39.8    |
| Iteration     | 12       |
| MaximumReturn | -0.042   |
| MinimumReturn | -79.6    |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.42829686403274536
Validation loss = 0.43212729692459106
Validation loss = 0.43194615840911865
Validation loss = 0.43358156085014343
Validation loss = 0.43832919001579285
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.44130733609199524
Validation loss = 0.44401368498802185
Validation loss = 0.44518259167671204
Validation loss = 0.44427645206451416
Validation loss = 0.4526423513889313
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4559062123298645
Validation loss = 0.44345858693122864
Validation loss = 0.44630709290504456
Validation loss = 0.44499072432518005
Validation loss = 0.45540574193000793
Validation loss = 0.44947290420532227
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.42766472697257996
Validation loss = 0.4250829517841339
Validation loss = 0.42547616362571716
Validation loss = 0.4276413917541504
Validation loss = 0.4339543282985687
Validation loss = 0.4319330155849457
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.43152904510498047
Validation loss = 0.42597103118896484
Validation loss = 0.4211147129535675
Validation loss = 0.42233312129974365
Validation loss = 0.42592212557792664
Validation loss = 0.4312678277492523
Validation loss = 0.43830031156539917
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10826210826210826
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10795454545454546
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10764872521246459
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10734463276836158
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10704225352112676
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10674157303370786
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1092436974789916
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10893854748603352
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10863509749303621
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1111111111111111
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11080332409972299
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11049723756906077
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11019283746556474
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10989010989010989
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1095890410958904
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1092896174863388
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.11444141689373297
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.11684782608695653
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.12195121951219512
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12162162162162163
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12129380053908356
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12096774193548387
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12064343163538874
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.12566844919786097
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12533333333333332
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -26.7    |
| Iteration     | 13       |
| MaximumReturn | -0.0515  |
| MinimumReturn | -81.2    |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4618828296661377
Validation loss = 0.44498124718666077
Validation loss = 0.4615384638309479
Validation loss = 0.45449593663215637
Validation loss = 0.45949479937553406
Validation loss = 0.4792025089263916
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4726196229457855
Validation loss = 0.4621184766292572
Validation loss = 0.4657907783985138
Validation loss = 0.4789291322231293
Validation loss = 0.47246554493904114
Validation loss = 0.47654297947883606
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.46397972106933594
Validation loss = 0.463821142911911
Validation loss = 0.4668656587600708
Validation loss = 0.4738859236240387
Validation loss = 0.468296080827713
Validation loss = 0.4789501130580902
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.45306602120399475
Validation loss = 0.45196375250816345
Validation loss = 0.46298548579216003
Validation loss = 0.4547492265701294
Validation loss = 0.46173492074012756
Validation loss = 0.458048015832901
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4594062864780426
Validation loss = 0.44941627979278564
Validation loss = 0.4625752866268158
Validation loss = 0.4608825445175171
Validation loss = 0.44845208525657654
Validation loss = 0.4586968421936035
Validation loss = 0.4617423117160797
Validation loss = 0.4636375904083252
Validation loss = 0.46962010860443115
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.125
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1246684350132626
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12433862433862433
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12401055408970976
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12368421052631579
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12335958005249344
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12303664921465969
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1227154046997389
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12239583333333333
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12207792207792208
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12176165803108809
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12144702842377261
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1211340206185567
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12339331619537275
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12564102564102564
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12531969309462915
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.125
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12468193384223919
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12436548223350254
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1240506329113924
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12373737373737374
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12342569269521411
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12311557788944724
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12280701754385964
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1225
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -77      |
| Iteration     | 14       |
| MaximumReturn | -0.593   |
| MinimumReturn | -119     |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4449508488178253
Validation loss = 0.44383662939071655
Validation loss = 0.44947654008865356
Validation loss = 0.4574751853942871
Validation loss = 0.4500848352909088
Validation loss = 0.45856648683547974
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.46413519978523254
Validation loss = 0.44762760400772095
Validation loss = 0.46295198798179626
Validation loss = 0.4651927947998047
Validation loss = 0.4504876434803009
Validation loss = 0.4611975848674774
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.45747342705726624
Validation loss = 0.4582537114620209
Validation loss = 0.4649783670902252
Validation loss = 0.458919882774353
Validation loss = 0.45617392659187317
Validation loss = 0.4722795784473419
Validation loss = 0.4686071276664734
Validation loss = 0.47315528988838196
Validation loss = 0.47397246956825256
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4395039677619934
Validation loss = 0.44371822476387024
Validation loss = 0.4461204707622528
Validation loss = 0.4444751739501953
Validation loss = 0.4459894597530365
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4517366290092468
Validation loss = 0.4494934380054474
Validation loss = 0.45048093795776367
Validation loss = 0.4649774432182312
Validation loss = 0.4716535210609436
Validation loss = 0.45378780364990234
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12219451371571072
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12189054726368159
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12158808933002481
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12128712871287128
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12098765432098765
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1206896551724138
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12039312039312039
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12009803921568628
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1198044009779951
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11951219512195121
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1192214111922141
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11893203883495146
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11864406779661017
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11835748792270531
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1180722891566265
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11778846153846154
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11750599520383694
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11722488038277512
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11694510739856802
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11666666666666667
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1163895486935867
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11611374407582939
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.12056737588652482
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12264150943396226
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1223529411764706
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -43.4    |
| Iteration     | 15       |
| MaximumReturn | -0.0176  |
| MinimumReturn | -99.3    |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.46362146735191345
Validation loss = 0.4560975134372711
Validation loss = 0.4674619138240814
Validation loss = 0.46374720335006714
Validation loss = 0.46314606070518494
Validation loss = 0.46391502022743225
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.46138712763786316
Validation loss = 0.4665023982524872
Validation loss = 0.46347683668136597
Validation loss = 0.4643688201904297
Validation loss = 0.47195783257484436
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.47330382466316223
Validation loss = 0.47347351908683777
Validation loss = 0.48077771067619324
Validation loss = 0.4769896864891052
Validation loss = 0.4835057258605957
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4613898694515228
Validation loss = 0.4527831971645355
Validation loss = 0.454365074634552
Validation loss = 0.4591626822948456
Validation loss = 0.46088171005249023
Validation loss = 0.4560668468475342
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4651491641998291
Validation loss = 0.457115113735199
Validation loss = 0.45940372347831726
Validation loss = 0.47817036509513855
Validation loss = 0.45724043250083923
Validation loss = 0.4636409282684326
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12206572769953052
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12177985948477751
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12149532710280374
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12354312354312354
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12558139534883722
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12529002320185614
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.125
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12471131639722864
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12442396313364056
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12413793103448276
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12385321100917432
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12356979405034325
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1232876712328767
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1252847380410023
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.125
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12471655328798185
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1244343891402715
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12415349887133183
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12387387387387387
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12359550561797752
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.12780269058295965
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12751677852348994
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12946428571428573
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1291759465478842
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1288888888888889
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -23.9    |
| Iteration     | 16       |
| MaximumReturn | -0.0371  |
| MinimumReturn | -86.3    |
| TotalSamples  | 29988    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4636423885822296
Validation loss = 0.46704891324043274
Validation loss = 0.45691806077957153
Validation loss = 0.46314623951911926
Validation loss = 0.46497073769569397
Validation loss = 0.4674523174762726
Validation loss = 0.47454899549484253
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4677335321903229
Validation loss = 0.4708695411682129
Validation loss = 0.46154549717903137
Validation loss = 0.4696425497531891
Validation loss = 0.473453551530838
Validation loss = 0.4809586703777313
Validation loss = 0.4726676642894745
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.48451676964759827
Validation loss = 0.46753671765327454
Validation loss = 0.488930881023407
Validation loss = 0.4792594313621521
Validation loss = 0.479690819978714
Validation loss = 0.486994206905365
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.46493688225746155
Validation loss = 0.4506753981113434
Validation loss = 0.45806947350502014
Validation loss = 0.4529627859592438
Validation loss = 0.46668949723243713
Validation loss = 0.4632754921913147
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.47684821486473083
Validation loss = 0.47338560223579407
Validation loss = 0.46885064244270325
Validation loss = 0.46626389026641846
Validation loss = 0.4663920998573303
Validation loss = 0.47089943289756775
Validation loss = 0.4722006916999817
Validation loss = 0.4787744879722595
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.13303769401330376
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.13495575221238937
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1346578366445916
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1343612334801762
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13406593406593406
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.13596491228070176
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.13785557986870897
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.14192139737991266
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1437908496732026
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14347826086956522
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14316702819956617
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14285714285714285
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14254859611231102
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14224137931034483
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14193548387096774
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14163090128755365
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14132762312633834
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14102564102564102
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14072494669509594
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14042553191489363
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14012738853503184
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13983050847457626
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13953488372093023
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.14556962025316456
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.14736842105263157
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.46    |
| Iteration     | 17       |
| MaximumReturn | -0.0294  |
| MinimumReturn | -38.6    |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.47453275322914124
Validation loss = 0.4784078299999237
Validation loss = 0.4787631332874298
Validation loss = 0.4801476299762726
Validation loss = 0.47804078459739685
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.479059100151062
Validation loss = 0.47498494386672974
Validation loss = 0.4740713834762573
Validation loss = 0.4801900088787079
Validation loss = 0.49239498376846313
Validation loss = 0.4806436002254486
Validation loss = 0.48386847972869873
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.48737600445747375
Validation loss = 0.4857397973537445
Validation loss = 0.49894917011260986
Validation loss = 0.4921577274799347
Validation loss = 0.5021936893463135
Validation loss = 0.5008893609046936
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4611397385597229
Validation loss = 0.46914008259773254
Validation loss = 0.4658069312572479
Validation loss = 0.47327759861946106
Validation loss = 0.47959810495376587
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.47064918279647827
Validation loss = 0.47853216528892517
Validation loss = 0.48188966512680054
Validation loss = 0.4864872395992279
Validation loss = 0.4897346496582031
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14705882352941177
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14675052410901468
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.15271966527196654
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1524008350730689
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15208333333333332
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15176715176715178
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15145228215767634
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15113871635610765
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15289256198347106
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15257731958762888
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1522633744855967
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1540041067761807
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15368852459016394
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1554192229038855
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15510204081632653
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15682281059063136
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1565040650406504
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15618661257606492
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15587044534412955
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15555555555555556
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15524193548387097
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15694164989939638
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1566265060240964
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.156312625250501
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.156
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -23.3    |
| Iteration     | 18       |
| MaximumReturn | -0.0318  |
| MinimumReturn | -84.9    |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.47244784235954285
Validation loss = 0.47496920824050903
Validation loss = 0.469948947429657
Validation loss = 0.47848862409591675
Validation loss = 0.48264414072036743
Validation loss = 0.48340874910354614
Validation loss = 0.4816979467868805
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4781734347343445
Validation loss = 0.4819144606590271
Validation loss = 0.4878787100315094
Validation loss = 0.4888495206832886
Validation loss = 0.4911736845970154
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4985925555229187
Validation loss = 0.4918668568134308
Validation loss = 0.49269336462020874
Validation loss = 0.5090259313583374
Validation loss = 0.5000402927398682
Validation loss = 0.4926450848579407
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.46931660175323486
Validation loss = 0.4645639955997467
Validation loss = 0.4728788733482361
Validation loss = 0.4795200824737549
Validation loss = 0.4815387725830078
Validation loss = 0.4751042127609253
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4835473895072937
Validation loss = 0.47911345958709717
Validation loss = 0.48677608370780945
Validation loss = 0.4838728606700897
Validation loss = 0.48624253273010254
Validation loss = 0.49183332920074463
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15568862275449102
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1553784860557769
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.1610337972166998
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16071428571428573
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1603960396039604
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1600790513833992
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16173570019723865
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16141732283464566
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16110019646365423
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.16470588235294117
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1643835616438356
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1640625
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16374269005847952
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16342412451361868
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16310679611650486
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16279069767441862
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16247582205029013
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16216216216216217
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.16570327552986513
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16538461538461538
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16506717850287908
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16475095785440613
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16443594646271512
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16412213740458015
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16380952380952382
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.52    |
| Iteration     | 19       |
| MaximumReturn | -0.0173  |
| MinimumReturn | -25.7    |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.49091315269470215
Validation loss = 0.4877064526081085
Validation loss = 0.47938308119773865
Validation loss = 0.49551308155059814
Validation loss = 0.4887285828590393
Validation loss = 0.4892696440219879
Validation loss = 0.502466082572937
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.49521660804748535
Validation loss = 0.4887610077857971
Validation loss = 0.4968538284301758
Validation loss = 0.48946914076805115
Validation loss = 0.4957922697067261
Validation loss = 0.4985607862472534
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.48833656311035156
Validation loss = 0.4957699477672577
Validation loss = 0.494186669588089
Validation loss = 0.49565383791923523
Validation loss = 0.5109445452690125
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.469798743724823
Validation loss = 0.47502371668815613
Validation loss = 0.47506117820739746
Validation loss = 0.47484666109085083
Validation loss = 0.4783209562301636
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4814301133155823
Validation loss = 0.49257081747055054
Validation loss = 0.4878241717815399
Validation loss = 0.48919183015823364
Validation loss = 0.4975631833076477
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16539923954372623
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1650853889943074
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16477272727272727
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.16824196597353497
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16792452830188678
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16760828625235405
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16729323308270677
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.17073170731707318
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.17415730337078653
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17383177570093458
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17350746268656717
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17318435754189945
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17286245353159851
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1725417439703154
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.17592592592592593
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1756007393715342
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1752767527675277
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17495395948434622
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17463235294117646
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1761467889908257
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17582417582417584
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17550274223034734
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.177007299270073
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1766848816029144
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17636363636363636
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -35      |
| Iteration     | 20       |
| MaximumReturn | -0.0838  |
| MinimumReturn | -70.8    |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.48518022894859314
Validation loss = 0.4851335883140564
Validation loss = 0.4872352182865143
Validation loss = 0.4951012432575226
Validation loss = 0.4961458742618561
Validation loss = 0.4965347647666931
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4813089370727539
Validation loss = 0.48715829849243164
Validation loss = 0.495047926902771
Validation loss = 0.49393346905708313
Validation loss = 0.4967387616634369
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4935380518436432
Validation loss = 0.4940711259841919
Validation loss = 0.48964083194732666
Validation loss = 0.49565473198890686
Validation loss = 0.4975385069847107
Validation loss = 0.5017655491828918
Validation loss = 0.49930769205093384
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4713951349258423
Validation loss = 0.47447142004966736
Validation loss = 0.47483232617378235
Validation loss = 0.47368064522743225
Validation loss = 0.47531116008758545
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4860340654850006
Validation loss = 0.48705869913101196
Validation loss = 0.4805864095687866
Validation loss = 0.4919121563434601
Validation loss = 0.49740689992904663
Validation loss = 0.4913247525691986
Validation loss = 0.4900789260864258
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17604355716878403
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17572463768115942
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.17902350813743217
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.18231046931407943
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.18558558558558558
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18525179856115107
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18491921005385997
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.18996415770609318
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18962432915921287
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18928571428571428
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1889483065953654
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18861209964412812
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.19182948490230906
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19148936170212766
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.20176991150442478
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2049469964664311
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.21164021164021163
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.22007042253521128
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21968365553602812
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.22456140350877193
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22416812609457093
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22377622377622378
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.23036649214659685
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22996515679442509
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.23478260869565218
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.27    |
| Iteration     | 21       |
| MaximumReturn | -0.0369  |
| MinimumReturn | -20.4    |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.48907750844955444
Validation loss = 0.493919312953949
Validation loss = 0.4914957284927368
Validation loss = 0.4901093542575836
Validation loss = 0.4963951110839844
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.48182398080825806
Validation loss = 0.479148805141449
Validation loss = 0.48640134930610657
Validation loss = 0.4904017150402069
Validation loss = 0.4937306344509125
Validation loss = 0.4906681478023529
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.49861598014831543
Validation loss = 0.500934362411499
Validation loss = 0.49789273738861084
Validation loss = 0.5128257870674133
Validation loss = 0.5088613033294678
Validation loss = 0.5140438675880432
Validation loss = 0.5075456500053406
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.481397420167923
Validation loss = 0.47810012102127075
Validation loss = 0.4817289113998413
Validation loss = 0.48542821407318115
Validation loss = 0.4882589280605316
Validation loss = 0.48716235160827637
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.49155938625335693
Validation loss = 0.48911023139953613
Validation loss = 0.4871119558811188
Validation loss = 0.4939163625240326
Validation loss = 0.49584609270095825
Validation loss = 0.5094885230064392
Validation loss = 0.5054894089698792
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2361111111111111
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23743500866551126
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2370242214532872
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23661485319516407
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23620689655172414
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.23924268502581755
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23883161512027493
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23842195540308747
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.238013698630137
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2376068376068376
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23720136518771331
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23679727427597955
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23639455782312926
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23769100169779286
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.24067796610169492
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24027072758037224
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23986486486486486
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23946037099494097
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23905723905723905
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23865546218487396
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.24161073825503357
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24120603015075376
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.24414715719063546
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24373956594323873
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24333333333333335
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -139     |
| Iteration     | 22       |
| MaximumReturn | -90.9    |
| MinimumReturn | -168     |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.48617610335350037
Validation loss = 0.49549442529678345
Validation loss = 0.49084901809692383
Validation loss = 0.4936864972114563
Validation loss = 0.4971214234828949
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.48731088638305664
Validation loss = 0.4854520857334137
Validation loss = 0.4967413544654846
Validation loss = 0.49663233757019043
Validation loss = 0.4986110329627991
Validation loss = 0.5004222989082336
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.49311986565589905
Validation loss = 0.5066326260566711
Validation loss = 0.5064272880554199
Validation loss = 0.5072993040084839
Validation loss = 0.5114734768867493
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.47062206268310547
Validation loss = 0.4841758608818054
Validation loss = 0.4819509983062744
Validation loss = 0.4867808222770691
Validation loss = 0.48752865195274353
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5047702789306641
Validation loss = 0.4971718192100525
Validation loss = 0.5097576379776001
Validation loss = 0.5074783563613892
Validation loss = 0.5162646174430847
Validation loss = 0.5187242031097412
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24292845257903495
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2425249169435216
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24212271973466004
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24172185430463577
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2413223140495868
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24092409240924093
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24052718286655683
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24013157894736842
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23973727422003285
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23934426229508196
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.24222585924713586
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24183006535947713
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24306688417618272
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24267100977198697
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24227642276422764
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2418831168831169
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24311183144246354
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24271844660194175
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24232633279483037
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24193548387096775
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24154589371980675
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24115755627009647
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24077046548956663
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2403846153846154
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2432
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -147     |
| Iteration     | 23       |
| MaximumReturn | -108     |
| MinimumReturn | -170     |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.49115878343582153
Validation loss = 0.4857484698295593
Validation loss = 0.49839872121810913
Validation loss = 0.5034850239753723
Validation loss = 0.5020762085914612
Validation loss = 0.508569598197937
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4830995202064514
Validation loss = 0.4901534616947174
Validation loss = 0.4911631941795349
Validation loss = 0.4965128004550934
Validation loss = 0.5036515593528748
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5007935762405396
Validation loss = 0.5053492784500122
Validation loss = 0.4997507631778717
Validation loss = 0.5112587809562683
Validation loss = 0.514877438545227
Validation loss = 0.5160229206085205
Validation loss = 0.5154410600662231
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4758303761482239
Validation loss = 0.48116493225097656
Validation loss = 0.48725780844688416
Validation loss = 0.48762384057044983
Validation loss = 0.4917932450771332
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4888675808906555
Validation loss = 0.501172661781311
Validation loss = 0.50425785779953
Validation loss = 0.5039629936218262
Validation loss = 0.5143703818321228
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24281150159744408
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24242424242424243
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24203821656050956
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24165341812400637
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24126984126984127
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24088748019017434
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24050632911392406
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2401263823064771
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.2444794952681388
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2440944881889764
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24371069182389937
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24332810047095763
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24294670846394983
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24256651017214398
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2421875
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24336973478939158
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24299065420560748
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24261275272161742
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2422360248447205
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.24806201550387597
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2476780185758514
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2472952086553323
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24691358024691357
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2465331278890601
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24615384615384617
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -115     |
| Iteration     | 24       |
| MaximumReturn | -21.4    |
| MinimumReturn | -161     |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.49075621366500854
Validation loss = 0.49897313117980957
Validation loss = 0.4977141320705414
Validation loss = 0.5051249265670776
Validation loss = 0.4987199306488037
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.48363354802131653
Validation loss = 0.488421767950058
Validation loss = 0.5038209557533264
Validation loss = 0.4998719394207001
Validation loss = 0.4942878186702728
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.49683815240859985
Validation loss = 0.5033208131790161
Validation loss = 0.5150505900382996
Validation loss = 0.5042681694030762
Validation loss = 0.5121942758560181
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.48103728890419006
Validation loss = 0.4808206558227539
Validation loss = 0.4877678453922272
Validation loss = 0.48881858587265015
Validation loss = 0.4976906478404999
Validation loss = 0.4952884316444397
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.49319231510162354
Validation loss = 0.4999881982803345
Validation loss = 0.4956827759742737
Validation loss = 0.5006625652313232
Validation loss = 0.5098703503608704
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2457757296466974
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2469325153374233
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24655436447166923
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.25840978593272174
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2580152671755725
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2576219512195122
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2602739726027397
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25987841945288753
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25948406676783003
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2590909090909091
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2586989409984871
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2583081570996979
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2579185520361991
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2605421686746988
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2631578947368421
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2627627627627628
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.26686656671664166
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26646706586826346
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26606875934230195
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2656716417910448
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2667660208643815
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 12
average number of affinization = 0.28422619047619047
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2838038632986627
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.28338278931750743
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.28444444444444444
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.09    |
| Iteration     | 25       |
| MaximumReturn | -0.139   |
| MinimumReturn | -57.8    |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4889267086982727
Validation loss = 0.4927870035171509
Validation loss = 0.49870431423187256
Validation loss = 0.5021876096725464
Validation loss = 0.5031771659851074
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4897153079509735
Validation loss = 0.4964374303817749
Validation loss = 0.49083277583122253
Validation loss = 0.49782493710517883
Validation loss = 0.4976702034473419
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5050032734870911
Validation loss = 0.5130831599235535
Validation loss = 0.5088995695114136
Validation loss = 0.5080993175506592
Validation loss = 0.5163130164146423
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4813375771045685
Validation loss = 0.4827149212360382
Validation loss = 0.49026060104370117
Validation loss = 0.493274986743927
Validation loss = 0.49314576387405396
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4976472854614258
Validation loss = 0.49852171540260315
Validation loss = 0.49721935391426086
Validation loss = 0.5024511218070984
Validation loss = 0.5093771815299988
Validation loss = 0.5185759663581848
Validation loss = 0.5148274302482605
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.28550295857988167
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.28508124076809455
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.28466076696165193
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.28718703976435933
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.2926470588235294
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2922173274596182
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.29472140762463345
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29428989751098095
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29385964912280704
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2934306569343066
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29300291545189505
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2925764192139738
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2921511627906977
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29172714078374457
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29130434782608694
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.29232995658465993
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29190751445086704
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29148629148629146
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2910662824207493
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2906474820143885
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29022988505747127
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2898134863701578
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.28939828080229224
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.28898426323319026
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2885714285714286
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.14    |
| Iteration     | 26       |
| MaximumReturn | -0.0949  |
| MinimumReturn | -102     |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5024539828300476
Validation loss = 0.49475985765457153
Validation loss = 0.49507033824920654
Validation loss = 0.5069354772567749
Validation loss = 0.5118691325187683
Validation loss = 0.5150032639503479
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4925316870212555
Validation loss = 0.49569910764694214
Validation loss = 0.5029988288879395
Validation loss = 0.5030242204666138
Validation loss = 0.5032181739807129
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.507144033908844
Validation loss = 0.5060932040214539
Validation loss = 0.5169248580932617
Validation loss = 0.5110011100769043
Validation loss = 0.5221984386444092
Validation loss = 0.5254013538360596
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.49260279536247253
Validation loss = 0.4944450259208679
Validation loss = 0.48758381605148315
Validation loss = 0.49487972259521484
Validation loss = 0.5005767941474915
Validation loss = 0.5001864433288574
Validation loss = 0.5084802508354187
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5147174000740051
Validation loss = 0.509271502494812
Validation loss = 0.5222839713096619
Validation loss = 0.5158337354660034
Validation loss = 0.5175037980079651
Validation loss = 0.5220280289649963
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.28815977175463625
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2905982905982906
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2901849217638691
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.29261363636363635
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29219858156028367
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29178470254957506
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2913719943422914
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2909604519774011
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2933709449929478
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29295774647887324
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29254571026722925
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29213483146067415
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2917251051893408
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2913165266106443
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2923076923076923
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2918994413407821
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2914923291492329
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29108635097493035
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2906815020862309
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2902777777777778
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.289875173370319
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.29085872576177285
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29045643153526973
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2900552486187845
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2896551724137931
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -6.51    |
| Iteration     | 27       |
| MaximumReturn | -0.223   |
| MinimumReturn | -96      |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5007613301277161
Validation loss = 0.5023146271705627
Validation loss = 0.5067502856254578
Validation loss = 0.5141927599906921
Validation loss = 0.5270936489105225
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.496469646692276
Validation loss = 0.5039799809455872
Validation loss = 0.5077149868011475
Validation loss = 0.5067104697227478
Validation loss = 0.5165372490882874
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5084554553031921
Validation loss = 0.5160196423530579
Validation loss = 0.5158136487007141
Validation loss = 0.5168137550354004
Validation loss = 0.5200673341751099
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5076170563697815
Validation loss = 0.5071927905082703
Validation loss = 0.5051272511482239
Validation loss = 0.5053513050079346
Validation loss = 0.514828622341156
Validation loss = 0.5125600099563599
Validation loss = 0.5144957900047302
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5115723013877869
Validation loss = 0.5200632214546204
Validation loss = 0.5138177871704102
Validation loss = 0.5195593237876892
Validation loss = 0.5220690369606018
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.290633608815427
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2902338376891334
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.28983516483516486
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.289437585733882
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.29315068493150687
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.292749658002736
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2937158469945355
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2933151432469304
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29291553133514986
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2925170068027211
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2921195652173913
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29172320217096337
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29132791327913277
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29093369418132614
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.29324324324324325
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29284750337381915
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29245283018867924
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2920592193808883
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.29301075268817206
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29261744966442954
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29222520107238603
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2918340026773762
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2914438502673797
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2910547396528705
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2906666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.454   |
| Iteration     | 28       |
| MaximumReturn | -0.089   |
| MinimumReturn | -1.28    |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5034183263778687
Validation loss = 0.5108713507652283
Validation loss = 0.5064759850502014
Validation loss = 0.5067756175994873
Validation loss = 0.5145142078399658
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5040323734283447
Validation loss = 0.507659912109375
Validation loss = 0.5116981267929077
Validation loss = 0.5129271745681763
Validation loss = 0.5093606114387512
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5149433016777039
Validation loss = 0.5122028589248657
Validation loss = 0.5180153846740723
Validation loss = 0.5253956317901611
Validation loss = 0.5246323943138123
Validation loss = 0.5214407444000244
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5070648789405823
Validation loss = 0.5044183135032654
Validation loss = 0.5131454467773438
Validation loss = 0.5137420296669006
Validation loss = 0.5140541195869446
Validation loss = 0.5142807960510254
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5084888935089111
Validation loss = 0.5181066393852234
Validation loss = 0.5150787830352783
Validation loss = 0.517067015171051
Validation loss = 0.5197136402130127
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2929427430093209
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2925531914893617
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2921646746347942
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2917771883289125
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.29271523178807946
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.294973544973545
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.29590488771466317
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.29683377308707126
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.29907773386034253
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2986842105263158
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2996057818659658
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3005249343832021
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.30275229357798167
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3023560209424084
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3032679738562091
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3028720626631854
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.30638852672750977
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3059895833333333
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3081924577373212
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.3142857142857143
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3151750972762646
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3173575129533679
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.31953428201811124
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.32170542635658916
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.32387096774193547
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -25.6    |
| Iteration     | 29       |
| MaximumReturn | -0.0465  |
| MinimumReturn | -75.5    |
| TotalSamples  | 51646    |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.506679117679596
Validation loss = 0.5128644108772278
Validation loss = 0.5168781876564026
Validation loss = 0.5243331789970398
Validation loss = 0.5127460360527039
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5020793080329895
Validation loss = 0.5107370615005493
Validation loss = 0.5108509659767151
Validation loss = 0.5189757347106934
Validation loss = 0.5190585255622864
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5219348669052124
Validation loss = 0.5212641358375549
Validation loss = 0.5256258845329285
Validation loss = 0.5243603587150574
Validation loss = 0.5300710797309875
Validation loss = 0.5261732339859009
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5155149102210999
Validation loss = 0.5089476704597473
Validation loss = 0.5235629081726074
Validation loss = 0.5156919360160828
Validation loss = 0.5192760229110718
Validation loss = 0.5125516653060913
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.517794132232666
Validation loss = 0.512698769569397
Validation loss = 0.5170640349388123
Validation loss = 0.5283458828926086
Validation loss = 0.5243154168128967
Validation loss = 0.5199512243270874
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3234536082474227
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.323037323037323
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.32390745501285345
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32349165596919127
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3230769230769231
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.323943661971831
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3235294117647059
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3231162196679438
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3227040816326531
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3248407643312102
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3244274809160305
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3240152477763659
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3236040609137056
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3231939163498099
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3227848101265823
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3223767383059418
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32196969696969696
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32156368221941994
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3211586901763224
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32075471698113206
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3203517587939699
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3199498117942284
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.31954887218045114
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.32290362953692114
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3225
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -51.1    |
| Iteration     | 30       |
| MaximumReturn | -0.302   |
| MinimumReturn | -118     |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5148922204971313
Validation loss = 0.5063274502754211
Validation loss = 0.5144240260124207
Validation loss = 0.5173777341842651
Validation loss = 0.5193799734115601
Validation loss = 0.5162075757980347
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5014849305152893
Validation loss = 0.5074300169944763
Validation loss = 0.5179411768913269
Validation loss = 0.5147467255592346
Validation loss = 0.5150797963142395
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5201994776725769
Validation loss = 0.5226660966873169
Validation loss = 0.5237870812416077
Validation loss = 0.5314496755599976
Validation loss = 0.5303957462310791
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5086330771446228
Validation loss = 0.5116801261901855
Validation loss = 0.5132397413253784
Validation loss = 0.5137361288070679
Validation loss = 0.5227137207984924
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5256146192550659
Validation loss = 0.5203670859336853
Validation loss = 0.5232813358306885
Validation loss = 0.5268192887306213
Validation loss = 0.5260549783706665
Validation loss = 0.5350967049598694
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.32459425717852686
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32418952618453867
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.32503113325031135
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3271144278606965
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3267080745341615
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.32878411910669975
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32837670384138784
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.33044554455445546
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.33127317676143386
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.337037037037037
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3366214549938348
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.3399014778325123
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3419434194341943
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.343980343980344
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.34355828220858897
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3431372549019608
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.35006119951040393
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.34963325183374083
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3504273504273504
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.3548780487804878
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.35444579780755175
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.36009732360097324
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3596597812879708
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3592233009708738
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3612121212121212
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19.1    |
| Iteration     | 31       |
| MaximumReturn | -0.139   |
| MinimumReturn | -89.1    |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5118788480758667
Validation loss = 0.5112255811691284
Validation loss = 0.5258406400680542
Validation loss = 0.5175177454948425
Validation loss = 0.5204532742500305
Validation loss = 0.5201408267021179
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5112910866737366
Validation loss = 0.5049571394920349
Validation loss = 0.5130476355552673
Validation loss = 0.5163801908493042
Validation loss = 0.5263695120811462
Validation loss = 0.5224408507347107
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5247946977615356
Validation loss = 0.5223221778869629
Validation loss = 0.5271751284599304
Validation loss = 0.5261908173561096
Validation loss = 0.5276155471801758
Validation loss = 0.5393695831298828
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5194519758224487
Validation loss = 0.5133452415466309
Validation loss = 0.5138348937034607
Validation loss = 0.5184382796287537
Validation loss = 0.5216068625450134
Validation loss = 0.520413875579834
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.522510290145874
Validation loss = 0.5244933366775513
Validation loss = 0.5256900191307068
Validation loss = 0.5304507613182068
Validation loss = 0.5346842408180237
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.36803874092009686
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.37122128174123337
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.37318840579710144
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37273823884197826
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.372289156626506
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37184115523465705
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3713942307692308
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3709483793517407
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37050359712230213
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3724550898203593
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37200956937799046
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3715651135005974
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3711217183770883
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.3742550655542312
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 12
average number of affinization = 0.3880952380952381
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3876337693222354
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.38836104513064135
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3879003558718861
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.3909952606635071
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3905325443786982
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 10
average number of affinization = 0.40189125295508277
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.40613931523022434
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4056603773584906
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.40518256772673733
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4047058823529412
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.08    |
| Iteration     | 32       |
| MaximumReturn | -0.156   |
| MinimumReturn | -36.8    |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5185742378234863
Validation loss = 0.5207633376121521
Validation loss = 0.5206311941146851
Validation loss = 0.5220783352851868
Validation loss = 0.5290037393569946
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5227590799331665
Validation loss = 0.5218325257301331
Validation loss = 0.5183879137039185
Validation loss = 0.5280137062072754
Validation loss = 0.5259116888046265
Validation loss = 0.5256877541542053
Validation loss = 0.5274121761322021
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.523798942565918
Validation loss = 0.5324187874794006
Validation loss = 0.5285731554031372
Validation loss = 0.535753071308136
Validation loss = 0.5391212701797485
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5293022990226746
Validation loss = 0.5225595235824585
Validation loss = 0.5212777256965637
Validation loss = 0.5287431478500366
Validation loss = 0.5252224802970886
Validation loss = 0.5259774327278137
Validation loss = 0.5314429998397827
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5299625396728516
Validation loss = 0.5392429828643799
Validation loss = 0.5411719679832458
Validation loss = 0.5314650535583496
Validation loss = 0.5309864282608032
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.40658049353701525
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4072769953051643
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.41031652989449
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.4180327868852459
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.4233918128654971
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.4310747663551402
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4329054842473746
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4347319347319347
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.43888242142025613
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4383720930232558
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.4436701509872242
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.4466357308584687
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4461181923522596
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.45023148148148145
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4508670520231214
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.45496535796766746
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.45790080738177624
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.45852534562211983
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4579976985040276
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.45977011494252873
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.4626865671641791
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46215596330275227
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.46964490263459335
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4702517162471396
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4697142857142857
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -34.1    |
| Iteration     | 33       |
| MaximumReturn | -0.191   |
| MinimumReturn | -107     |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5248888731002808
Validation loss = 0.524790346622467
Validation loss = 0.529071033000946
Validation loss = 0.5247406959533691
Validation loss = 0.5303358435630798
Validation loss = 0.5336471199989319
Validation loss = 0.5305958390235901
Validation loss = 0.5429672598838806
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.534432053565979
Validation loss = 0.5234085917472839
Validation loss = 0.5267704129219055
Validation loss = 0.5337262749671936
Validation loss = 0.5351648926734924
Validation loss = 0.5396457314491272
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5298570990562439
Validation loss = 0.530524492263794
Validation loss = 0.5370652675628662
Validation loss = 0.5375415086746216
Validation loss = 0.5406752228736877
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5378849506378174
Validation loss = 0.5354101061820984
Validation loss = 0.5305570363998413
Validation loss = 0.5316056609153748
Validation loss = 0.5357517600059509
Validation loss = 0.533182680606842
Validation loss = 0.5354675650596619
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5245097279548645
Validation loss = 0.5300086140632629
Validation loss = 0.5309498906135559
Validation loss = 0.5335366725921631
Validation loss = 0.5351251363754272
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4691780821917808
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46864310148232613
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46810933940774485
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.4732650739476678
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4727272727272727
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.47559591373439275
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.47845804988662133
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47904869762174407
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47850678733031676
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47796610169491527
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4785553047404063
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47801578354002255
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4774774774774775
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4769403824521935
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.4797752808988764
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4792368125701459
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4798206278026906
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47928331466965285
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47874720357941836
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4782122905027933
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47767857142857145
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47714604236343366
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.47884187082405344
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.48053392658509453
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.48333333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -75.5    |
| Iteration     | 34       |
| MaximumReturn | -0.237   |
| MinimumReturn | -142     |
| TotalSamples  | 59976    |
----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5268638730049133
Validation loss = 0.5289925336837769
Validation loss = 0.5354352593421936
Validation loss = 0.5329754948616028
Validation loss = 0.5440788865089417
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5262677073478699
Validation loss = 0.5300660729408264
Validation loss = 0.5384979844093323
Validation loss = 0.5370659232139587
Validation loss = 0.5425361394882202
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5342749953269958
Validation loss = 0.5380823612213135
Validation loss = 0.536407470703125
Validation loss = 0.5367799997329712
Validation loss = 0.5488666892051697
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5390924215316772
Validation loss = 0.5350391864776611
Validation loss = 0.5331458449363708
Validation loss = 0.5401014685630798
Validation loss = 0.538230836391449
Validation loss = 0.5405451059341431
Validation loss = 0.5430744886398315
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5326217412948608
Validation loss = 0.529849648475647
Validation loss = 0.5326054096221924
Validation loss = 0.5385793447494507
Validation loss = 0.5441187620162964
Validation loss = 0.5416962504386902
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.48501664816870144
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4844789356984479
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 12
average number of affinization = 0.49723145071982283
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49668141592920356
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4972375690607735
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4988962472406181
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4994487320837927
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.501101321585903
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5005500550055005
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.5054945054945055
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.5115257958287596
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 9
average number of affinization = 0.5208333333333334
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5224534501642936
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5218818380743983
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.526775956284153
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.5316593886462883
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5310796074154853
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5337690631808278
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.5386289445048966
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5402173913043479
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5396308360477742
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.5466377440347071
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.5525460455037919
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.551948051948052
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5556756756756757
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -13.1    |
| Iteration     | 35       |
| MaximumReturn | -0.102   |
| MinimumReturn | -84.3    |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5339804887771606
Validation loss = 0.534155547618866
Validation loss = 0.5311175584793091
Validation loss = 0.5418096780776978
Validation loss = 0.5373764038085938
Validation loss = 0.5459161996841431
Validation loss = 0.5478317737579346
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5307448506355286
Validation loss = 0.5380325317382812
Validation loss = 0.5295133590698242
Validation loss = 0.5374801158905029
Validation loss = 0.5360350012779236
Validation loss = 0.5400075912475586
Validation loss = 0.5415809154510498
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5342627167701721
Validation loss = 0.5374876856803894
Validation loss = 0.5371982455253601
Validation loss = 0.541678249835968
Validation loss = 0.5488635301589966
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5368914008140564
Validation loss = 0.5377357006072998
Validation loss = 0.5415424704551697
Validation loss = 0.5442792773246765
Validation loss = 0.5416401028633118
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5346229672431946
Validation loss = 0.5290305614471436
Validation loss = 0.5371944308280945
Validation loss = 0.5460116267204285
Validation loss = 0.5374475121498108
Validation loss = 0.5483124852180481
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 10
average number of affinization = 0.5658747300215983
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5652642934196332
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5689655172413793
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5694294940796556
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5698924731182796
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.569280343716434
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5697424892703863
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5702036441586281
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5717344753747323
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5711229946524065
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5715811965811965
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.575240128068303
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5767590618336887
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.577209797657082
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5776595744680851
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5770456960680127
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5764331210191083
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5768822905620361
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5773305084745762
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5798941798941799
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.580338266384778
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5818373812038015
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.5896624472573839
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.5943097997892518
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5936842105263158
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -66.3    |
| Iteration     | 36       |
| MaximumReturn | -0.72    |
| MinimumReturn | -177     |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5325108170509338
Validation loss = 0.5368655920028687
Validation loss = 0.5412372946739197
Validation loss = 0.5413762927055359
Validation loss = 0.544601321220398
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5374882221221924
Validation loss = 0.5388627052307129
Validation loss = 0.5409884452819824
Validation loss = 0.544903039932251
Validation loss = 0.548885703086853
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5437928438186646
Validation loss = 0.5392383337020874
Validation loss = 0.5422263741493225
Validation loss = 0.5513681769371033
Validation loss = 0.5490576028823853
Validation loss = 0.5496875643730164
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.544273316860199
Validation loss = 0.5426329970359802
Validation loss = 0.5448605418205261
Validation loss = 0.5473636388778687
Validation loss = 0.5469837188720703
Validation loss = 0.55061936378479
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5454288125038147
Validation loss = 0.544356644153595
Validation loss = 0.5447110533714294
Validation loss = 0.545241117477417
Validation loss = 0.548492431640625
Validation loss = 0.5486764311790466
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.6004206098843323
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5997899159663865
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.602308499475341
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6037735849056604
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.606282722513089
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.608786610878661
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.6144200626959248
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 9
average number of affinization = 0.6231732776617954
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6225234619395204
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6239583333333333
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6253902185223725
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6288981288981289
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6292834890965732
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6286307053941909
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6300518134715026
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.629399585921325
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6318510858324715
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 11
average number of affinization = 0.6425619834710744
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6449948400412797
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6463917525773196
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.646755921730175
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6491769547325102
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.656731757451182
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6560574948665298
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6605128205128206
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -42.5    |
| Iteration     | 37       |
| MaximumReturn | -0.304   |
| MinimumReturn | -129     |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5384557247161865
Validation loss = 0.5421863794326782
Validation loss = 0.5458835959434509
Validation loss = 0.544707715511322
Validation loss = 0.5451494455337524
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5439595580101013
Validation loss = 0.5417835712432861
Validation loss = 0.5420119166374207
Validation loss = 0.5462452173233032
Validation loss = 0.5557975769042969
Validation loss = 0.5525053143501282
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5452314615249634
Validation loss = 0.548378586769104
Validation loss = 0.5557785630226135
Validation loss = 0.5542206764221191
Validation loss = 0.5576186180114746
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5499957799911499
Validation loss = 0.5469293594360352
Validation loss = 0.5512806177139282
Validation loss = 0.5530185699462891
Validation loss = 0.5517966747283936
Validation loss = 0.5487279891967773
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5435529947280884
Validation loss = 0.5492299199104309
Validation loss = 0.5483559370040894
Validation loss = 0.5484042167663574
Validation loss = 0.5513930320739746
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6598360655737705
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6591606960081884
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6584867075664622
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6578140960163432
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.6653061224489796
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6646279306829765
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6639511201629328
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.669379450661241
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6707317073170732
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6730964467005076
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6744421906693712
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6737588652482269
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6730769230769231
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6723963599595552
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6717171717171717
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6740665993945509
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6733870967741935
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.6797583081570997
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6790744466800804
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6793969849246231
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.678714859437751
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6800401203610833
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6793587174348698
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6786786786786787
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.678
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -80.1    |
| Iteration     | 38       |
| MaximumReturn | -0.189   |
| MinimumReturn | -164     |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5407987236976624
Validation loss = 0.5440731048583984
Validation loss = 0.5461902618408203
Validation loss = 0.5454319715499878
Validation loss = 0.5540043115615845
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5540927052497864
Validation loss = 0.5448569655418396
Validation loss = 0.5449645519256592
Validation loss = 0.5480608940124512
Validation loss = 0.551596999168396
Validation loss = 0.5601128935813904
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5518501996994019
Validation loss = 0.5453363656997681
Validation loss = 0.5494555234909058
Validation loss = 0.5603175163269043
Validation loss = 0.5521135330200195
Validation loss = 0.5495405793190002
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5477019548416138
Validation loss = 0.5567428469657898
Validation loss = 0.5540165901184082
Validation loss = 0.5505669116973877
Validation loss = 0.5650789737701416
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5450773239135742
Validation loss = 0.5473184585571289
Validation loss = 0.5521869659423828
Validation loss = 0.5551594495773315
Validation loss = 0.5523844361305237
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6773226773226774
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6806387225548902
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6799601196410767
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.6852589641434262
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6845771144278607
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6888667992047713
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6931479642502483
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.6994047619047619
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6987115956392468
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.698019801980198
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6993076162215628
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6986166007905138
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.702862783810464
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7071005917159763
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7064039408866996
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7106299212598425
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.7109144542772862
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7102161100196464
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.7163886162904809
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7156862745098039
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7198824681684622
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7240704500978473
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7272727272727273
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7265625
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 10
average number of affinization = 0.735609756097561
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.59    |
| Iteration     | 39       |
| MaximumReturn | -0.304   |
| MinimumReturn | -9.23    |
| TotalSamples  | 68306    |
----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5405232906341553
Validation loss = 0.5423120260238647
Validation loss = 0.547337532043457
Validation loss = 0.5448911786079407
Validation loss = 0.5514670610427856
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.549696683883667
Validation loss = 0.5446145534515381
Validation loss = 0.5532752275466919
Validation loss = 0.5554232001304626
Validation loss = 0.5574808716773987
Validation loss = 0.5546131730079651
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5467764139175415
Validation loss = 0.5507254600524902
Validation loss = 0.5490792989730835
Validation loss = 0.5539584159851074
Validation loss = 0.5563024282455444
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5505415797233582
Validation loss = 0.5472782850265503
Validation loss = 0.5586255788803101
Validation loss = 0.5541815161705017
Validation loss = 0.5577877163887024
Validation loss = 0.5592969059944153
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5520226955413818
Validation loss = 0.5544146299362183
Validation loss = 0.5533041954040527
Validation loss = 0.5552160143852234
Validation loss = 0.5582402944564819
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.7407407407407407
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7429406037000974
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7441634241245136
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7434402332361516
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7456310679611651
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7449078564500485
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.748062015503876
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.7483059051306873
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7495164410058027
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.7545893719806763
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7577220077220077
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7589199614271939
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.7649325626204239
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7690086621751684
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.775
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7742555235350624
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7773512476007678
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7766059443911792
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7758620689655172
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7751196172248804
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7743785850860421
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 9
average number of affinization = 0.7822349570200573
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7814885496183206
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.784556720686368
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7838095238095238
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.51    |
| Iteration     | 40       |
| MaximumReturn | -0.109   |
| MinimumReturn | -31.5    |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5457202196121216
Validation loss = 0.5535746216773987
Validation loss = 0.5493003129959106
Validation loss = 0.548563539981842
Validation loss = 0.5551460385322571
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5595191717147827
Validation loss = 0.5540808439254761
Validation loss = 0.5543832778930664
Validation loss = 0.5558978319168091
Validation loss = 0.5559232831001282
Validation loss = 0.5672115683555603
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5508798956871033
Validation loss = 0.5542204976081848
Validation loss = 0.5532013177871704
Validation loss = 0.5605934858322144
Validation loss = 0.5613634586334229
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5604537725448608
Validation loss = 0.5604857802391052
Validation loss = 0.551973283290863
Validation loss = 0.5528098344802856
Validation loss = 0.5577848553657532
Validation loss = 0.5653972625732422
Validation loss = 0.5670924782752991
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5464492440223694
Validation loss = 0.5516237020492554
Validation loss = 0.5557066202163696
Validation loss = 0.5548450350761414
Validation loss = 0.5579614639282227
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7830637488106565
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7842205323193916
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.785375118708452
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7893738140417458
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7924170616113744
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7916666666666666
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.7975402081362346
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7967863894139886
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7998111425873465
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.8066037735849056
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8067860508953817
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8069679849340866
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.80620884289746
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8082706766917294
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8093896713615023
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.8151969981238274
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8163074039362699
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8183520599250936
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8203928905519177
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.822429906542056
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8263305322128851
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8302238805970149
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8331780055917987
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.835195530726257
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8372093023255814
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -22      |
| Iteration     | 41       |
| MaximumReturn | -0.116   |
| MinimumReturn | -88.8    |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5543627738952637
Validation loss = 0.5583215951919556
Validation loss = 0.5487704277038574
Validation loss = 0.551813006401062
Validation loss = 0.5619913935661316
Validation loss = 0.5572258234024048
Validation loss = 0.5568970441818237
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5531563758850098
Validation loss = 0.5603944063186646
Validation loss = 0.555716872215271
Validation loss = 0.5585190057754517
Validation loss = 0.5596959590911865
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5522343516349792
Validation loss = 0.5571401119232178
Validation loss = 0.5528103709220886
Validation loss = 0.5629985928535461
Validation loss = 0.5594404339790344
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5624578595161438
Validation loss = 0.5550684928894043
Validation loss = 0.5675960779190063
Validation loss = 0.5646851658821106
Validation loss = 0.5615513324737549
Validation loss = 0.5622588992118835
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5546785593032837
Validation loss = 0.5575166940689087
Validation loss = 0.5586544871330261
Validation loss = 0.5582486987113953
Validation loss = 0.5548917651176453
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8401486988847584
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8430826369545033
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8469387755102041
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.8517145505097312
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8537037037037037
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.8593894542090657
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 15
average number of affinization = 0.8724584103512015
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8725761772853186
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8745387453874539
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.8792626728110599
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8793738489871087
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8813247470101196
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8814338235294118
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.8870523415977961
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8899082568807339
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8927589367552704
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.891941391941392
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8938700823421775
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8939670932358318
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8949771689497716
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.8996350364963503
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9006381039197813
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9034608378870674
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 11
average number of affinization = 0.9126478616924477
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9145454545454546
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.95    |
| Iteration     | 42       |
| MaximumReturn | -0.179   |
| MinimumReturn | -84.3    |
| TotalSamples  | 73304    |
----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5597968101501465
Validation loss = 0.5506016612052917
Validation loss = 0.5557830333709717
Validation loss = 0.5571563243865967
Validation loss = 0.5594015121459961
Validation loss = 0.5588729977607727
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5554112792015076
Validation loss = 0.551174521446228
Validation loss = 0.5600219964981079
Validation loss = 0.5594666600227356
Validation loss = 0.557099461555481
Validation loss = 0.5623526573181152
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5557525157928467
Validation loss = 0.555767297744751
Validation loss = 0.5647475719451904
Validation loss = 0.5574865937232971
Validation loss = 0.5633615255355835
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.556835412979126
Validation loss = 0.5592055916786194
Validation loss = 0.5616634488105774
Validation loss = 0.5623553991317749
Validation loss = 0.5687212347984314
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.561318039894104
Validation loss = 0.5570881366729736
Validation loss = 0.5630801320075989
Validation loss = 0.558591902256012
Validation loss = 0.5548683404922485
Validation loss = 0.5630236268043518
Validation loss = 0.5590166449546814
Validation loss = 0.5721457004547119
Validation loss = 0.5684788823127747
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.9191643960036331
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9192377495462795
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.9247506799637353
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.927536231884058
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9294117647058824
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9321880650994575
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 9
average number of affinization = 0.9394760614272809
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9422382671480144
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9440937781785392
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9468468468468468
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9495949594959496
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9496402877697842
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 11
average number of affinization = 0.958670260557053
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9596050269299821
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.9659192825112107
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9659498207885304
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9659803043867502
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9695885509838998
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9731903485254692
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9758928571428571
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9776984834968778
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9786096256684492
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9821905609973286
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9822064056939501
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9831111111111112
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -44.5    |
| Iteration     | 43       |
| MaximumReturn | -0.32    |
| MinimumReturn | -83.5    |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5551770329475403
Validation loss = 0.5615794062614441
Validation loss = 0.5622444152832031
Validation loss = 0.5614346861839294
Validation loss = 0.5661392211914062
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5567708015441895
Validation loss = 0.561916708946228
Validation loss = 0.5614546537399292
Validation loss = 0.562656044960022
Validation loss = 0.560239315032959
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5579015612602234
Validation loss = 0.5546613335609436
Validation loss = 0.5606813430786133
Validation loss = 0.5566320419311523
Validation loss = 0.5653843283653259
Validation loss = 0.5640288591384888
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5568177103996277
Validation loss = 0.5641252994537354
Validation loss = 0.5628130435943604
Validation loss = 0.5690703988075256
Validation loss = 0.565412700176239
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5637357831001282
Validation loss = 0.5615735054016113
Validation loss = 0.5629914402961731
Validation loss = 0.5683979392051697
Validation loss = 0.5667203068733215
Validation loss = 0.57074373960495
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.9875666074600356
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9866903283052352
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.9911347517730497
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.9973427812223207
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0008841732979663
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0017667844522968
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0061782877316858
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.007936507936508
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0088105726872247
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.011443661971831
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.019349164467898
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.0246045694200352
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.0280948200175593
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0289473684210526
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.0341805433829974
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.0402802101576183
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0402449693788276
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0419580419580419
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0427947598253275
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.043630017452007
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.046207497820401
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0452961672473868
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.050478677110531
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0530434782608695
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -22.6    |
| Iteration     | 44       |
| MaximumReturn | -0.178   |
| MinimumReturn | -83.4    |
| TotalSamples  | 76636    |
----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5571206212043762
Validation loss = 0.5596330761909485
Validation loss = 0.5651838779449463
Validation loss = 0.5604305267333984
Validation loss = 0.5658923983573914
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5579248666763306
Validation loss = 0.5594967603683472
Validation loss = 0.5636438727378845
Validation loss = 0.5613099932670593
Validation loss = 0.5643557906150818
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5649816393852234
Validation loss = 0.5596717596054077
Validation loss = 0.561249852180481
Validation loss = 0.5650105476379395
Validation loss = 0.5716922283172607
Validation loss = 0.5655926465988159
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5615649819374084
Validation loss = 0.5631589293479919
Validation loss = 0.5589509606361389
Validation loss = 0.5683739185333252
Validation loss = 0.5706731677055359
Validation loss = 0.5676185488700867
Validation loss = 0.5705257058143616
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.569892406463623
Validation loss = 0.5631814002990723
Validation loss = 0.569355845451355
Validation loss = 0.572568416595459
Validation loss = 0.5733610391616821
Validation loss = 0.5697309374809265
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.053866203301477
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0555555555555556
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0581092801387684
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.0649913344887347
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0666666666666667
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.070069204152249
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0726015557476232
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.0785837651122625
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0811044003451251
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.081896551724138
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0809646856158484
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.0903614457831325
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0911435941530525
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.0970790378006874
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0987124463519313
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.102058319039451
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.101113967437875
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1044520547945205
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.1086398631308811
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.1145299145299146
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.116994022203245
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.120307167235495
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1193520886615516
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.122657580919932
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.127659574468085
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.691   |
| Iteration     | 45       |
| MaximumReturn | -0.168   |
| MinimumReturn | -2.36    |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5609328150749207
Validation loss = 0.5630610585212708
Validation loss = 0.5644022226333618
Validation loss = 0.5638538599014282
Validation loss = 0.5667359828948975
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5654148459434509
Validation loss = 0.5667354464530945
Validation loss = 0.5690225958824158
Validation loss = 0.5659670829772949
Validation loss = 0.5650696158409119
Validation loss = 0.569889485836029
Validation loss = 0.568282425403595
Validation loss = 0.5740442276000977
Validation loss = 0.5704429745674133
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5677137970924377
Validation loss = 0.569482147693634
Validation loss = 0.5640472173690796
Validation loss = 0.56944739818573
Validation loss = 0.5692871809005737
Validation loss = 0.5732911825180054
Validation loss = 0.572803795337677
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5728422999382019
Validation loss = 0.5725976228713989
Validation loss = 0.5652369260787964
Validation loss = 0.571296215057373
Validation loss = 0.5712553858757019
Validation loss = 0.5708169341087341
Validation loss = 0.5755735039710999
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5677642226219177
Validation loss = 0.5681475400924683
Validation loss = 0.5704572796821594
Validation loss = 0.5673547387123108
Validation loss = 0.5775390863418579
Validation loss = 0.5717340111732483
Validation loss = 0.573603093624115
Validation loss = 0.5767104029655457
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.130952380952381
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1342395921835173
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.134974533106961
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1340118744698897
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1364406779661016
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1380186282811178
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.140439932318105
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.1479289940828403
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.1554054054054055
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.158649789029536
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1585160202360878
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1592249368155012
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1624579124579124
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1631623212783853
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.173109243697479
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1754827875734677
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.1820469798657718
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.1877619446772842
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.1917922948073703
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1907949790794978
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1923076923076923
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1954887218045114
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1969949916527547
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1976647206005004
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.2016666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.526   |
| Iteration     | 46       |
| MaximumReturn | -0.116   |
| MinimumReturn | -1.46    |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5659424066543579
Validation loss = 0.5607401132583618
Validation loss = 0.5638186931610107
Validation loss = 0.5679726004600525
Validation loss = 0.5648530125617981
Validation loss = 0.5664741396903992
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5665431022644043
Validation loss = 0.566672146320343
Validation loss = 0.5736919641494751
Validation loss = 0.5696252584457397
Validation loss = 0.5770753622055054
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5765519142150879
Validation loss = 0.56316077709198
Validation loss = 0.5710505247116089
Validation loss = 0.5689037442207336
Validation loss = 0.5738704800605774
Validation loss = 0.5713152885437012
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5771826505661011
Validation loss = 0.5692182779312134
Validation loss = 0.5691009759902954
Validation loss = 0.5718790888786316
Validation loss = 0.5720083713531494
Validation loss = 0.5742680430412292
Validation loss = 0.5747717618942261
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5775490999221802
Validation loss = 0.5734299421310425
Validation loss = 0.5710684657096863
Validation loss = 0.577666163444519
Validation loss = 0.578918993473053
Validation loss = 0.5824956893920898
Validation loss = 0.5773481726646423
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.2031640299750208
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.204658901830283
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.2086450540315876
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.2117940199335548
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.213278008298755
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.2197346600331676
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.2220381110190555
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.2259933774834437
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.227460711331679
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.2314049586776858
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.2378199834847234
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.2392739273927393
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.2407254740313274
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.2495881383855025
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.2518518518518518
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.254111842105263
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.2580115036976172
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.264367816091954
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.2674323215750616
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.272950819672131
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.2751842751842752
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.2798690671031097
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.2804578904333606
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.283496732026144
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.2889795918367346
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.825   |
| Iteration     | 47       |
| MaximumReturn | -0.154   |
| MinimumReturn | -10.3    |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5660954713821411
Validation loss = 0.562393069267273
Validation loss = 0.5666024684906006
Validation loss = 0.5677163004875183
Validation loss = 0.5670600533485413
Validation loss = 0.5704672932624817
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.576363742351532
Validation loss = 0.5664706230163574
Validation loss = 0.5800093412399292
Validation loss = 0.5697063207626343
Validation loss = 0.5744229555130005
Validation loss = 0.5738736391067505
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5737991333007812
Validation loss = 0.567238450050354
Validation loss = 0.5748218894004822
Validation loss = 0.572381317615509
Validation loss = 0.5737717747688293
Validation loss = 0.5734285116195679
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5745517611503601
Validation loss = 0.5753977298736572
Validation loss = 0.5765389800071716
Validation loss = 0.5810220241546631
Validation loss = 0.5751211643218994
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5730907917022705
Validation loss = 0.5694221258163452
Validation loss = 0.5736192464828491
Validation loss = 0.5781786441802979
Validation loss = 0.5826519727706909
Validation loss = 0.5807520151138306
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.2944535073409462
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.2942135289323553
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.3037459283387622
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.3083807973962571
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.313821138211382
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 17
average number of affinization = 1.3265637692932575
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.327922077922078
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.3292781832927818
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.33387358184765
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.334412955465587
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.3357605177993528
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.3387227162489894
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.3441033925686592
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.3470540758676353
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.3564516129032258
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.3585817888799356
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.3607085346215781
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.3660498793242155
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.369774919614148
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.370281124497992
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.3739967897271268
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.3785084202085005
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.3798076923076923
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.3819055244195357
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.3832
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.17    |
| Iteration     | 48       |
| MaximumReturn | -0.0814  |
| MinimumReturn | -40.1    |
| TotalSamples  | 83300    |
----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5688786506652832
Validation loss = 0.5657630562782288
Validation loss = 0.5649765133857727
Validation loss = 0.5739911198616028
Validation loss = 0.5772407054901123
Validation loss = 0.5769011974334717
Validation loss = 0.575577437877655
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.578765332698822
Validation loss = 0.570899486541748
Validation loss = 0.5753673911094666
Validation loss = 0.5789716243743896
Validation loss = 0.578865647315979
Validation loss = 0.5751994252204895
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5743332505226135
Validation loss = 0.5750018358230591
Validation loss = 0.5805756449699402
Validation loss = 0.5747623443603516
Validation loss = 0.5797471404075623
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5770490169525146
Validation loss = 0.5784478783607483
Validation loss = 0.5810247659683228
Validation loss = 0.5830429792404175
Validation loss = 0.5786148905754089
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5860190987586975
Validation loss = 0.576738178730011
Validation loss = 0.5800772905349731
Validation loss = 0.5811071395874023
Validation loss = 0.5802033543586731
Validation loss = 0.5826981663703918
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.386890487609912
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.3905750798722045
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.3950518754988028
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.3971291866028708
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.4055776892430278
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.4060509554140128
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.4073190135242641
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.4093799682034975
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.4162033359809372
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.4206349206349207
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.4258524980174465
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.4270998415213947
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 15
average number of affinization = 1.43784639746635
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.4406645569620253
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.4411067193675888
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.4447077409162716
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.4483030781373323
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.4542586750788644
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.458628841607565
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.4598425196850393
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.4673485444531864
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.4724842767295598
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.474469756480754
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 14
average number of affinization = 1.4843014128728413
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.4909803921568627
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -39.9    |
| Iteration     | 49       |
| MaximumReturn | -0.189   |
| MinimumReturn | -121     |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5743550062179565
Validation loss = 0.5700319409370422
Validation loss = 0.5771873593330383
Validation loss = 0.5688517093658447
Validation loss = 0.5761922001838684
Validation loss = 0.5799819231033325
Validation loss = 0.5771862268447876
Validation loss = 0.57652747631073
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5796538591384888
Validation loss = 0.5741474032402039
Validation loss = 0.5755093693733215
Validation loss = 0.5763218402862549
Validation loss = 0.585738480091095
Validation loss = 0.5762420892715454
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5711912512779236
Validation loss = 0.5725878477096558
Validation loss = 0.5762907862663269
Validation loss = 0.5798911452293396
Validation loss = 0.5777331590652466
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5779112577438354
Validation loss = 0.5722828507423401
Validation loss = 0.5768865346908569
Validation loss = 0.5761714577674866
Validation loss = 0.5797069072723389
Validation loss = 0.5858869552612305
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.574055016040802
Validation loss = 0.5740060210227966
Validation loss = 0.5762604475021362
Validation loss = 0.5761095881462097
Validation loss = 0.5805259346961975
Validation loss = 0.5878653526306152
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.499216300940439
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.4988253719655442
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.5007824726134584
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.5035183737294762
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.50234375
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.505854800936768
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.5148205928237128
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.5214341387373345
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.5264797507788161
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.5307392996108948
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.536547433903577
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.5384615384615385
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.5419254658385093
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.5484871993793639
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.5527131782945736
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.5615801704105345
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.565015479876161
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.5730858468677493
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.5780525502318392
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.5822393822393823
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.5887345679012346
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.5936777178103316
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.6001540832049306
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.6004618937644342
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.603846153846154
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -64.1    |
| Iteration     | 50       |
| MaximumReturn | -0.918   |
| MinimumReturn | -118     |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5727447867393494
Validation loss = 0.5786523818969727
Validation loss = 0.5718234777450562
Validation loss = 0.5758277773857117
Validation loss = 0.5801217555999756
Validation loss = 0.5840019583702087
Validation loss = 0.5810660123825073
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5747494101524353
Validation loss = 0.5713857412338257
Validation loss = 0.5753218531608582
Validation loss = 0.5788229703903198
Validation loss = 0.5813486576080322
Validation loss = 0.5812183022499084
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5760425329208374
Validation loss = 0.5742019414901733
Validation loss = 0.5754062533378601
Validation loss = 0.5742841958999634
Validation loss = 0.5737124681472778
Validation loss = 0.582120954990387
Validation loss = 0.5791537761688232
Validation loss = 0.58009934425354
Validation loss = 0.5853946805000305
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5728124380111694
Validation loss = 0.5761982798576355
Validation loss = 0.5792925357818604
Validation loss = 0.5782554745674133
Validation loss = 0.5830255150794983
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.583060622215271
Validation loss = 0.5754818916320801
Validation loss = 0.576430082321167
Validation loss = 0.5787956714630127
Validation loss = 0.5834707021713257
Validation loss = 0.5861527919769287
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.611837048424289
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.6175115207373272
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.6216423637759019
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.6257668711656441
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.6275862068965516
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.6271056661562022
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.6312165263963274
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.632262996941896
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.6348357524828112
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.6404580152671755
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.646834477498093
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.6532012195121952
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.6534653465346534
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.6537290715372908
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.6555133079847908
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.6610942249240122
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.6636294608959756
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.6714719271623673
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 15
average number of affinization = 1.681576952236543
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.6886363636363637
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.6934140802422408
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.7012102874432677
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.708994708994709
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.7129909365558913
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.7162264150943396
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -68.7    |
| Iteration     | 51       |
| MaximumReturn | -0.776   |
| MinimumReturn | -144     |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5761779546737671
Validation loss = 0.5706729292869568
Validation loss = 0.576327383518219
Validation loss = 0.5847852230072021
Validation loss = 0.5765402317047119
Validation loss = 0.5821117758750916
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5772905349731445
Validation loss = 0.5722246766090393
Validation loss = 0.5775039792060852
Validation loss = 0.5824004411697388
Validation loss = 0.5800610780715942
Validation loss = 0.5846515893936157
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5809250473976135
Validation loss = 0.5729511380195618
Validation loss = 0.5783584117889404
Validation loss = 0.5793900489807129
Validation loss = 0.5797640681266785
Validation loss = 0.5819680690765381
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5757620930671692
Validation loss = 0.5723302364349365
Validation loss = 0.5796701312065125
Validation loss = 0.5787071585655212
Validation loss = 0.5856430530548096
Validation loss = 0.5808525681495667
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5769935846328735
Validation loss = 0.5816147923469543
Validation loss = 0.5807007551193237
Validation loss = 0.5842971205711365
Validation loss = 0.584388792514801
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.7194570135746607
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.7249434815373021
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.7251506024096386
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.72686230248307
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.7293233082706767
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.734785875281743
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.7357357357357357
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.7396849212303076
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.7451274362818592
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.749812734082397
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.7559880239520957
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.7606581899775617
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.7690582959641257
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.7699775952203136
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.7746268656716417
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 14
average number of affinization = 1.7837434750186427
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.7868852459016393
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 14
average number of affinization = 1.7959791511541325
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.8013392857142858
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.8066914498141264
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.812035661218425
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.8151447661469933
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.817507418397626
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.8191252779836917
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.8214814814814815
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -71.2    |
| Iteration     | 52       |
| MaximumReturn | -0.733   |
| MinimumReturn | -135     |
| TotalSamples  | 89964    |
----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5773274898529053
Validation loss = 0.5799511671066284
Validation loss = 0.5797805190086365
Validation loss = 0.5791725516319275
Validation loss = 0.5767182111740112
Validation loss = 0.5807716846466064
Validation loss = 0.5867815613746643
Validation loss = 0.578738808631897
Validation loss = 0.5803835988044739
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5759570002555847
Validation loss = 0.5750674605369568
Validation loss = 0.5733467936515808
Validation loss = 0.5807135105133057
Validation loss = 0.5796800255775452
Validation loss = 0.5820313096046448
Validation loss = 0.5816422700881958
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5773548483848572
Validation loss = 0.5760713219642639
Validation loss = 0.5823121666908264
Validation loss = 0.5828931331634521
Validation loss = 0.5770864486694336
Validation loss = 0.5795755386352539
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5691599249839783
Validation loss = 0.5737447738647461
Validation loss = 0.5750294923782349
Validation loss = 0.5764536261558533
Validation loss = 0.5799028277397156
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5777717232704163
Validation loss = 0.5726826190948486
Validation loss = 0.5790522694587708
Validation loss = 0.5836449265480042
Validation loss = 0.5809752941131592
Validation loss = 0.5777215957641602
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.8208734270910436
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.8224852071005917
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.8300073909830008
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.829394387001477
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 17
average number of affinization = 1.840590405904059
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.8466076696165192
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.8467207074428886
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.8527245949926363
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.8609271523178808
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 18
average number of affinization = 1.8727941176470588
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.8772961058045554
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.8839941262848752
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.8848129126925899
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.8841642228739004
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.8886446886446886
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.890922401171303
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.8975859546452085
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.9042397660818713
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.9057706355003652
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.9131386861313868
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.9161196207148068
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.9190962099125364
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.9235251274581209
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.930858806404658
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.9352727272727273
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -60      |
| Iteration     | 53       |
| MaximumReturn | -0.343   |
| MinimumReturn | -120     |
| TotalSamples  | 91630    |
----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5773700475692749
Validation loss = 0.5748293399810791
Validation loss = 0.5761756896972656
Validation loss = 0.5788903832435608
Validation loss = 0.5869499444961548
Validation loss = 0.5829342603683472
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5759087204933167
Validation loss = 0.5798789262771606
Validation loss = 0.5773528814315796
Validation loss = 0.582956075668335
Validation loss = 0.5804494023323059
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5773090124130249
Validation loss = 0.5746028423309326
Validation loss = 0.5741603374481201
Validation loss = 0.5799015760421753
Validation loss = 0.585952877998352
Validation loss = 0.5805923938751221
Validation loss = 0.5835264325141907
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5778877139091492
Validation loss = 0.5752806067466736
Validation loss = 0.5756229162216187
Validation loss = 0.5742825865745544
Validation loss = 0.5761305689811707
Validation loss = 0.5826505422592163
Validation loss = 0.5799235701560974
Validation loss = 0.5809967517852783
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5761435031890869
Validation loss = 0.5747747421264648
Validation loss = 0.5744631886482239
Validation loss = 0.5798827409744263
Validation loss = 0.579740583896637
Validation loss = 0.5839176774024963
Validation loss = 0.5870551466941833
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.941860465116279
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.9448075526506898
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.9521044992743106
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.9557650471356056
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 15
average number of affinization = 1.9652173913043478
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.9688631426502534
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.9710564399421129
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.9768618944323932
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.9797687861271676
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.9819494584837545
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.9870129870129871
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.9899062725306416
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.9956772334293948
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.000719942404608
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.0028776978417264
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.0086268871315602
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.014367816091954
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.020100502512563
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.0200860832137733
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.0301075268817206
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.0386819484240686
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.0479599141016465
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.048640915593705
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.0528949249463904
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.0557142857142856
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -71.5    |
| Iteration     | 54       |
| MaximumReturn | -0.472   |
| MinimumReturn | -130     |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5738840699195862
Validation loss = 0.5833072662353516
Validation loss = 0.5787334442138672
Validation loss = 0.574755847454071
Validation loss = 0.5776431560516357
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5770084261894226
Validation loss = 0.5783756375312805
Validation loss = 0.5716413259506226
Validation loss = 0.577842116355896
Validation loss = 0.582818865776062
Validation loss = 0.5780957937240601
Validation loss = 0.574787437915802
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5784528851509094
Validation loss = 0.577061653137207
Validation loss = 0.5777726173400879
Validation loss = 0.5757960677146912
Validation loss = 0.5766260623931885
Validation loss = 0.5775100588798523
Validation loss = 0.579140305519104
Validation loss = 0.5799447298049927
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5790466070175171
Validation loss = 0.5742071270942688
Validation loss = 0.5803213715553284
Validation loss = 0.5768643021583557
Validation loss = 0.5732858180999756
Validation loss = 0.5795937180519104
Validation loss = 0.5803483724594116
Validation loss = 0.5896986722946167
Validation loss = 0.5814347267150879
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5806646943092346
Validation loss = 0.5770068764686584
Validation loss = 0.5774831771850586
Validation loss = 0.5801554322242737
Validation loss = 0.582850992679596
Validation loss = 0.5795512199401855
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.062098501070664
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.067047075606277
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.069137562366358
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.0776353276353277
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.079715302491103
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.085348506401138
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.0959488272921107
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.0994318181818183
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.107168204400284
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.1141843971631205
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.121190644932672
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.126770538243626
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.1309271054493983
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.1386138613861387
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.1462897526501767
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.1518361581920904
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.15878616796048
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 19
average number of affinization = 2.170662905500705
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.181113460183228
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.1901408450704225
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.1942294159042928
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.1954992967651195
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.197470133520731
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.2036516853932584
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.208421052631579
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -47.4    |
| Iteration     | 55       |
| MaximumReturn | -0.43    |
| MinimumReturn | -116     |
| TotalSamples  | 94962    |
----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5726529359817505
Validation loss = 0.5745536684989929
Validation loss = 0.5719083547592163
Validation loss = 0.5733714699745178
Validation loss = 0.5761173963546753
Validation loss = 0.5727783441543579
Validation loss = 0.5759680867195129
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5765135884284973
Validation loss = 0.5742947459220886
Validation loss = 0.569792628288269
Validation loss = 0.5764213800430298
Validation loss = 0.5757488012313843
Validation loss = 0.5776789784431458
Validation loss = 0.5802196860313416
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5793265700340271
Validation loss = 0.5751407742500305
Validation loss = 0.5748684406280518
Validation loss = 0.5841004848480225
Validation loss = 0.5851303339004517
Validation loss = 0.5828931331634521
Validation loss = 0.5819491147994995
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5825937390327454
Validation loss = 0.5789522528648376
Validation loss = 0.5786522626876831
Validation loss = 0.5784829258918762
Validation loss = 0.5830557942390442
Validation loss = 0.5767238140106201
Validation loss = 0.5802112221717834
Validation loss = 0.5830344557762146
Validation loss = 0.5822585225105286
Validation loss = 0.587779700756073
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5755385756492615
Validation loss = 0.5744819045066833
Validation loss = 0.5790263414382935
Validation loss = 0.5844504833221436
Validation loss = 0.5831443071365356
Validation loss = 0.5821900367736816
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.2110799438990183
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.2214435879467413
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.222689075630252
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.2295311406578024
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.234265734265734
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.236198462613557
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.244413407821229
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.2512212142358687
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.2594142259414225
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.2662020905923344
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.267409470752089
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.269311064718163
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.2719054242002783
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.276580958999305
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.279861111111111
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.283136710617627
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.2857142857142856
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.2875952875952876
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.2977839335180055
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.298961937716263
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.3022130013831257
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.305459571527298
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.3107734806629834
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.3119392684610074
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.3151724137931033
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -75      |
| Iteration     | 56       |
| MaximumReturn | -0.325   |
| MinimumReturn | -138     |
| TotalSamples  | 96628    |
----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.573415219783783
Validation loss = 0.5743277668952942
Validation loss = 0.571494996547699
Validation loss = 0.5762514472007751
Validation loss = 0.5757442116737366
Validation loss = 0.5786534547805786
Validation loss = 0.5724005699157715
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.57723069190979
Validation loss = 0.5786619186401367
Validation loss = 0.5752372145652771
Validation loss = 0.5793547630310059
Validation loss = 0.5745832324028015
Validation loss = 0.5760225653648376
Validation loss = 0.5851624608039856
Validation loss = 0.5804614424705505
Validation loss = 0.5785330533981323
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5792723298072815
Validation loss = 0.5774819254875183
Validation loss = 0.5800588130950928
Validation loss = 0.5781913995742798
Validation loss = 0.5797755122184753
Validation loss = 0.5807197690010071
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.579163134098053
Validation loss = 0.5746899247169495
Validation loss = 0.5809234380722046
Validation loss = 0.5800873637199402
Validation loss = 0.5817089676856995
Validation loss = 0.5795375108718872
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.582104504108429
Validation loss = 0.5703076124191284
Validation loss = 0.5793195366859436
Validation loss = 0.5788375735282898
Validation loss = 0.5768124461174011
Validation loss = 0.5836783051490784
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.315644383184011
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.3209366391184574
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.3269098417068137
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.3370013755158183
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.3381443298969073
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.3440934065934065
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.350720658888126
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.353223593964335
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.3564084989718985
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.3636986301369864
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.3668720054757015
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.374829001367989
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.384142173615858
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.390710382513661
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 20
average number of affinization = 2.4027303754266214
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.4106412005457027
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.41785957736878
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.4209809264305178
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.424778761061947
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.4292517006802723
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 20
average number of affinization = 2.441196464989803
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.447690217391304
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.4562118126272914
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.4640434192672998
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.4705084745762713
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -74      |
| Iteration     | 57       |
| MaximumReturn | -0.677   |
| MinimumReturn | -149     |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5711653828620911
Validation loss = 0.5686599612236023
Validation loss = 0.5708026885986328
Validation loss = 0.5726290345191956
Validation loss = 0.5749874711036682
Validation loss = 0.5775842070579529
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5745018720626831
Validation loss = 0.5791289210319519
Validation loss = 0.5762233138084412
Validation loss = 0.5760594606399536
Validation loss = 0.5776783227920532
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5716092586517334
Validation loss = 0.5724433064460754
Validation loss = 0.5729636549949646
Validation loss = 0.581062376499176
Validation loss = 0.5798670649528503
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5791103839874268
Validation loss = 0.579737663269043
Validation loss = 0.5782142281532288
Validation loss = 0.5789264440536499
Validation loss = 0.582314670085907
Validation loss = 0.580562174320221
Validation loss = 0.5815253257751465
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5719001293182373
Validation loss = 0.5769655704498291
Validation loss = 0.5736966133117676
Validation loss = 0.5732924342155457
Validation loss = 0.5746575593948364
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.4722222222222223
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.4746106973595126
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.482408660351827
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.491548343475321
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.4966216216216215
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.50033760972316
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.506072874493927
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.5064059339177343
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.5101078167115904
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.5117845117845117
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.5148048452220726
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.5198386012104907
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.524193548387097
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 20
average number of affinization = 2.535930154466085
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.5395973154362417
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.543259557344064
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.5482573726541555
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.5559276624246485
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.5602409638554215
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.5638795986622074
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.5681818181818183
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.5684702738810956
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.5740987983978636
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.5750500333555704
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.5773333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -91.2    |
| Iteration     | 58       |
| MaximumReturn | -0.344   |
| MinimumReturn | -136     |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5669951438903809
Validation loss = 0.5729405879974365
Validation loss = 0.5730125308036804
Validation loss = 0.5787307620048523
Validation loss = 0.5756773352622986
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5720258951187134
Validation loss = 0.5701248049736023
Validation loss = 0.5743539333343506
Validation loss = 0.5742706060409546
Validation loss = 0.5838175415992737
Validation loss = 0.5787954926490784
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5696721076965332
Validation loss = 0.5715842247009277
Validation loss = 0.5749614834785461
Validation loss = 0.5735297203063965
Validation loss = 0.5755143165588379
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5770531296730042
Validation loss = 0.5747275352478027
Validation loss = 0.5784872174263
Validation loss = 0.5785499811172485
Validation loss = 0.5765395164489746
Validation loss = 0.5807995796203613
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5738687515258789
Validation loss = 0.5685785412788391
Validation loss = 0.5722882747650146
Validation loss = 0.5745002627372742
Validation loss = 0.5716150403022766
Validation loss = 0.5698040127754211
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 20
average number of affinization = 2.5889407061958694
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 23
average number of affinization = 2.6025299600532623
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.6027944111776447
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.6070478723404253
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.6159468438538207
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.6254980079681274
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 18
average number of affinization = 2.6357000663570007
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.6412466843501328
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.642809807819748
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.6456953642384105
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 23
average number of affinization = 2.659166115155526
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.664021164021164
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.6655651024454725
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.669749009247028
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 28
average number of affinization = 2.6864686468646863
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 18
average number of affinization = 2.6965699208443272
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 20
average number of affinization = 2.707976268951879
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 19
average number of affinization = 2.7187088274044795
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.727452271231073
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.7282894736842107
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.7337278106508878
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.743101182654402
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 27
average number of affinization = 2.7590282337491794
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.763779527559055
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.7711475409836064
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -49.2    |
| Iteration     | 59       |
| MaximumReturn | -0.423   |
| MinimumReturn | -222     |
| TotalSamples  | 101626   |
----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5695972442626953
Validation loss = 0.5697066783905029
Validation loss = 0.5703083276748657
Validation loss = 0.565775990486145
Validation loss = 0.5706918239593506
Validation loss = 0.5693535208702087
Validation loss = 0.5728470683097839
Validation loss = 0.5709589123725891
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5719447135925293
Validation loss = 0.5661278367042542
Validation loss = 0.5700655579566956
Validation loss = 0.5798095464706421
Validation loss = 0.571418046951294
Validation loss = 0.5720798969268799
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5674727559089661
Validation loss = 0.573275625705719
Validation loss = 0.5720036029815674
Validation loss = 0.5727108120918274
Validation loss = 0.5718463063240051
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.574175238609314
Validation loss = 0.5757135152816772
Validation loss = 0.5742480158805847
Validation loss = 0.5791687369346619
Validation loss = 0.5770107507705688
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5703368186950684
Validation loss = 0.5671910643577576
Validation loss = 0.5690913200378418
Validation loss = 0.5715040564537048
Validation loss = 0.572847306728363
Validation loss = 0.574124813079834
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 18
average number of affinization = 2.781127129750983
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.787819253438114
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.7905759162303663
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.7965990843688684
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.8006535947712417
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.807968647942521
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 20
average number of affinization = 2.819190600522193
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.823874755381605
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.8265971316818774
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.828013029315961
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.83203125
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.8392973324658426
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.8485045513654095
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 19
average number of affinization = 2.8589993502274202
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.8655844155844155
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.874756651524984
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.878728923476005
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.880751782242385
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.881476683937824
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.888673139158576
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 24
average number of affinization = 2.9023285899094438
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.9069166127989656
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.9127906976744184
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.91994835377663
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.9251612903225808
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -78      |
| Iteration     | 60       |
| MaximumReturn | -2.34    |
| MinimumReturn | -139     |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5704326629638672
Validation loss = 0.5667678713798523
Validation loss = 0.5722033381462097
Validation loss = 0.5675807595252991
Validation loss = 0.573025107383728
Validation loss = 0.5711148977279663
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5691578984260559
Validation loss = 0.5704474449157715
Validation loss = 0.5693176984786987
Validation loss = 0.5731914639472961
Validation loss = 0.5736623406410217
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5687000155448914
Validation loss = 0.5707345604896545
Validation loss = 0.57231205701828
Validation loss = 0.5760560631752014
Validation loss = 0.5747123956680298
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5702601671218872
Validation loss = 0.5721933841705322
Validation loss = 0.5760151743888855
Validation loss = 0.5743627548217773
Validation loss = 0.5779688954353333
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5690225958824158
Validation loss = 0.5681086182594299
Validation loss = 0.5704528093338013
Validation loss = 0.5737767219543457
Validation loss = 0.5694653987884521
Validation loss = 0.5698413252830505
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.9342359767891684
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.9387886597938144
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.93882807469414
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.94015444015444
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.9434083601286174
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.947943444730077
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.9550417469492616
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.960847240051348
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.9666452854393843
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.971794871794872
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.979500320307495
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.9827144686299616
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.9884836852207295
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.9948849104859336
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.997444089456869
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.998722860791826
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.00382897255903
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.00765306122449
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 3.008922880815806
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.018471337579618
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.021005728835137
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 4
average number of affinization = 3.0216284987277353
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.0273363000635727
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 3.027318932655654
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.0317460317460316
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -104     |
| Iteration     | 61       |
| MaximumReturn | -2.89    |
| MinimumReturn | -170     |
| TotalSamples  | 104958   |
----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5669611096382141
Validation loss = 0.5744791030883789
Validation loss = 0.5682383179664612
Validation loss = 0.5650479197502136
Validation loss = 0.5711538791656494
Validation loss = 0.5696216821670532
Validation loss = 0.5711631178855896
Validation loss = 0.5720424652099609
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5679263472557068
Validation loss = 0.5670201778411865
Validation loss = 0.5732582211494446
Validation loss = 0.5696486830711365
Validation loss = 0.5678234696388245
Validation loss = 0.5674372911453247
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5725206136703491
Validation loss = 0.5677136778831482
Validation loss = 0.5697970390319824
Validation loss = 0.5707208514213562
Validation loss = 0.5737045407295227
Validation loss = 0.5732678174972534
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5729740262031555
Validation loss = 0.5688751935958862
Validation loss = 0.5718239545822144
Validation loss = 0.5704604387283325
Validation loss = 0.5745123624801636
Validation loss = 0.5749157667160034
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5668173432350159
Validation loss = 0.565937876701355
Validation loss = 0.5677033066749573
Validation loss = 0.5671569108963013
Validation loss = 0.5694777965545654
Validation loss = 0.5716819167137146
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.0348984771573604
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.041851616994293
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.0481622306717364
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.053831538948702
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.061392405063291
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.0651486401012016
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.070164348925411
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.07706885660139
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.082070707070707
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.091482649842271
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.0933165195460277
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.099558916194077
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.1076826196473553
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.1139081183134047
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 23
average number of affinization = 3.1264150943396225
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.1338780641106223
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.1381909547738696
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.1462649089767734
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.148055207026349
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.155485893416928
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.158521303258145
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 20
average number of affinization = 3.169067000626174
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 20
average number of affinization = 3.1795994993742176
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.1844903064415258
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.189375
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -40.6    |
| Iteration     | 62       |
| MaximumReturn | -0.341   |
| MinimumReturn | -133     |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.568488597869873
Validation loss = 0.5654460787773132
Validation loss = 0.5705592036247253
Validation loss = 0.5690715312957764
Validation loss = 0.5700390338897705
Validation loss = 0.5706901550292969
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5654576420783997
Validation loss = 0.5667564868927002
Validation loss = 0.5689681172370911
Validation loss = 0.5703762173652649
Validation loss = 0.5624820590019226
Validation loss = 0.5710665583610535
Validation loss = 0.5709413290023804
Validation loss = 0.56979900598526
Validation loss = 0.5726103186607361
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5685319900512695
Validation loss = 0.5590533018112183
Validation loss = 0.5679458379745483
Validation loss = 0.5668035745620728
Validation loss = 0.5669876933097839
Validation loss = 0.5780960321426392
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5698229670524597
Validation loss = 0.5670020580291748
Validation loss = 0.5709517598152161
Validation loss = 0.5715602040290833
Validation loss = 0.5690338611602783
Validation loss = 0.5740278959274292
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5664232969284058
Validation loss = 0.5638015866279602
Validation loss = 0.5660837888717651
Validation loss = 0.567241907119751
Validation loss = 0.5654186606407166
Validation loss = 0.5657801628112793
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.1948782011242973
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 19
average number of affinization = 3.2047440699126093
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.2102308172177167
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.2144638403990027
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 30
average number of affinization = 3.231152647975078
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 22
average number of affinization = 3.2428393524283936
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 21
average number of affinization = 3.2538892345986308
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.2574626865671643
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 19
average number of affinization = 3.267246737103791
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.2763975155279503
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 24
average number of affinization = 3.2892613283674734
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 19
average number of affinization = 3.2990074441687347
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.3075015499070055
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.309169764560099
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 20
average number of affinization = 3.319504643962848
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.323638613861386
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 21
average number of affinization = 3.334570191713049
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 20
average number of affinization = 3.34487021013597
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 19
average number of affinization = 3.3545398394070416
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.362962962962963
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 22
average number of affinization = 3.3744602097470695
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 21
average number of affinization = 3.3853267570900125
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.3930991990141712
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.397167487684729
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 29
average number of affinization = 3.412923076923077
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -28.6    |
| Iteration     | 63       |
| MaximumReturn | -0.352   |
| MinimumReturn | -98.3    |
| TotalSamples  | 108290   |
----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5643595457077026
Validation loss = 0.5626557469367981
Validation loss = 0.5717224478721619
Validation loss = 0.5705633163452148
Validation loss = 0.5711696147918701
Validation loss = 0.5686578154563904
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5664448738098145
Validation loss = 0.5689231157302856
Validation loss = 0.569699764251709
Validation loss = 0.5686874985694885
Validation loss = 0.5688577890396118
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5712215900421143
Validation loss = 0.5655040144920349
Validation loss = 0.5752884745597839
Validation loss = 0.56731116771698
Validation loss = 0.5693039894104004
Validation loss = 0.5725201368331909
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5713512897491455
Validation loss = 0.5695710182189941
Validation loss = 0.56625896692276
Validation loss = 0.5755845904350281
Validation loss = 0.5735226273536682
Validation loss = 0.571054220199585
Validation loss = 0.5742696523666382
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5690270662307739
Validation loss = 0.5639603137969971
Validation loss = 0.5659925937652588
Validation loss = 0.5682030916213989
Validation loss = 0.5677089691162109
Validation loss = 0.5750396847724915
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.4194341943419433
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.421634910878918
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.4293611793611793
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.4340085942295886
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.439877300613497
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.446965052115267
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 20
average number of affinization = 3.457107843137255
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.4629516227801593
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.4718482252141984
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.477064220183486
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.4816625916870416
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.489309712889432
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.4932844932844933
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.49908480780964
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.5060975609756095
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.514320536258379
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.520706455542022
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.5258673158855753
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 25
average number of affinization = 3.5389294403892944
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.5404255319148934
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.549210206561361
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 3.5500910746812386
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.5527912621359223
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 20
average number of affinization = 3.5627653123104914
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.5703030303030303
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.8    |
| Iteration     | 64       |
| MaximumReturn | -0.258   |
| MinimumReturn | -194     |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5671064853668213
Validation loss = 0.5668991208076477
Validation loss = 0.5726329684257507
Validation loss = 0.5684781670570374
Validation loss = 0.5697752237319946
Validation loss = 0.5706533193588257
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5683872699737549
Validation loss = 0.5734571218490601
Validation loss = 0.5681384205818176
Validation loss = 0.5675129890441895
Validation loss = 0.5664128661155701
Validation loss = 0.5689862370491028
Validation loss = 0.5720037221908569
Validation loss = 0.5712339282035828
Validation loss = 0.5736634135246277
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5700313448905945
Validation loss = 0.5660949945449829
Validation loss = 0.5716127157211304
Validation loss = 0.5676890015602112
Validation loss = 0.5680867433547974
Validation loss = 0.5705069303512573
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5729674696922302
Validation loss = 0.5665839910507202
Validation loss = 0.5719073414802551
Validation loss = 0.5702229738235474
Validation loss = 0.5718982219696045
Validation loss = 0.5795459151268005
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5657662749290466
Validation loss = 0.5675912499427795
Validation loss = 0.5692691206932068
Validation loss = 0.5689595341682434
Validation loss = 0.5676742792129517
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.5778316172016957
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.584140435835351
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.5898366606170597
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.593712212817412
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.6006042296072507
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.6068840579710146
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.6137598068799033
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 19
average number of affinization = 3.623039806996381
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 5
average number of affinization = 3.623869801084991
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.6295180722891565
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.6369656833232993
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.6419975932611313
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.6476247745039085
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 20
average number of affinization = 3.657451923076923
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 20
average number of affinization = 3.6672672672672673
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.675870348139256
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 19
average number of affinization = 3.6850629874025196
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 20
average number of affinization = 3.6948441247002397
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.6974236069502697
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.705988023952096
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.710951526032316
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 3.7117224880382773
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 3.71249252839211
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.717443249701314
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.72
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10      |
| Iteration     | 65       |
| MaximumReturn | -0.317   |
| MinimumReturn | -171     |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.568118155002594
Validation loss = 0.5667316317558289
Validation loss = 0.5672033429145813
Validation loss = 0.5670583844184875
Validation loss = 0.5694071650505066
Validation loss = 0.5681216716766357
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5739761590957642
Validation loss = 0.5694557428359985
Validation loss = 0.5723445415496826
Validation loss = 0.5701366066932678
Validation loss = 0.5722081065177917
Validation loss = 0.5700544714927673
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5666704773902893
Validation loss = 0.5682323575019836
Validation loss = 0.566356360912323
Validation loss = 0.565610945224762
Validation loss = 0.5692610740661621
Validation loss = 0.5709485411643982
Validation loss = 0.5686529874801636
Validation loss = 0.568956196308136
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5756580233573914
Validation loss = 0.569198489189148
Validation loss = 0.5695192813873291
Validation loss = 0.5699627995491028
Validation loss = 0.5712476372718811
Validation loss = 0.574448823928833
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5635021924972534
Validation loss = 0.5655641555786133
Validation loss = 0.5641751885414124
Validation loss = 0.5667735934257507
Validation loss = 0.5653589367866516
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.726133651551313
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.731663685152057
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.738379022646007
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 24
average number of affinization = 3.750446694460989
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 24
average number of affinization = 3.7625
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.7697798929208806
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 3.7681331747919145
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.774212715389186
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.7826603325415675
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.7875370919881304
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.792408066429419
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 26
average number of affinization = 3.805572021339656
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.8127962085308056
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 27
average number of affinization = 3.8265245707519244
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.828402366863905
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.8332347723240687
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.8368794326241136
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.8434731246308327
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.8506493506493507
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.8584070796460175
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 26
average number of affinization = 3.8714622641509435
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.8797878609310548
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.8857479387514724
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.891112419070041
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.8976470588235292
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.77    |
| Iteration     | 66       |
| MaximumReturn | -0.264   |
| MinimumReturn | -142     |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5695729851722717
Validation loss = 0.5683885216712952
Validation loss = 0.5681382417678833
Validation loss = 0.5680143237113953
Validation loss = 0.5697957873344421
Validation loss = 0.5692320466041565
Validation loss = 0.5689629912376404
Validation loss = 0.573506772518158
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5704644322395325
Validation loss = 0.5679442286491394
Validation loss = 0.5736152529716492
Validation loss = 0.5683122277259827
Validation loss = 0.5717855095863342
Validation loss = 0.5725463032722473
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5709212422370911
Validation loss = 0.5663344264030457
Validation loss = 0.5693688988685608
Validation loss = 0.5700066089630127
Validation loss = 0.5692654848098755
Validation loss = 0.5682169198989868
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5695478320121765
Validation loss = 0.5668777823448181
Validation loss = 0.5684334635734558
Validation loss = 0.5686867833137512
Validation loss = 0.5718212127685547
Validation loss = 0.5740213394165039
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5658788084983826
Validation loss = 0.5616320967674255
Validation loss = 0.5647305846214294
Validation loss = 0.5671306848526001
Validation loss = 0.565537691116333
Validation loss = 0.5642030835151672
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.905349794238683
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.9136310223266744
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.9189665296535527
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.925469483568075
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 22
average number of affinization = 3.9360703812316715
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.943141852286049
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.9466900995899237
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.950234192037471
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.9578700994733764
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.9590643274853803
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.9614260666277032
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.967289719626168
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.9725627553998835
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.9760793465577597
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.9830903790087464
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.9900932400932403
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.991846243447874
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.9976717112922002
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.003490401396161
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.008139534883721
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 4.006391632771645
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.012195121951219
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 23
average number of affinization = 4.023215322112594
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 4.02262180974478
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.028405797101449
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -21.5    |
| Iteration     | 67       |
| MaximumReturn | -0.255   |
| MinimumReturn | -162     |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5685622692108154
Validation loss = 0.5645103454589844
Validation loss = 0.5720847249031067
Validation loss = 0.5697237849235535
Validation loss = 0.5713775753974915
Validation loss = 0.5698623657226562
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.573268473148346
Validation loss = 0.5689427852630615
Validation loss = 0.5705053210258484
Validation loss = 0.574942946434021
Validation loss = 0.573638379573822
Validation loss = 0.5676748752593994
Validation loss = 0.5724658370018005
Validation loss = 0.5755942463874817
Validation loss = 0.5797674655914307
Validation loss = 0.5707158446311951
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.568371593952179
Validation loss = 0.5710238814353943
Validation loss = 0.5736387372016907
Validation loss = 0.5685726404190063
Validation loss = 0.5680115222930908
Validation loss = 0.5720584988594055
Validation loss = 0.5718674659729004
Validation loss = 0.577192485332489
Validation loss = 0.5745729207992554
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5789502859115601
Validation loss = 0.5708755850791931
Validation loss = 0.5711510181427002
Validation loss = 0.5705442428588867
Validation loss = 0.5698984861373901
Validation loss = 0.5722627639770508
Validation loss = 0.5744019150733948
Validation loss = 0.5783998966217041
Validation loss = 0.5746062994003296
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5700490474700928
Validation loss = 0.5638769865036011
Validation loss = 0.5706301927566528
Validation loss = 0.5674299001693726
Validation loss = 0.567459762096405
Validation loss = 0.5678289532661438
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.035341830822712
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.038795599305153
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.045717592592593
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.05320994794679
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 19
average number of affinization = 4.061849710982659
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.06528018486424
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.0704387990762125
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.0767455279861515
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.081891580161477
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 22
average number of affinization = 4.0922190201729105
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.096198156682028
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 21
average number of affinization = 4.105929763960852
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.110471806674338
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 34
average number of affinization = 4.127659574468085
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 25
average number of affinization = 4.139655172413793
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.145318782309018
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 18
average number of affinization = 4.153272101033295
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.156626506024097
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 8
average number of affinization = 4.1588302752293576
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.164469914040114
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.1678121420389465
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.174585002862049
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.1779176201373
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 25
average number of affinization = 4.189822755860492
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.192571428571428
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -30.9    |
| Iteration     | 68       |
| MaximumReturn | -0.25    |
| MinimumReturn | -101     |
| TotalSamples  | 116620   |
----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5676302313804626
Validation loss = 0.567182183265686
Validation loss = 0.568428635597229
Validation loss = 0.5707284808158875
Validation loss = 0.5688730478286743
Validation loss = 0.5725424289703369
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5746632218360901
Validation loss = 0.5710650682449341
Validation loss = 0.5717117190361023
Validation loss = 0.5738196969032288
Validation loss = 0.5719500780105591
Validation loss = 0.5732001662254333
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5751908421516418
Validation loss = 0.5728746056556702
Validation loss = 0.5757739543914795
Validation loss = 0.5727168321609497
Validation loss = 0.5698939561843872
Validation loss = 0.5711244940757751
Validation loss = 0.5736474394798279
Validation loss = 0.5763496160507202
Validation loss = 0.5743861794471741
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5711452960968018
Validation loss = 0.5755410194396973
Validation loss = 0.5711295008659363
Validation loss = 0.5726523399353027
Validation loss = 0.5739394426345825
Validation loss = 0.5742136836051941
Validation loss = 0.578493058681488
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5656521916389465
Validation loss = 0.5645342469215393
Validation loss = 0.5647804737091064
Validation loss = 0.5627581477165222
Validation loss = 0.5732974410057068
Validation loss = 0.5685951113700867
Validation loss = 0.5664010643959045
Validation loss = 0.5728081464767456
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 22
average number of affinization = 4.2027412906910335
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.208904109589041
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.2162007986309185
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.221778791334094
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.226210826210826
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 27
average number of affinization = 4.239179954441913
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 6
average number of affinization = 4.240182128628343
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.246302616609784
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 21
average number of affinization = 4.2558271745309835
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.2585227272727275
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.261783077796706
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.265607264472191
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.2682926829268295
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.27437641723356
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.279320113314448
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 18
average number of affinization = 4.2870894677236695
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.290888511601585
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.295814479638009
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 20
average number of affinization = 4.304691916336914
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 21
average number of affinization = 4.31412429378531
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 21
average number of affinization = 4.323546019198193
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 8
average number of affinization = 4.325620767494357
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.330513254371122
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.335400225479143
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 24
average number of affinization = 4.346478873239437
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -30.6    |
| Iteration     | 69       |
| MaximumReturn | -0.17    |
| MinimumReturn | -103     |
| TotalSamples  | 118286   |
----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5671266913414001
Validation loss = 0.5634452104568481
Validation loss = 0.5700051188468933
Validation loss = 0.5643356442451477
Validation loss = 0.5722014307975769
Validation loss = 0.5673170685768127
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5699129104614258
Validation loss = 0.5675687193870544
Validation loss = 0.5716924071311951
Validation loss = 0.5713779926300049
Validation loss = 0.573915421962738
Validation loss = 0.5733759999275208
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5724645853042603
Validation loss = 0.5726816654205322
Validation loss = 0.5705487728118896
Validation loss = 0.5738526582717896
Validation loss = 0.572192370891571
Validation loss = 0.5740284323692322
Validation loss = 0.576863169670105
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5713821053504944
Validation loss = 0.5706496238708496
Validation loss = 0.5727599859237671
Validation loss = 0.5723504424095154
Validation loss = 0.5740039348602295
Validation loss = 0.5726374983787537
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5668439269065857
Validation loss = 0.5654629468917847
Validation loss = 0.5667822360992432
Validation loss = 0.5683362483978271
Validation loss = 0.5676324367523193
Validation loss = 0.5667797923088074
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.3524774774774775
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.35846933033202
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 20
average number of affinization = 4.3672665916760405
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.3743676222596966
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.380337078651685
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 21
average number of affinization = 4.389668725435149
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.395622895622895
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 25
average number of affinization = 4.407178911946158
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.408632286995516
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.415686274509804
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 21
average number of affinization = 4.424972004479283
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.432008953553441
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.436241610738255
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.4393515930687535
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 18
average number of affinization = 4.446927374301676
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 18
average number of affinization = 4.454494695700726
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 18
average number of affinization = 4.462053571428571
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 24
average number of affinization = 4.472950362520915
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 21
average number of affinization = 4.48216276477146
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 24
average number of affinization = 4.493036211699164
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.499443207126949
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.505843071786311
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 21
average number of affinization = 4.515016685205784
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 39
average number of affinization = 4.534185658699277
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 23
average number of affinization = 4.544444444444444
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -18      |
| Iteration     | 70       |
| MaximumReturn | -0.227   |
| MinimumReturn | -80.5    |
| TotalSamples  | 119952   |
----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5662055611610413
Validation loss = 0.566785991191864
Validation loss = 0.5764791369438171
Validation loss = 0.564419686794281
Validation loss = 0.5674374103546143
Validation loss = 0.5708326697349548
Validation loss = 0.5762606263160706
Validation loss = 0.5736864805221558
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5652053356170654
Validation loss = 0.5690643787384033
Validation loss = 0.5713256597518921
Validation loss = 0.5678485631942749
Validation loss = 0.5734767317771912
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5745947957038879
Validation loss = 0.5732800364494324
Validation loss = 0.576321542263031
Validation loss = 0.5769966840744019
Validation loss = 0.5752311944961548
Validation loss = 0.5727297067642212
Validation loss = 0.577675461769104
Validation loss = 0.5732018351554871
Validation loss = 0.5813152194023132
Validation loss = 0.578131377696991
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5763589143753052
Validation loss = 0.5735653042793274
Validation loss = 0.5750212073326111
Validation loss = 0.5720458626747131
Validation loss = 0.575027346611023
Validation loss = 0.5729344487190247
Validation loss = 0.5737487077713013
Validation loss = 0.5768851637840271
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5655189156532288
Validation loss = 0.566994309425354
Validation loss = 0.5648084282875061
Validation loss = 0.5693441033363342
Validation loss = 0.5712096691131592
Validation loss = 0.567185640335083
Validation loss = 0.5695536136627197
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.546918378678512
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.551054384017758
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.554076539101498
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.560421286031042
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.564542936288088
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 4.5647840531561465
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.568345323741007
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.571902654867257
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.5754560530679935
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.577900552486188
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.5847598012147985
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.586092715231788
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.590733590733591
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 23
average number of affinization = 4.600882028665931
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 19
average number of affinization = 4.6088154269972454
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 19
average number of affinization = 4.616740088105727
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 21
average number of affinization = 4.625756741882223
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.627062706270627
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.631665750412314
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.6340659340659345
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 21
average number of affinization = 4.643053267435475
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.644346871569704
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 19
average number of affinization = 4.652221612726275
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.654605263157895
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.660821917808219
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -60.9    |
| Iteration     | 71       |
| MaximumReturn | -0.451   |
| MinimumReturn | -112     |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5682882070541382
Validation loss = 0.5699117183685303
Validation loss = 0.5670561194419861
Validation loss = 0.5710514783859253
Validation loss = 0.5675248503684998
Validation loss = 0.570419192314148
Validation loss = 0.571469247341156
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.567775547504425
Validation loss = 0.5700608491897583
Validation loss = 0.5693555474281311
Validation loss = 0.5709717869758606
Validation loss = 0.5684793591499329
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5726906657218933
Validation loss = 0.5778710246086121
Validation loss = 0.5744644999504089
Validation loss = 0.5731155276298523
Validation loss = 0.575761079788208
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5745047330856323
Validation loss = 0.5741440653800964
Validation loss = 0.5724756121635437
Validation loss = 0.5750503540039062
Validation loss = 0.5711225867271423
Validation loss = 0.574847400188446
Validation loss = 0.5788487195968628
Validation loss = 0.5759379863739014
Validation loss = 0.5800834894180298
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5684624314308167
Validation loss = 0.570482075214386
Validation loss = 0.5668783783912659
Validation loss = 0.5720728039741516
Validation loss = 0.567093551158905
Validation loss = 0.5674610137939453
Validation loss = 0.5684359669685364
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.667579408543264
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.6732348111658455
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.676695842450766
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 28
average number of affinization = 4.689447785675233
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.690710382513661
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.6952484980884766
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.6965065502183405
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.7026732133115114
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 20
average number of affinization = 4.711014176663031
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.716621253405995
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.722222222222222
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.723462166575939
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.730141458106638
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.7346383904295815
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 22
average number of affinization = 4.744021739130435
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 19
average number of affinization = 4.751765344921238
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.754071661237785
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.75854584915898
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.759761388286334
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 20
average number of affinization = 4.768021680216802
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 25
average number of affinization = 4.778981581798483
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.7823497563616675
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.787337662337662
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 4.78691184424013
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.789729729729729
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -22.4    |
| Iteration     | 72       |
| MaximumReturn | -0.257   |
| MinimumReturn | -83      |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5705925822257996
Validation loss = 0.5635510087013245
Validation loss = 0.5653040409088135
Validation loss = 0.566877007484436
Validation loss = 0.567730188369751
Validation loss = 0.5687715411186218
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5648890733718872
Validation loss = 0.5700655579566956
Validation loss = 0.5680563449859619
Validation loss = 0.5670892000198364
Validation loss = 0.5712280869483948
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5759763121604919
Validation loss = 0.5672614574432373
Validation loss = 0.5685247778892517
Validation loss = 0.5724219679832458
Validation loss = 0.5745712518692017
Validation loss = 0.5751307010650635
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5794969797134399
Validation loss = 0.5721774101257324
Validation loss = 0.5707188248634338
Validation loss = 0.5749452710151672
Validation loss = 0.5743407607078552
Validation loss = 0.5772885680198669
Validation loss = 0.5793511867523193
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5652060508728027
Validation loss = 0.5645483136177063
Validation loss = 0.5665612816810608
Validation loss = 0.5686482191085815
Validation loss = 0.5654873847961426
Validation loss = 0.5706601738929749
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.7941653160453805
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.79805615550756
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 23
average number of affinization = 4.807879114948732
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 23
average number of affinization = 4.817691477885653
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 20
average number of affinization = 4.825876010781671
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 29
average number of affinization = 4.838900862068965
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 21
average number of affinization = 4.84760366182014
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.848762109795479
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 25
average number of affinization = 4.859601936525014
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.86505376344086
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.868350349274584
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 22
average number of affinization = 4.877551020408164
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.88191089640365
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.886802575107296
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 21
average number of affinization = 4.89544235924933
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.89924973204716
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.903053026245313
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.909528907922912
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.915462814339219
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.921390374331551
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.924639230358097
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.926816239316239
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.929524826481581
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 18
average number of affinization = 4.93649946638207
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.9424
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.33    |
| Iteration     | 73       |
| MaximumReturn | -0.176   |
| MinimumReturn | -36.5    |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5628146529197693
Validation loss = 0.5648563504219055
Validation loss = 0.5719360709190369
Validation loss = 0.5665963292121887
Validation loss = 0.5680490136146545
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5727289319038391
Validation loss = 0.5630537271499634
Validation loss = 0.5631700754165649
Validation loss = 0.5736998915672302
Validation loss = 0.5703696608543396
Validation loss = 0.567760169506073
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5760253667831421
Validation loss = 0.567079484462738
Validation loss = 0.574142575263977
Validation loss = 0.5723670721054077
Validation loss = 0.5741350054740906
Validation loss = 0.5720956921577454
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5765103697776794
Validation loss = 0.5755418539047241
Validation loss = 0.5726353526115417
Validation loss = 0.5746852159500122
Validation loss = 0.574242889881134
Validation loss = 0.5796381831169128
Validation loss = 0.5732588768005371
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5728196501731873
Validation loss = 0.5677254796028137
Validation loss = 0.5726400017738342
Validation loss = 0.5686389207839966
Validation loss = 0.5660515427589417
Validation loss = 0.5716079473495483
Validation loss = 0.5717196464538574
Validation loss = 0.5737978219985962
Validation loss = 0.5716771483421326
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.947761194029851
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.9531166755460845
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.955271565495208
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.959020755721128
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.964893617021277
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 25
average number of affinization = 4.975544922913344
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.981934112646121
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 4.981412639405204
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.987261146496815
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.9883289124668435
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.993107104984094
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.995760466348702
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.000529661016949
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 6
average number of affinization = 5.001058761249339
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 11
average number of affinization = 5.004232804232804
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 20
average number of affinization = 5.01216287678477
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 17
average number of affinization = 5.018498942917548
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 9
average number of affinization = 5.020602218700476
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 10
average number of affinization = 5.0232312565997885
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 12
average number of affinization = 5.026912928759894
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.031645569620253
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 9
average number of affinization = 5.033737480231945
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 10
average number of affinization = 5.036354056902002
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 10
average number of affinization = 5.038967877830437
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 22
average number of affinization = 5.047894736842105
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -33.6    |
| Iteration     | 74       |
| MaximumReturn | -0.226   |
| MinimumReturn | -112     |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5669106245040894
Validation loss = 0.56511390209198
Validation loss = 0.5660417079925537
Validation loss = 0.5695823431015015
Validation loss = 0.5743920207023621
Validation loss = 0.5647808313369751
Validation loss = 0.5697454810142517
Validation loss = 0.5684972405433655
Validation loss = 0.5698981285095215
Validation loss = 0.5744100213050842
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5693206191062927
Validation loss = 0.5650733113288879
Validation loss = 0.5685239434242249
Validation loss = 0.5670812129974365
Validation loss = 0.5660701394081116
Validation loss = 0.5705209374427795
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5668928027153015
Validation loss = 0.569499135017395
Validation loss = 0.5709280967712402
Validation loss = 0.5714386105537415
Validation loss = 0.5718501806259155
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5759693384170532
Validation loss = 0.569586455821991
Validation loss = 0.5742860436439514
Validation loss = 0.5753824710845947
Validation loss = 0.5788670778274536
Validation loss = 0.5733564496040344
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.57332444190979
Validation loss = 0.5727327466011047
Validation loss = 0.5646946430206299
Validation loss = 0.5698778629302979
Validation loss = 0.5664881467819214
Validation loss = 0.574515700340271
Validation loss = 0.5714706778526306
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.055234087322462
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 17
average number of affinization = 5.061514195583596
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 13
average number of affinization = 5.065685759327378
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 23
average number of affinization = 5.075105042016807
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 8
average number of affinization = 5.076640419947506
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 24
average number of affinization = 5.086568730325289
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 8
average number of affinization = 5.088096486628212
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.0927672955974845
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 22
average number of affinization = 5.101623886851755
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 26
average number of affinization = 5.112565445026178
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.117739403453689
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 17
average number of affinization = 5.1239539748953975
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 11
average number of affinization = 5.127025614218505
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.132706374085685
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.1378590078328985
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.145093945720251
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.150234741784038
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 7
average number of affinization = 5.151199165797706
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.156852527357999
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 22
average number of affinization = 5.165625
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 5.168662155127538
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.173257023933402
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 23
average number of affinization = 5.182527301092044
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 11
average number of affinization = 5.1855509355509355
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 7
average number of affinization = 5.186493506493506
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -9.72    |
| Iteration     | 75       |
| MaximumReturn | -0.27    |
| MinimumReturn | -69.9    |
| TotalSamples  | 128282   |
----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5669972896575928
Validation loss = 0.5697088241577148
Validation loss = 0.5661929845809937
Validation loss = 0.565518856048584
Validation loss = 0.5683484077453613
Validation loss = 0.5692812204360962
Validation loss = 0.5707080364227295
Validation loss = 0.5712631940841675
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5702186226844788
Validation loss = 0.5675616264343262
Validation loss = 0.5674845576286316
Validation loss = 0.5663815140724182
Validation loss = 0.5702992677688599
Validation loss = 0.568554699420929
Validation loss = 0.5709792375564575
Validation loss = 0.5704795122146606
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.570326566696167
Validation loss = 0.5672520995140076
Validation loss = 0.5700839757919312
Validation loss = 0.57154381275177
Validation loss = 0.5711456537246704
Validation loss = 0.5746005177497864
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5759927034378052
Validation loss = 0.5714845657348633
Validation loss = 0.5740394592285156
Validation loss = 0.5729025602340698
Validation loss = 0.5754809379577637
Validation loss = 0.577154278755188
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5769051909446716
Validation loss = 0.5707153081893921
Validation loss = 0.5681687593460083
Validation loss = 0.5730971097946167
Validation loss = 0.5710905194282532
Validation loss = 0.5769239664077759
Validation loss = 0.5725467801094055
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.193665628245068
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 11
average number of affinization = 5.196678775298391
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 22
average number of affinization = 5.2053941908713695
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.209953343701399
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.215025906735751
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 24
average number of affinization = 5.224754013464526
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 17
average number of affinization = 5.2308488612836435
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 18
average number of affinization = 5.237454733574754
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.241985522233713
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 5.239793281653747
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.246900826446281
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 13
average number of affinization = 5.250903458957151
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 12
average number of affinization = 5.254385964912281
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.261474987106756
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.265979381443299
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.270994332818135
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 8
average number of affinization = 5.2723995880535535
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 10
average number of affinization = 5.274832732887288
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.280349794238683
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 9
average number of affinization = 5.282262210796915
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 5.285200411099692
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.29070364663585
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.29517453798768
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.300667008722422
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.306153846153846
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.9    |
| Iteration     | 76       |
| MaximumReturn | -0.289   |
| MinimumReturn | -228     |
| TotalSamples  | 129948   |
----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5715725421905518
Validation loss = 0.5673156380653381
Validation loss = 0.5721903443336487
Validation loss = 0.5660989284515381
Validation loss = 0.5712938904762268
Validation loss = 0.5674921870231628
Validation loss = 0.5704544186592102
Validation loss = 0.5714828372001648
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5683930516242981
Validation loss = 0.5673732161521912
Validation loss = 0.5697258114814758
Validation loss = 0.5692453980445862
Validation loss = 0.5723913908004761
Validation loss = 0.5725305676460266
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5670604109764099
Validation loss = 0.5708683729171753
Validation loss = 0.5714738965034485
Validation loss = 0.5736598372459412
Validation loss = 0.574025571346283
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5742721557617188
Validation loss = 0.5751190781593323
Validation loss = 0.5746667385101318
Validation loss = 0.5703569650650024
Validation loss = 0.5746896862983704
Validation loss = 0.5746195912361145
Validation loss = 0.578615665435791
Validation loss = 0.5764400362968445
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5682824850082397
Validation loss = 0.5677191615104675
Validation loss = 0.5772378444671631
Validation loss = 0.5721966028213501
Validation loss = 0.5751242637634277
Validation loss = 0.5726010203361511
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 8
average number of affinization = 5.307534597642235
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 10
average number of affinization = 5.309938524590164
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 13
average number of affinization = 5.3138760880696365
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 13
average number of affinization = 5.317809621289662
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.3227621483375955
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 13
average number of affinization = 5.326687116564417
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.333673990802248
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.3391215526046985
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 10
average number of affinization = 5.341500765696784
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.346938775510204
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 20
average number of affinization = 5.354411014788373
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.359327217125382
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 11
average number of affinization = 5.362200713194091
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 24
average number of affinization = 5.371690427698574
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 23
average number of affinization = 5.380661577608143
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 17
average number of affinization = 5.386571719226857
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.393492628368073
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.3983739837398375
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.403758252920264
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.4106598984771574
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 24
average number of affinization = 5.420091324200913
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 10
average number of affinization = 5.422413793103448
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 13
average number of affinization = 5.4262544348707555
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 11
average number of affinization = 5.429078014184397
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.435949367088607
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.97    |
| Iteration     | 77       |
| MaximumReturn | -0.333   |
| MinimumReturn | -19.1    |
| TotalSamples  | 131614   |
----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5717470645904541
Validation loss = 0.5673600435256958
Validation loss = 0.5700668692588806
Validation loss = 0.5667228698730469
Validation loss = 0.57297682762146
Validation loss = 0.5702953338623047
Validation loss = 0.571739673614502
Validation loss = 0.5699269771575928
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5708302855491638
Validation loss = 0.568517804145813
Validation loss = 0.5685227513313293
Validation loss = 0.5679299831390381
Validation loss = 0.5705543756484985
Validation loss = 0.57364422082901
Validation loss = 0.5777400732040405
Validation loss = 0.5695466995239258
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5710486173629761
Validation loss = 0.5703396201133728
Validation loss = 0.571422278881073
Validation loss = 0.5703912973403931
Validation loss = 0.5713698267936707
Validation loss = 0.5740727782249451
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5798946022987366
Validation loss = 0.5757072567939758
Validation loss = 0.5753976106643677
Validation loss = 0.5762671232223511
Validation loss = 0.572721004486084
Validation loss = 0.5763280391693115
Validation loss = 0.5825515389442444
Validation loss = 0.577391505241394
Validation loss = 0.5831188559532166
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.569834291934967
Validation loss = 0.5683701634407043
Validation loss = 0.5694753527641296
Validation loss = 0.57208251953125
Validation loss = 0.5717802047729492
Validation loss = 0.571537971496582
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.442813765182186
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 22
average number of affinization = 5.451188669701568
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.455510616784631
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 18
average number of affinization = 5.461849418898433
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 24
average number of affinization = 5.471212121212122
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 20
average number of affinization = 5.478546188793539
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 17
average number of affinization = 5.484359233097881
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.4911749873928395
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 20
average number of affinization = 5.498487903225806
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 11
average number of affinization = 5.501259445843829
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 24
average number of affinization = 5.5105740181268885
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 13
average number of affinization = 5.51434323100151
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 13
average number of affinization = 5.518108651911469
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 10
average number of affinization = 5.520361990950226
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 18
average number of affinization = 5.526633165829145
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 18
average number of affinization = 5.532898041185334
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.539658634538153
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.5439036628198695
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.5481444332999
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 11
average number of affinization = 5.5508771929824565
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 27
average number of affinization = 5.561623246492986
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 18
average number of affinization = 5.5678517776665
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 25
average number of affinization = 5.5775775775775776
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.584292146073037
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 11
average number of affinization = 5.587
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.37    |
| Iteration     | 78       |
| MaximumReturn | -0.386   |
| MinimumReturn | -85.1    |
| TotalSamples  | 133280   |
----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5689185857772827
Validation loss = 0.5677081346511841
Validation loss = 0.567339301109314
Validation loss = 0.5705593824386597
Validation loss = 0.5694088339805603
Validation loss = 0.5686796307563782
Validation loss = 0.5673725605010986
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5683384537696838
Validation loss = 0.5695524215698242
Validation loss = 0.5679518580436707
Validation loss = 0.5715312957763672
Validation loss = 0.5717231631278992
Validation loss = 0.5702674388885498
Validation loss = 0.5675120949745178
Validation loss = 0.5742099285125732
Validation loss = 0.5739213228225708
Validation loss = 0.5752710103988647
Validation loss = 0.5751499533653259
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5707916021347046
Validation loss = 0.5656643509864807
Validation loss = 0.5669146180152893
Validation loss = 0.5698818564414978
Validation loss = 0.5744725465774536
Validation loss = 0.5700685977935791
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5800020694732666
Validation loss = 0.5739915370941162
Validation loss = 0.5717575550079346
Validation loss = 0.5778194665908813
Validation loss = 0.5800250768661499
Validation loss = 0.5777836441993713
Validation loss = 0.5768119096755981
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5722411870956421
Validation loss = 0.5695383548736572
Validation loss = 0.5651846528053284
Validation loss = 0.567223846912384
Validation loss = 0.5709282159805298
Validation loss = 0.5718339085578918
Validation loss = 0.5730414986610413
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.591704147926037
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 12
average number of affinization = 5.594905094905095
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 21
average number of affinization = 5.602596105841238
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 18
average number of affinization = 5.608782435129741
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.615461346633417
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.620638085742772
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 20
average number of affinization = 5.62780269058296
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.632470119521912
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.636635141861623
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 7
average number of affinization = 5.637313432835821
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 13
average number of affinization = 5.640974639482844
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 21
average number of affinization = 5.648608349900597
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 17
average number of affinization = 5.65424739195231
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.65888778550149
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 13
average number of affinization = 5.662531017369727
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 23
average number of affinization = 5.6711309523809526
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 5.671294000991572
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 20
average number of affinization = 5.678394449950446
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.683011391778108
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 5
average number of affinization = 5.682673267326733
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 17
average number of affinization = 5.6882731321128155
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 18
average number of affinization = 5.6943620178041545
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 24
average number of affinization = 5.703410776075136
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.70998023715415
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 12
average number of affinization = 5.713086419753086
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10.7    |
| Iteration     | 79       |
| MaximumReturn | -0.256   |
| MinimumReturn | -161     |
| TotalSamples  | 134946   |
----------------------------
