Logging to experiments/invertedPendulum/nov1/w350e3_seed3214
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5558218359947205
Validation loss = 0.2554139494895935
Validation loss = 0.26205816864967346
Validation loss = 0.215403750538826
Validation loss = 0.19697493314743042
Validation loss = 0.18669456243515015
Validation loss = 0.1804703027009964
Validation loss = 0.16582489013671875
Validation loss = 0.1625174582004547
Validation loss = 0.16128839552402496
Validation loss = 0.14295139908790588
Validation loss = 0.138540580868721
Validation loss = 0.14218558371067047
Validation loss = 0.12958692014217377
Validation loss = 0.14203692972660065
Validation loss = 0.12721924483776093
Validation loss = 0.13369616866111755
Validation loss = 0.12298686802387238
Validation loss = 0.10828482359647751
Validation loss = 0.1121896430850029
Validation loss = 0.11631108075380325
Validation loss = 0.09544279426336288
Validation loss = 0.09116049110889435
Validation loss = 0.08531679213047028
Validation loss = 0.124466173350811
Validation loss = 0.09022053331136703
Validation loss = 0.09183677285909653
Validation loss = 0.07807530462741852
Validation loss = 0.0700281485915184
Validation loss = 0.0669407844543457
Validation loss = 0.07069192826747894
Validation loss = 0.06725326925516129
Validation loss = 0.0622638501226902
Validation loss = 0.05889708176255226
Validation loss = 0.06169204041361809
Validation loss = 0.054953787475824356
Validation loss = 0.061362944543361664
Validation loss = 0.05576932802796364
Validation loss = 0.05602620542049408
Validation loss = 0.054148487746715546
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.549502968788147
Validation loss = 0.26446619629859924
Validation loss = 0.22533635795116425
Validation loss = 0.19932100176811218
Validation loss = 0.2030564844608307
Validation loss = 0.1727435439825058
Validation loss = 0.176560178399086
Validation loss = 0.16092735528945923
Validation loss = 0.15926948189735413
Validation loss = 0.1573781818151474
Validation loss = 0.14145253598690033
Validation loss = 0.14474248886108398
Validation loss = 0.1410081833600998
Validation loss = 0.12738960981369019
Validation loss = 0.12255945056676865
Validation loss = 0.1267702728509903
Validation loss = 0.11014726012945175
Validation loss = 0.1027025431394577
Validation loss = 0.1081242710351944
Validation loss = 0.10301056504249573
Validation loss = 0.0949719101190567
Validation loss = 0.09385628253221512
Validation loss = 0.08682740479707718
Validation loss = 0.083502396941185
Validation loss = 0.0884951576590538
Validation loss = 0.08978448063135147
Validation loss = 0.07115506380796432
Validation loss = 0.07556246221065521
Validation loss = 0.07123134285211563
Validation loss = 0.0660666972398758
Validation loss = 0.07409050315618515
Validation loss = 0.06087963655591011
Validation loss = 0.0614844411611557
Validation loss = 0.054772164672613144
Validation loss = 0.05500923469662666
Validation loss = 0.055944543331861496
Validation loss = 0.05366966500878334
Validation loss = 0.051117200404405594
Validation loss = 0.054270192980766296
Validation loss = 0.052265021950006485
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5461268424987793
Validation loss = 0.31672897934913635
Validation loss = 0.24383360147476196
Validation loss = 0.21174252033233643
Validation loss = 0.20465417206287384
Validation loss = 0.189708411693573
Validation loss = 0.18473848700523376
Validation loss = 0.17112067341804504
Validation loss = 0.16218262910842896
Validation loss = 0.15600210428237915
Validation loss = 0.1602550894021988
Validation loss = 0.16687072813510895
Validation loss = 0.1449766606092453
Validation loss = 0.13701972365379333
Validation loss = 0.14079171419143677
Validation loss = 0.12474758177995682
Validation loss = 0.14567212760448456
Validation loss = 0.12601207196712494
Validation loss = 0.1387304663658142
Validation loss = 0.12308905273675919
Validation loss = 0.11007598787546158
Validation loss = 0.1093873605132103
Validation loss = 0.09575125575065613
Validation loss = 0.10468371957540512
Validation loss = 0.09448376297950745
Validation loss = 0.09313055872917175
Validation loss = 0.08321020007133484
Validation loss = 0.0912943035364151
Validation loss = 0.07380111515522003
Validation loss = 0.07689224183559418
Validation loss = 0.0792122557759285
Validation loss = 0.07579599320888519
Validation loss = 0.06911196559667587
Validation loss = 0.06622680276632309
Validation loss = 0.06370683759450912
Validation loss = 0.05785825103521347
Validation loss = 0.061333488672971725
Validation loss = 0.05650106817483902
Validation loss = 0.05916999280452728
Validation loss = 0.052651919424533844
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5448282361030579
Validation loss = 0.2954501807689667
Validation loss = 0.2675406038761139
Validation loss = 0.21325641870498657
Validation loss = 0.19948433339595795
Validation loss = 0.1916993111371994
Validation loss = 0.17754803597927094
Validation loss = 0.16824756562709808
Validation loss = 0.15844769775867462
Validation loss = 0.15472093224525452
Validation loss = 0.1533384621143341
Validation loss = 0.1543198674917221
Validation loss = 0.13956549763679504
Validation loss = 0.14808638393878937
Validation loss = 0.14483706653118134
Validation loss = 0.12891548871994019
Validation loss = 0.11901826411485672
Validation loss = 0.11714035272598267
Validation loss = 0.11351983994245529
Validation loss = 0.11285106837749481
Validation loss = 0.11682920902967453
Validation loss = 0.10135949403047562
Validation loss = 0.09523983299732208
Validation loss = 0.1082046777009964
Validation loss = 0.09515257924795151
Validation loss = 0.09412138164043427
Validation loss = 0.0868375226855278
Validation loss = 0.07693130522966385
Validation loss = 0.07370888441801071
Validation loss = 0.07178850471973419
Validation loss = 0.06593439728021622
Validation loss = 0.06556645035743713
Validation loss = 0.05692228674888611
Validation loss = 0.060355886816978455
Validation loss = 0.05800817161798477
Validation loss = 0.054583266377449036
Validation loss = 0.05776349827647209
Validation loss = 0.057594381272792816
Validation loss = 0.07669076323509216
Validation loss = 0.06408316642045975
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5523491501808167
Validation loss = 0.27735599875450134
Validation loss = 0.22711648046970367
Validation loss = 0.20206604897975922
Validation loss = 0.19755157828330994
Validation loss = 0.1830868422985077
Validation loss = 0.1728105992078781
Validation loss = 0.16963572800159454
Validation loss = 0.17145393788814545
Validation loss = 0.1665947437286377
Validation loss = 0.15550166368484497
Validation loss = 0.16165916621685028
Validation loss = 0.13262434303760529
Validation loss = 0.13143180310726166
Validation loss = 0.12432866543531418
Validation loss = 0.1330859214067459
Validation loss = 0.1315617561340332
Validation loss = 0.12896007299423218
Validation loss = 0.10899492353200912
Validation loss = 0.10565995424985886
Validation loss = 0.09545533359050751
Validation loss = 0.10094082355499268
Validation loss = 0.08408123254776001
Validation loss = 0.0962802991271019
Validation loss = 0.08284537494182587
Validation loss = 0.07262951880693436
Validation loss = 0.07451233267784119
Validation loss = 0.06856304407119751
Validation loss = 0.06519041210412979
Validation loss = 0.07008104771375656
Validation loss = 0.061561062932014465
Validation loss = 0.06090514734387398
Validation loss = 0.066446453332901
Validation loss = 0.06334378570318222
Validation loss = 0.05744737759232521
Validation loss = 0.06146042048931122
Validation loss = 0.06509773433208466
Validation loss = 0.05831984058022499
Validation loss = 0.05740701034665108
Validation loss = 0.04971269518136978
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0268  |
| Iteration     | 0        |
| MaximumReturn | -0.0156  |
| MinimumReturn | -0.0566  |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2631937563419342
Validation loss = 0.11988931894302368
Validation loss = 0.10438847541809082
Validation loss = 0.08312097191810608
Validation loss = 0.06577286124229431
Validation loss = 0.06256237626075745
Validation loss = 0.0624500997364521
Validation loss = 0.05614825338125229
Validation loss = 0.046303268522024155
Validation loss = 0.05336917191743851
Validation loss = 0.04242343828082085
Validation loss = 0.04467529058456421
Validation loss = 0.045169759541749954
Validation loss = 0.03970541059970856
Validation loss = 0.03956259414553642
Validation loss = 0.03826221078634262
Validation loss = 0.04097360745072365
Validation loss = 0.037366416305303574
Validation loss = 0.04096449911594391
Validation loss = 0.04882856085896492
Validation loss = 0.05312487855553627
Validation loss = 0.050498686730861664
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.21938996016979218
Validation loss = 0.12345165014266968
Validation loss = 0.09005050361156464
Validation loss = 0.07823190093040466
Validation loss = 0.06971035897731781
Validation loss = 0.06149575859308243
Validation loss = 0.05909760296344757
Validation loss = 0.05617842450737953
Validation loss = 0.05365113168954849
Validation loss = 0.05505255237221718
Validation loss = 0.045230966061353683
Validation loss = 0.042964186519384384
Validation loss = 0.05054269731044769
Validation loss = 0.04677316173911095
Validation loss = 0.04890205338597298
Validation loss = 0.044557835906744
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.16835099458694458
Validation loss = 0.09064219892024994
Validation loss = 0.06313479691743851
Validation loss = 0.05119844153523445
Validation loss = 0.04984993487596512
Validation loss = 0.0413224995136261
Validation loss = 0.040561847388744354
Validation loss = 0.04221455007791519
Validation loss = 0.0481882244348526
Validation loss = 0.04172537848353386
Validation loss = 0.04373954236507416
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.24146640300750732
Validation loss = 0.13102039694786072
Validation loss = 0.10184947401285172
Validation loss = 0.09095458686351776
Validation loss = 0.0722668468952179
Validation loss = 0.05734221264719963
Validation loss = 0.05727855861186981
Validation loss = 0.055094048380851746
Validation loss = 0.051901042461395264
Validation loss = 0.04361310601234436
Validation loss = 0.054086972028017044
Validation loss = 0.044028736650943756
Validation loss = 0.051930636167526245
Validation loss = 0.04359586164355278
Validation loss = 0.048020139336586
Validation loss = 0.04107232764363289
Validation loss = 0.04014015197753906
Validation loss = 0.03894897922873497
Validation loss = 0.03343585506081581
Validation loss = 0.046719495207071304
Validation loss = 0.03862960636615753
Validation loss = 0.03690311312675476
Validation loss = 0.03616046532988548
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.20885737240314484
Validation loss = 0.10870048403739929
Validation loss = 0.09402928501367569
Validation loss = 0.08321516960859299
Validation loss = 0.07948534935712814
Validation loss = 0.0710122138261795
Validation loss = 0.06377477198839188
Validation loss = 0.06076747551560402
Validation loss = 0.0546494796872139
Validation loss = 0.0511382520198822
Validation loss = 0.048879679292440414
Validation loss = 0.04655510187149048
Validation loss = 0.04776905104517937
Validation loss = 0.04793036729097366
Validation loss = 0.04498806223273277
Validation loss = 0.050488490611314774
Validation loss = 0.04578758031129837
Validation loss = 0.04221127927303314
Validation loss = 0.04271203279495239
Validation loss = 0.0552099384367466
Validation loss = 0.05851544439792633
Validation loss = 0.05532202124595642
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00267 |
| Iteration     | 1        |
| MaximumReturn | -0.00209 |
| MinimumReturn | -0.00359 |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.051459573209285736
Validation loss = 0.04906480386853218
Validation loss = 0.042072512209415436
Validation loss = 0.038683898746967316
Validation loss = 0.03304831683635712
Validation loss = 0.03300933167338371
Validation loss = 0.03523791953921318
Validation loss = 0.031751006841659546
Validation loss = 0.031172096729278564
Validation loss = 0.02936139889061451
Validation loss = 0.0316440612077713
Validation loss = 0.032679811120033264
Validation loss = 0.032785285264253616
Validation loss = 0.03867216408252716
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.061204299330711365
Validation loss = 0.04883728176355362
Validation loss = 0.0393296480178833
Validation loss = 0.0380464643239975
Validation loss = 0.038931094110012054
Validation loss = 0.03955642879009247
Validation loss = 0.03549135476350784
Validation loss = 0.03326989710330963
Validation loss = 0.03666849434375763
Validation loss = 0.034930117428302765
Validation loss = 0.038797006011009216
Validation loss = 0.03397089242935181
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06150444969534874
Validation loss = 0.03775865584611893
Validation loss = 0.04517009109258652
Validation loss = 0.04145803302526474
Validation loss = 0.040428437292575836
Validation loss = 0.03615780547261238
Validation loss = 0.033459022641181946
Validation loss = 0.04675806686282158
Validation loss = 0.03264787793159485
Validation loss = 0.03212635964155197
Validation loss = 0.030986160039901733
Validation loss = 0.03283923491835594
Validation loss = 0.03578520566225052
Validation loss = 0.031985655426979065
Validation loss = 0.032198596745729446
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.052620500326156616
Validation loss = 0.04508906230330467
Validation loss = 0.05293012782931328
Validation loss = 0.03817404806613922
Validation loss = 0.03204536810517311
Validation loss = 0.033311646431684494
Validation loss = 0.03375453129410744
Validation loss = 0.03553347289562225
Validation loss = 0.03196104243397713
Validation loss = 0.030071567744016647
Validation loss = 0.02848171815276146
Validation loss = 0.028651155531406403
Validation loss = 0.03247470408678055
Validation loss = 0.028073757886886597
Validation loss = 0.031573399901390076
Validation loss = 0.027686364948749542
Validation loss = 0.027896955609321594
Validation loss = 0.028992727398872375
Validation loss = 0.028543997555971146
Validation loss = 0.032665327191352844
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05570622533559799
Validation loss = 0.04795089736580849
Validation loss = 0.039310816675424576
Validation loss = 0.03985462337732315
Validation loss = 0.042663272470235825
Validation loss = 0.0375562384724617
Validation loss = 0.037409715354442596
Validation loss = 0.04260822385549545
Validation loss = 0.04550826922059059
Validation loss = 0.03713512420654297
Validation loss = 0.034066226333379745
Validation loss = 0.035266127437353134
Validation loss = 0.031228190287947655
Validation loss = 0.03330432251095772
Validation loss = 0.03740498423576355
Validation loss = 0.038977306336164474
Validation loss = 0.034596771001815796
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000815 |
| Iteration     | 2         |
| MaximumReturn | -0.000605 |
| MinimumReturn | -0.00124  |
| TotalSamples  | 6664      |
-----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04439152777194977
Validation loss = 0.02774948626756668
Validation loss = 0.02854759246110916
Validation loss = 0.024263476952910423
Validation loss = 0.02596096880733967
Validation loss = 0.02271818183362484
Validation loss = 0.024180084466934204
Validation loss = 0.021902596578001976
Validation loss = 0.023127637803554535
Validation loss = 0.02087578922510147
Validation loss = 0.027099719271063805
Validation loss = 0.032811764627695084
Validation loss = 0.033042680472135544
Validation loss = 0.027900226414203644
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03656236454844475
Validation loss = 0.039963338524103165
Validation loss = 0.024815665557980537
Validation loss = 0.025334855541586876
Validation loss = 0.023385150358080864
Validation loss = 0.028469165787100792
Validation loss = 0.038827184587717056
Validation loss = 0.028871363028883934
Validation loss = 0.024318484589457512
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.032627496868371964
Validation loss = 0.02619963139295578
Validation loss = 0.02911253273487091
Validation loss = 0.030784571543335915
Validation loss = 0.02596786431968212
Validation loss = 0.0256852637976408
Validation loss = 0.02731235884130001
Validation loss = 0.025052348151803017
Validation loss = 0.02578330598771572
Validation loss = 0.027707742527127266
Validation loss = 0.021847719326615334
Validation loss = 0.025415504351258278
Validation loss = 0.021677687764167786
Validation loss = 0.024551691487431526
Validation loss = 0.021416926756501198
Validation loss = 0.021258356049656868
Validation loss = 0.021346889436244965
Validation loss = 0.024898022413253784
Validation loss = 0.020148763433098793
Validation loss = 0.027728311717510223
Validation loss = 0.025325147435069084
Validation loss = 0.02998875081539154
Validation loss = 0.024706052616238594
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.053209442645311356
Validation loss = 0.030354278162121773
Validation loss = 0.02753671072423458
Validation loss = 0.024937694892287254
Validation loss = 0.02254229597747326
Validation loss = 0.022924557328224182
Validation loss = 0.022884534671902657
Validation loss = 0.028248058632016182
Validation loss = 0.025261836126446724
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03589639067649841
Validation loss = 0.02937081642448902
Validation loss = 0.025873752310872078
Validation loss = 0.028722522780299187
Validation loss = 0.027487413957715034
Validation loss = 0.032606396824121475
Validation loss = 0.027104727923870087
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000984 |
| Iteration     | 3         |
| MaximumReturn | -0.000681 |
| MinimumReturn | -0.00156  |
| TotalSamples  | 8330      |
-----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024304591119289398
Validation loss = 0.026347361505031586
Validation loss = 0.018587147817015648
Validation loss = 0.020403526723384857
Validation loss = 0.01681477203965187
Validation loss = 0.017001060768961906
Validation loss = 0.02067810483276844
Validation loss = 0.021277841180562973
Validation loss = 0.018848981708288193
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02645958587527275
Validation loss = 0.02005440555512905
Validation loss = 0.01810503751039505
Validation loss = 0.0216678474098444
Validation loss = 0.018425624817609787
Validation loss = 0.019366074353456497
Validation loss = 0.016229983419179916
Validation loss = 0.022068675607442856
Validation loss = 0.015504583716392517
Validation loss = 0.021373789757490158
Validation loss = 0.023644698783755302
Validation loss = 0.017972739413380623
Validation loss = 0.0179729200899601
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03247765824198723
Validation loss = 0.02216028980910778
Validation loss = 0.016427189111709595
Validation loss = 0.017586946487426758
Validation loss = 0.021989652886986732
Validation loss = 0.017261231318116188
Validation loss = 0.01894155703485012
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02115158922970295
Validation loss = 0.018048956990242004
Validation loss = 0.018746020272374153
Validation loss = 0.020237715914845467
Validation loss = 0.0192112997174263
Validation loss = 0.01739392802119255
Validation loss = 0.016789264976978302
Validation loss = 0.023058846592903137
Validation loss = 0.016766682267189026
Validation loss = 0.01596013642847538
Validation loss = 0.01847960613667965
Validation loss = 0.01725592464208603
Validation loss = 0.019107818603515625
Validation loss = 0.01823330484330654
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.035639792680740356
Validation loss = 0.020200345665216446
Validation loss = 0.02174730971455574
Validation loss = 0.018037086352705956
Validation loss = 0.01724851317703724
Validation loss = 0.02028379961848259
Validation loss = 0.019015271216630936
Validation loss = 0.020451141521334648
Validation loss = 0.018357830122113228
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000872 |
| Iteration     | 4         |
| MaximumReturn | -0.000666 |
| MinimumReturn | -0.00134  |
| TotalSamples  | 9996      |
-----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0215897336602211
Validation loss = 0.017982106655836105
Validation loss = 0.01845894381403923
Validation loss = 0.019895412027835846
Validation loss = 0.014438892714679241
Validation loss = 0.014836246147751808
Validation loss = 0.013803062029182911
Validation loss = 0.015166470780968666
Validation loss = 0.014088327065110207
Validation loss = 0.017749961465597153
Validation loss = 0.01973184011876583
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015989944338798523
Validation loss = 0.01884624920785427
Validation loss = 0.013544229790568352
Validation loss = 0.01922127604484558
Validation loss = 0.017934609204530716
Validation loss = 0.01589776948094368
Validation loss = 0.022048931568861008
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015468394383788109
Validation loss = 0.021992944180965424
Validation loss = 0.0185499619692564
Validation loss = 0.01892310567200184
Validation loss = 0.01303032971918583
Validation loss = 0.015083419159054756
Validation loss = 0.014608982019126415
Validation loss = 0.017545970156788826
Validation loss = 0.015047003515064716
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020493334159255028
Validation loss = 0.030720481649041176
Validation loss = 0.015128141269087791
Validation loss = 0.020762452855706215
Validation loss = 0.014006135985255241
Validation loss = 0.012818360701203346
Validation loss = 0.012727227993309498
Validation loss = 0.013048617169260979
Validation loss = 0.013655239716172218
Validation loss = 0.013438533060252666
Validation loss = 0.01588461920619011
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.027130555361509323
Validation loss = 0.023549746721982956
Validation loss = 0.024480022490024567
Validation loss = 0.02519471012055874
Validation loss = 0.017212707549333572
Validation loss = 0.018332701176404953
Validation loss = 0.022397469729185104
Validation loss = 0.01667441800236702
Validation loss = 0.015381860546767712
Validation loss = 0.017638614401221275
Validation loss = 0.017303042113780975
Validation loss = 0.017072603106498718
Validation loss = 0.0141350282356143
Validation loss = 0.013530862517654896
Validation loss = 0.01487189345061779
Validation loss = 0.015277641825377941
Validation loss = 0.023550497367978096
Validation loss = 0.01631646603345871
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00109 |
| Iteration     | 5        |
| MaximumReturn | -0.00076 |
| MinimumReturn | -0.00186 |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023585103452205658
Validation loss = 0.02093326486647129
Validation loss = 0.019627809524536133
Validation loss = 0.01867024414241314
Validation loss = 0.01755325123667717
Validation loss = 0.016985846683382988
Validation loss = 0.021969575434923172
Validation loss = 0.01832689717411995
Validation loss = 0.015440771356225014
Validation loss = 0.018047969788312912
Validation loss = 0.015541006810963154
Validation loss = 0.015848785638809204
Validation loss = 0.016621172428131104
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.026495900005102158
Validation loss = 0.017546802759170532
Validation loss = 0.017016587778925896
Validation loss = 0.016593370586633682
Validation loss = 0.018866579979658127
Validation loss = 0.023391472175717354
Validation loss = 0.015805426985025406
Validation loss = 0.01991182193160057
Validation loss = 0.015852702781558037
Validation loss = 0.0189288891851902
Validation loss = 0.014170141890645027
Validation loss = 0.01774226315319538
Validation loss = 0.01603226736187935
Validation loss = 0.014626788906753063
Validation loss = 0.02113981731235981
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020329969003796577
Validation loss = 0.016687918454408646
Validation loss = 0.01448659785091877
Validation loss = 0.0158648993819952
Validation loss = 0.014888348057866096
Validation loss = 0.015325762331485748
Validation loss = 0.014815075322985649
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019281156361103058
Validation loss = 0.013187205418944359
Validation loss = 0.017572855576872826
Validation loss = 0.02059657871723175
Validation loss = 0.018541650846600533
Validation loss = 0.015382690355181694
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020568329840898514
Validation loss = 0.026118651032447815
Validation loss = 0.01912536472082138
Validation loss = 0.018411820754408836
Validation loss = 0.017113883048295975
Validation loss = 0.019666830077767372
Validation loss = 0.01661543920636177
Validation loss = 0.018309777602553368
Validation loss = 0.01624349132180214
Validation loss = 0.0195465050637722
Validation loss = 0.014006301760673523
Validation loss = 0.015998629853129387
Validation loss = 0.02042875997722149
Validation loss = 0.019142920151352882
Validation loss = 0.01598815619945526
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0138  |
| Iteration     | 6        |
| MaximumReturn | -0.00859 |
| MinimumReturn | -0.0216  |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01993403211236
Validation loss = 0.011256363242864609
Validation loss = 0.014571725390851498
Validation loss = 0.011307776905596256
Validation loss = 0.014761018566787243
Validation loss = 0.015514444559812546
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018295826390385628
Validation loss = 0.015398383140563965
Validation loss = 0.013222381472587585
Validation loss = 0.011631381697952747
Validation loss = 0.012294535525143147
Validation loss = 0.012083235196769238
Validation loss = 0.012879843823611736
Validation loss = 0.01463721040636301
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01615365780889988
Validation loss = 0.013724550604820251
Validation loss = 0.012267891317605972
Validation loss = 0.01551870908588171
Validation loss = 0.012230104766786098
Validation loss = 0.01534156035631895
Validation loss = 0.012933951802551746
Validation loss = 0.01103818416595459
Validation loss = 0.011071785353124142
Validation loss = 0.01062436681240797
Validation loss = 0.013350392691791058
Validation loss = 0.012374110519886017
Validation loss = 0.015241994522511959
Validation loss = 0.015536855906248093
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0120855076238513
Validation loss = 0.017425620928406715
Validation loss = 0.01232054177671671
Validation loss = 0.011467658914625645
Validation loss = 0.01231440156698227
Validation loss = 0.012080132961273193
Validation loss = 0.010227089747786522
Validation loss = 0.013971027918159962
Validation loss = 0.011250916868448257
Validation loss = 0.014752195216715336
Validation loss = 0.012353708036243916
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02957683801651001
Validation loss = 0.01416226476430893
Validation loss = 0.010740432888269424
Validation loss = 0.011214238591492176
Validation loss = 0.011301369406282902
Validation loss = 0.011965688318014145
Validation loss = 0.01053630281239748
Validation loss = 0.010685832239687443
Validation loss = 0.011947826482355595
Validation loss = 0.013402259908616543
Validation loss = 0.013301926665008068
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00115  |
| Iteration     | 7         |
| MaximumReturn | -0.000661 |
| MinimumReturn | -0.00152  |
| TotalSamples  | 14994     |
-----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01933472231030464
Validation loss = 0.013041834346950054
Validation loss = 0.012190943583846092
Validation loss = 0.015137771144509315
Validation loss = 0.014385546557605267
Validation loss = 0.014935261569917202
Validation loss = 0.014198752120137215
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017911231145262718
Validation loss = 0.018877800554037094
Validation loss = 0.014758427627384663
Validation loss = 0.01901867613196373
Validation loss = 0.013427662663161755
Validation loss = 0.012787292711436749
Validation loss = 0.014265207573771477
Validation loss = 0.016027195379137993
Validation loss = 0.014114400371909142
Validation loss = 0.01542662549763918
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02041570469737053
Validation loss = 0.012777633965015411
Validation loss = 0.015119473449885845
Validation loss = 0.01344098336994648
Validation loss = 0.013508632779121399
Validation loss = 0.011609411798417568
Validation loss = 0.016497453674674034
Validation loss = 0.017746098339557648
Validation loss = 0.018016135320067406
Validation loss = 0.02310180477797985
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016319142654538155
Validation loss = 0.01732049509882927
Validation loss = 0.014224781654775143
Validation loss = 0.017351651564240456
Validation loss = 0.014085615053772926
Validation loss = 0.011672408320009708
Validation loss = 0.011339791119098663
Validation loss = 0.014663001522421837
Validation loss = 0.014345166273415089
Validation loss = 0.013190415687859058
Validation loss = 0.011483217589557171
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014768158085644245
Validation loss = 0.015678325667977333
Validation loss = 0.013292491436004639
Validation loss = 0.01697736233472824
Validation loss = 0.01840914599597454
Validation loss = 0.014535394497215748
Validation loss = 0.01596217043697834
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000964 |
| Iteration     | 8         |
| MaximumReturn | -0.000774 |
| MinimumReturn | -0.00115  |
| TotalSamples  | 16660     |
-----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01423468254506588
Validation loss = 0.014941820874810219
Validation loss = 0.018737303093075752
Validation loss = 0.018001846969127655
Validation loss = 0.014518508687615395
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015805944800376892
Validation loss = 0.015348125249147415
Validation loss = 0.015400651842355728
Validation loss = 0.01280074194073677
Validation loss = 0.013096483424305916
Validation loss = 0.015436850488185883
Validation loss = 0.013022547587752342
Validation loss = 0.01240675337612629
Validation loss = 0.011557172983884811
Validation loss = 0.01491600926965475
Validation loss = 0.011801229789853096
Validation loss = 0.01180107332766056
Validation loss = 0.01605883240699768
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02366458810865879
Validation loss = 0.014438038691878319
Validation loss = 0.015515152364969254
Validation loss = 0.014309623278677464
Validation loss = 0.013830265030264854
Validation loss = 0.013468481600284576
Validation loss = 0.01880832016468048
Validation loss = 0.01310516707599163
Validation loss = 0.01575252041220665
Validation loss = 0.01284206286072731
Validation loss = 0.015086162835359573
Validation loss = 0.011838489212095737
Validation loss = 0.0150221548974514
Validation loss = 0.013163145631551743
Validation loss = 0.012923178263008595
Validation loss = 0.014802880585193634
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011810052208602428
Validation loss = 0.011587647721171379
Validation loss = 0.011964304372668266
Validation loss = 0.012116661295294762
Validation loss = 0.010889570228755474
Validation loss = 0.018460728228092194
Validation loss = 0.013018720783293247
Validation loss = 0.018576201051473618
Validation loss = 0.019443560391664505
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018194418400526047
Validation loss = 0.01504864264279604
Validation loss = 0.012234936468303204
Validation loss = 0.014243204146623611
Validation loss = 0.018354030326008797
Validation loss = 0.016072953119874
Validation loss = 0.011886144988238811
Validation loss = 0.012467944994568825
Validation loss = 0.01464029960334301
Validation loss = 0.013706903904676437
Validation loss = 0.019603069871664047
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0039  |
| Iteration     | 9        |
| MaximumReturn | -0.00265 |
| MinimumReturn | -0.00575 |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.019924255087971687
Validation loss = 0.014592526480555534
Validation loss = 0.011907165870070457
Validation loss = 0.01332439947873354
Validation loss = 0.013038470409810543
Validation loss = 0.01175596285611391
Validation loss = 0.011555364355444908
Validation loss = 0.011112830601632595
Validation loss = 0.009658875875175
Validation loss = 0.011283203028142452
Validation loss = 0.010671926662325859
Validation loss = 0.013209455646574497
Validation loss = 0.010881191119551659
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01488580834120512
Validation loss = 0.010301075875759125
Validation loss = 0.014505679719150066
Validation loss = 0.015406972728669643
Validation loss = 0.011373056098818779
Validation loss = 0.01141392346471548
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014671163633465767
Validation loss = 0.010646069422364235
Validation loss = 0.011576351709663868
Validation loss = 0.011583066545426846
Validation loss = 0.012004280462861061
Validation loss = 0.010623984038829803
Validation loss = 0.009978257119655609
Validation loss = 0.01007179543375969
Validation loss = 0.011497202329337597
Validation loss = 0.016011007130146027
Validation loss = 0.011963192373514175
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015897128731012344
Validation loss = 0.011145355179905891
Validation loss = 0.009584881365299225
Validation loss = 0.010536749847233295
Validation loss = 0.009171508252620697
Validation loss = 0.01010682713240385
Validation loss = 0.010814215056598186
Validation loss = 0.011980382725596428
Validation loss = 0.013024061918258667
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012299161404371262
Validation loss = 0.014635973609983921
Validation loss = 0.01235042605549097
Validation loss = 0.010959934443235397
Validation loss = 0.010293993167579174
Validation loss = 0.01127011701464653
Validation loss = 0.011087664403021336
Validation loss = 0.014126166701316833
Validation loss = 0.014676387421786785
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00127  |
| Iteration     | 10        |
| MaximumReturn | -0.000605 |
| MinimumReturn | -0.00177  |
| TotalSamples  | 19992     |
-----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011954540386795998
Validation loss = 0.010582055896520615
Validation loss = 0.011702550575137138
Validation loss = 0.01508312113583088
Validation loss = 0.013195444829761982
Validation loss = 0.009586815722286701
Validation loss = 0.009788107126951218
Validation loss = 0.020273122936487198
Validation loss = 0.014691239222884178
Validation loss = 0.013274414464831352
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013400408439338207
Validation loss = 0.014326855540275574
Validation loss = 0.01446976512670517
Validation loss = 0.01846817508339882
Validation loss = 0.014651146717369556
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011717407032847404
Validation loss = 0.009357670322060585
Validation loss = 0.013710913248360157
Validation loss = 0.010825619101524353
Validation loss = 0.01071894634515047
Validation loss = 0.013114318251609802
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011287200264632702
Validation loss = 0.012205742299556732
Validation loss = 0.01097526028752327
Validation loss = 0.010751849040389061
Validation loss = 0.01122041791677475
Validation loss = 0.009309248998761177
Validation loss = 0.01148057822138071
Validation loss = 0.011380596086382866
Validation loss = 0.009262219071388245
Validation loss = 0.012387173250317574
Validation loss = 0.012027856893837452
Validation loss = 0.008976727724075317
Validation loss = 0.009796889498829842
Validation loss = 0.009441356174647808
Validation loss = 0.0104445805773139
Validation loss = 0.014249669387936592
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01633257046341896
Validation loss = 0.016647987067699432
Validation loss = 0.011702919378876686
Validation loss = 0.01479317992925644
Validation loss = 0.015522669069468975
Validation loss = 0.012765014544129372
Validation loss = 0.01263688039034605
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000954 |
| Iteration     | 11        |
| MaximumReturn | -0.00063  |
| MinimumReturn | -0.00145  |
| TotalSamples  | 21658     |
-----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012068457901477814
Validation loss = 0.016080860048532486
Validation loss = 0.01567239500582218
Validation loss = 0.011752239428460598
Validation loss = 0.019599180668592453
Validation loss = 0.011451507918536663
Validation loss = 0.012244109064340591
Validation loss = 0.01180213876068592
Validation loss = 0.012981178238987923
Validation loss = 0.014497889205813408
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014715584926307201
Validation loss = 0.019326791167259216
Validation loss = 0.013289213180541992
Validation loss = 0.01329350471496582
Validation loss = 0.012158939614892006
Validation loss = 0.017833780497312546
Validation loss = 0.010836387053132057
Validation loss = 0.011957620270550251
Validation loss = 0.011327520944178104
Validation loss = 0.017186440527439117
Validation loss = 0.012588033452630043
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011941129341721535
Validation loss = 0.01244618184864521
Validation loss = 0.01308484934270382
Validation loss = 0.012492469511926174
Validation loss = 0.012217927724123001
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013915963470935822
Validation loss = 0.015180818736553192
Validation loss = 0.010335505940020084
Validation loss = 0.014103315770626068
Validation loss = 0.011410771869122982
Validation loss = 0.010073100216686726
Validation loss = 0.009789112024009228
Validation loss = 0.009720681235194206
Validation loss = 0.009949347004294395
Validation loss = 0.010570375248789787
Validation loss = 0.017659444361925125
Validation loss = 0.00971725769340992
Validation loss = 0.01095439400523901
Validation loss = 0.012064477428793907
Validation loss = 0.011854465119540691
Validation loss = 0.010987982153892517
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015608532354235649
Validation loss = 0.016145499423146248
Validation loss = 0.01328694075345993
Validation loss = 0.010714875534176826
Validation loss = 0.014254888519644737
Validation loss = 0.011686006560921669
Validation loss = 0.01889863796532154
Validation loss = 0.011107208207249641
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000872 |
| Iteration     | 12        |
| MaximumReturn | -0.00065  |
| MinimumReturn | -0.00123  |
| TotalSamples  | 23324     |
-----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015536049380898476
Validation loss = 0.010440020821988583
Validation loss = 0.0130021246150136
Validation loss = 0.011628109961748123
Validation loss = 0.011976459994912148
Validation loss = 0.012931909412145615
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01265271008014679
Validation loss = 0.012613010592758656
Validation loss = 0.009283086284995079
Validation loss = 0.01635606586933136
Validation loss = 0.012720772996544838
Validation loss = 0.012022238224744797
Validation loss = 0.011413613334298134
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013913050293922424
Validation loss = 0.015720246359705925
Validation loss = 0.011403829790651798
Validation loss = 0.015523577108979225
Validation loss = 0.011968591250479221
Validation loss = 0.0108697060495615
Validation loss = 0.012164562940597534
Validation loss = 0.010247654281556606
Validation loss = 0.010515511967241764
Validation loss = 0.012819673866033554
Validation loss = 0.010893781669437885
Validation loss = 0.014560374431312084
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008864611387252808
Validation loss = 0.010690740309655666
Validation loss = 0.011130771599709988
Validation loss = 0.009611769579350948
Validation loss = 0.008961894549429417
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011448525823652744
Validation loss = 0.012523320503532887
Validation loss = 0.015607045032083988
Validation loss = 0.012699088081717491
Validation loss = 0.01799619570374489
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0077  |
| Iteration     | 13       |
| MaximumReturn | -0.00527 |
| MinimumReturn | -0.0101  |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012200772762298584
Validation loss = 0.010304574854671955
Validation loss = 0.010240817442536354
Validation loss = 0.011462668888270855
Validation loss = 0.012689444236457348
Validation loss = 0.010164068080484867
Validation loss = 0.012426745146512985
Validation loss = 0.010648953728377819
Validation loss = 0.01369316503405571
Validation loss = 0.011557698249816895
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01214255765080452
Validation loss = 0.01404840499162674
Validation loss = 0.013309963978827
Validation loss = 0.01402340829372406
Validation loss = 0.01136819925159216
Validation loss = 0.013420366682112217
Validation loss = 0.012532070279121399
Validation loss = 0.009937070310115814
Validation loss = 0.010259537026286125
Validation loss = 0.011429316364228725
Validation loss = 0.013139764778316021
Validation loss = 0.010181533172726631
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01303431298583746
Validation loss = 0.013047697953879833
Validation loss = 0.00970169622451067
Validation loss = 0.009621357545256615
Validation loss = 0.009028076194226742
Validation loss = 0.011793565936386585
Validation loss = 0.01572406105697155
Validation loss = 0.011786223389208317
Validation loss = 0.011574392206966877
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011540508829057217
Validation loss = 0.01129536610096693
Validation loss = 0.009924964979290962
Validation loss = 0.008910278789699078
Validation loss = 0.008840344846248627
Validation loss = 0.008446662686765194
Validation loss = 0.012685484252870083
Validation loss = 0.011293653398752213
Validation loss = 0.01168958842754364
Validation loss = 0.011042461730539799
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014872550033032894
Validation loss = 0.013710737228393555
Validation loss = 0.01536965649574995
Validation loss = 0.013722776435315609
Validation loss = 0.01100108865648508
Validation loss = 0.012476298026740551
Validation loss = 0.011393263936042786
Validation loss = 0.010630193166434765
Validation loss = 0.013337735086679459
Validation loss = 0.013989496976137161
Validation loss = 0.011680427007377148
Validation loss = 0.01170172169804573
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00425 |
| Iteration     | 14       |
| MaximumReturn | -0.00205 |
| MinimumReturn | -0.0079  |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014690662734210491
Validation loss = 0.010877544991672039
Validation loss = 0.012727048248052597
Validation loss = 0.011563271284103394
Validation loss = 0.01153375394642353
Validation loss = 0.011207572184503078
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012649482116103172
Validation loss = 0.010460952296853065
Validation loss = 0.009239018894731998
Validation loss = 0.009849874302744865
Validation loss = 0.009210232645273209
Validation loss = 0.009392007254064083
Validation loss = 0.018245169892907143
Validation loss = 0.012323589064180851
Validation loss = 0.011430873535573483
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01029231771826744
Validation loss = 0.009817395359277725
Validation loss = 0.010917420499026775
Validation loss = 0.010219900868833065
Validation loss = 0.01053170021623373
Validation loss = 0.010120751336216927
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012616594322025776
Validation loss = 0.008598593063652515
Validation loss = 0.008842002600431442
Validation loss = 0.007907629944384098
Validation loss = 0.008176691830158234
Validation loss = 0.012543504126369953
Validation loss = 0.01660362258553505
Validation loss = 0.011555722914636135
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013954713009297848
Validation loss = 0.012540341354906559
Validation loss = 0.012613069266080856
Validation loss = 0.011285481974482536
Validation loss = 0.01087227277457714
Validation loss = 0.010926232673227787
Validation loss = 0.010994549840688705
Validation loss = 0.01072756852954626
Validation loss = 0.010873058810830116
Validation loss = 0.012227793224155903
Validation loss = 0.015292022377252579
Validation loss = 0.013150165788829327
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00852 |
| Iteration     | 15       |
| MaximumReturn | -0.00539 |
| MinimumReturn | -0.0111  |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00939861685037613
Validation loss = 0.009183456189930439
Validation loss = 0.008650162257254124
Validation loss = 0.009739793837070465
Validation loss = 0.010377989150583744
Validation loss = 0.00994241051375866
Validation loss = 0.00851032417267561
Validation loss = 0.012220099568367004
Validation loss = 0.011104519478976727
Validation loss = 0.009316629730165005
Validation loss = 0.00985606200993061
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011289350688457489
Validation loss = 0.010302354581654072
Validation loss = 0.010083463974297047
Validation loss = 0.009159686975181103
Validation loss = 0.009295050986111164
Validation loss = 0.009225401096045971
Validation loss = 0.011928855441510677
Validation loss = 0.011649471707642078
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009187891148030758
Validation loss = 0.013533045537769794
Validation loss = 0.009814873337745667
Validation loss = 0.011549620889127254
Validation loss = 0.010764332488179207
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012478585354983807
Validation loss = 0.009732561185956001
Validation loss = 0.008340288884937763
Validation loss = 0.00857632327824831
Validation loss = 0.007834662683308125
Validation loss = 0.010216409340500832
Validation loss = 0.009155558422207832
Validation loss = 0.0088088009506464
Validation loss = 0.012422271072864532
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010397091507911682
Validation loss = 0.011095971800386906
Validation loss = 0.008778181858360767
Validation loss = 0.01015430223196745
Validation loss = 0.010328850708901882
Validation loss = 0.011459321714937687
Validation loss = 0.01703728176653385
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00118  |
| Iteration     | 16        |
| MaximumReturn | -0.000847 |
| MinimumReturn | -0.00166  |
| TotalSamples  | 29988     |
-----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011979206465184689
Validation loss = 0.009730225428938866
Validation loss = 0.010537447407841682
Validation loss = 0.010178091935813427
Validation loss = 0.010676845908164978
Validation loss = 0.011478695087134838
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010982689447700977
Validation loss = 0.010633209720253944
Validation loss = 0.010170025750994682
Validation loss = 0.010015848092734814
Validation loss = 0.011332320049405098
Validation loss = 0.009430455975234509
Validation loss = 0.010628442279994488
Validation loss = 0.009493370540440083
Validation loss = 0.009037052281200886
Validation loss = 0.009437819011509418
Validation loss = 0.011381803080439568
Validation loss = 0.009998085908591747
Validation loss = 0.00896372552961111
Validation loss = 0.008455110713839531
Validation loss = 0.010576223954558372
Validation loss = 0.008519632741808891
Validation loss = 0.010464056394994259
Validation loss = 0.010989796370267868
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010484091006219387
Validation loss = 0.011823837645351887
Validation loss = 0.012270993553102016
Validation loss = 0.011028661392629147
Validation loss = 0.011836016550660133
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013509217649698257
Validation loss = 0.010196036659181118
Validation loss = 0.009123419411480427
Validation loss = 0.008404353633522987
Validation loss = 0.009606003761291504
Validation loss = 0.011550542898476124
Validation loss = 0.009487631730735302
Validation loss = 0.01179368607699871
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012556358240544796
Validation loss = 0.010167926549911499
Validation loss = 0.011279920116066933
Validation loss = 0.009667656384408474
Validation loss = 0.01011268887668848
Validation loss = 0.009452837519347668
Validation loss = 0.010204126127064228
Validation loss = 0.011135010980069637
Validation loss = 0.010913411155343056
Validation loss = 0.009885790757834911
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000877 |
| Iteration     | 17        |
| MaximumReturn | -0.000613 |
| MinimumReturn | -0.00126  |
| TotalSamples  | 31654     |
-----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010878590866923332
Validation loss = 0.009476550854742527
Validation loss = 0.010048719123005867
Validation loss = 0.013193050399422646
Validation loss = 0.01676359213888645
Validation loss = 0.010609650053083897
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010469292290508747
Validation loss = 0.012206384912133217
Validation loss = 0.011888916604220867
Validation loss = 0.010343000292778015
Validation loss = 0.008868942968547344
Validation loss = 0.009737217798829079
Validation loss = 0.01039577554911375
Validation loss = 0.010000635869801044
Validation loss = 0.00869408156722784
Validation loss = 0.013171140104532242
Validation loss = 0.009658937342464924
Validation loss = 0.009501214139163494
Validation loss = 0.00951185543090105
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010525684803724289
Validation loss = 0.014467883855104446
Validation loss = 0.014740281738340855
Validation loss = 0.013881400227546692
Validation loss = 0.011348588392138481
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011355982162058353
Validation loss = 0.010009785182774067
Validation loss = 0.010109378956258297
Validation loss = 0.009666157886385918
Validation loss = 0.009546083398163319
Validation loss = 0.01447997335344553
Validation loss = 0.010594961233437061
Validation loss = 0.010453123599290848
Validation loss = 0.010298808105289936
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012278628535568714
Validation loss = 0.009906915947794914
Validation loss = 0.01213880442082882
Validation loss = 0.009840707294642925
Validation loss = 0.009923530742526054
Validation loss = 0.009174649603664875
Validation loss = 0.009408017620444298
Validation loss = 0.008757075294852257
Validation loss = 0.01048350427299738
Validation loss = 0.014289811253547668
Validation loss = 0.012556521221995354
Validation loss = 0.010414832271635532
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00123  |
| Iteration     | 18        |
| MaximumReturn | -0.000854 |
| MinimumReturn | -0.00192  |
| TotalSamples  | 33320     |
-----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01047404482960701
Validation loss = 0.009950630366802216
Validation loss = 0.010717838071286678
Validation loss = 0.008845534175634384
Validation loss = 0.009160300716757774
Validation loss = 0.010241788811981678
Validation loss = 0.011729898862540722
Validation loss = 0.01080270204693079
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00879935547709465
Validation loss = 0.01078292727470398
Validation loss = 0.012155044823884964
Validation loss = 0.010213633999228477
Validation loss = 0.013765937648713589
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013256284408271313
Validation loss = 0.012528845109045506
Validation loss = 0.01161916833370924
Validation loss = 0.009087051264941692
Validation loss = 0.009889425709843636
Validation loss = 0.010078473016619682
Validation loss = 0.011782914400100708
Validation loss = 0.010522018186748028
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010037128813564777
Validation loss = 0.008875439874827862
Validation loss = 0.00881753209978342
Validation loss = 0.00858460646122694
Validation loss = 0.00946809258311987
Validation loss = 0.01152054127305746
Validation loss = 0.009132538922131062
Validation loss = 0.011593326926231384
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012947618961334229
Validation loss = 0.012806419283151627
Validation loss = 0.011365482583642006
Validation loss = 0.010693792253732681
Validation loss = 0.008878089487552643
Validation loss = 0.010257687419652939
Validation loss = 0.009025586768984795
Validation loss = 0.009746895171701908
Validation loss = 0.015371851623058319
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000954 |
| Iteration     | 19        |
| MaximumReturn | -0.000658 |
| MinimumReturn | -0.00205  |
| TotalSamples  | 34986     |
-----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013219383545219898
Validation loss = 0.010793081484735012
Validation loss = 0.009714559651911259
Validation loss = 0.008995503187179565
Validation loss = 0.009059564210474491
Validation loss = 0.010012083686888218
Validation loss = 0.010161349549889565
Validation loss = 0.009315903298556805
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010693185031414032
Validation loss = 0.012305837124586105
Validation loss = 0.010139472782611847
Validation loss = 0.009549419395625591
Validation loss = 0.008930114097893238
Validation loss = 0.010070394724607468
Validation loss = 0.010553927160799503
Validation loss = 0.00983042735606432
Validation loss = 0.010496160946786404
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011469556950032711
Validation loss = 0.01387510634958744
Validation loss = 0.018618706613779068
Validation loss = 0.01164986565709114
Validation loss = 0.011463002301752567
Validation loss = 0.01045581977814436
Validation loss = 0.011451863683760166
Validation loss = 0.011446830816566944
Validation loss = 0.01074936706572771
Validation loss = 0.010055671446025372
Validation loss = 0.012361964210867882
Validation loss = 0.01180302258580923
Validation loss = 0.010700874030590057
Validation loss = 0.013009123504161835
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010126390494406223
Validation loss = 0.008890406228601933
Validation loss = 0.00868423655629158
Validation loss = 0.008661617524921894
Validation loss = 0.00974453054368496
Validation loss = 0.00971754640340805
Validation loss = 0.009295224212110043
Validation loss = 0.008749349974095821
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01218272466212511
Validation loss = 0.009869307279586792
Validation loss = 0.012905752286314964
Validation loss = 0.010845878161489964
Validation loss = 0.014844975434243679
Validation loss = 0.009948010556399822
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0009   |
| Iteration     | 20        |
| MaximumReturn | -0.000667 |
| MinimumReturn | -0.00123  |
| TotalSamples  | 36652     |
-----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010225079953670502
Validation loss = 0.010725108906626701
Validation loss = 0.012751700356602669
Validation loss = 0.010120850056409836
Validation loss = 0.012491242028772831
Validation loss = 0.010051601566374302
Validation loss = 0.009987304918467999
Validation loss = 0.010046335868537426
Validation loss = 0.01288896705955267
Validation loss = 0.010144118219614029
Validation loss = 0.0101887546479702
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01109065767377615
Validation loss = 0.010059509426355362
Validation loss = 0.009433398023247719
Validation loss = 0.009028468281030655
Validation loss = 0.009620298631489277
Validation loss = 0.010740946047008038
Validation loss = 0.010800899006426334
Validation loss = 0.012113573960959911
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012889870442450047
Validation loss = 0.010935304686427116
Validation loss = 0.01124672032892704
Validation loss = 0.010866251774132252
Validation loss = 0.010674425400793552
Validation loss = 0.011315383948385715
Validation loss = 0.009502526372671127
Validation loss = 0.010231945663690567
Validation loss = 0.012837156653404236
Validation loss = 0.010676962323486805
Validation loss = 0.01055907178670168
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009421763941645622
Validation loss = 0.011587882414460182
Validation loss = 0.012547639198601246
Validation loss = 0.013577375560998917
Validation loss = 0.008808992803096771
Validation loss = 0.008038404397666454
Validation loss = 0.008569923229515553
Validation loss = 0.013219672255218029
Validation loss = 0.013150567188858986
Validation loss = 0.00865116249769926
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010265681892633438
Validation loss = 0.010308614000678062
Validation loss = 0.009159790351986885
Validation loss = 0.009538542479276657
Validation loss = 0.009838856756687164
Validation loss = 0.013023460283875465
Validation loss = 0.009909320622682571
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00577 |
| Iteration     | 21       |
| MaximumReturn | -0.00361 |
| MinimumReturn | -0.009   |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009274713695049286
Validation loss = 0.009730888530611992
Validation loss = 0.011072102002799511
Validation loss = 0.010385735891759396
Validation loss = 0.00893542543053627
Validation loss = 0.012737932614982128
Validation loss = 0.009909235872328281
Validation loss = 0.010243242606520653
Validation loss = 0.009649825282394886
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010920596309006214
Validation loss = 0.011175921186804771
Validation loss = 0.009220856241881847
Validation loss = 0.015434306114912033
Validation loss = 0.00888849701732397
Validation loss = 0.008976496756076813
Validation loss = 0.008532273583114147
Validation loss = 0.008611106313765049
Validation loss = 0.011455771513283253
Validation loss = 0.010586081072688103
Validation loss = 0.009587916545569897
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013171817176043987
Validation loss = 0.011678587645292282
Validation loss = 0.01083325780928135
Validation loss = 0.011765385046601295
Validation loss = 0.009117750450968742
Validation loss = 0.009348767809569836
Validation loss = 0.009663394652307034
Validation loss = 0.011790752410888672
Validation loss = 0.01083310041576624
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008669814094901085
Validation loss = 0.007790547329932451
Validation loss = 0.008156517520546913
Validation loss = 0.007711479440331459
Validation loss = 0.008618069812655449
Validation loss = 0.013416432775557041
Validation loss = 0.010176637209951878
Validation loss = 0.00923824030905962
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009977702051401138
Validation loss = 0.010177969932556152
Validation loss = 0.00875718891620636
Validation loss = 0.008937510661780834
Validation loss = 0.010293612256646156
Validation loss = 0.00835410412400961
Validation loss = 0.013860390521585941
Validation loss = 0.011712553910911083
Validation loss = 0.01050933450460434
Validation loss = 0.009677662514150143
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0114  |
| Iteration     | 22       |
| MaximumReturn | -0.00888 |
| MinimumReturn | -0.0176  |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008558591827750206
Validation loss = 0.008416673168540001
Validation loss = 0.008499226532876492
Validation loss = 0.00825265608727932
Validation loss = 0.009636548347771168
Validation loss = 0.012762358412146568
Validation loss = 0.00988761242479086
Validation loss = 0.019309578463435173
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00892648845911026
Validation loss = 0.008768226020038128
Validation loss = 0.008705059066414833
Validation loss = 0.00841498002409935
Validation loss = 0.007776507176458836
Validation loss = 0.017732495442032814
Validation loss = 0.009754596278071404
Validation loss = 0.007550888694822788
Validation loss = 0.008343261666595936
Validation loss = 0.009166410192847252
Validation loss = 0.008504673838615417
Validation loss = 0.009288104251027107
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009816361591219902
Validation loss = 0.009518375620245934
Validation loss = 0.017309343442320824
Validation loss = 0.014478310942649841
Validation loss = 0.011386826634407043
Validation loss = 0.009902805089950562
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008900020271539688
Validation loss = 0.008813953958451748
Validation loss = 0.008860486559569836
Validation loss = 0.009079082868993282
Validation loss = 0.00808493047952652
Validation loss = 0.008541351184248924
Validation loss = 0.007891446352005005
Validation loss = 0.008341221138834953
Validation loss = 0.008843136951327324
Validation loss = 0.00871289148926735
Validation loss = 0.009526381269097328
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009032433852553368
Validation loss = 0.009790799580514431
Validation loss = 0.010496001690626144
Validation loss = 0.009328234940767288
Validation loss = 0.00987811479717493
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00981 |
| Iteration     | 23       |
| MaximumReturn | -0.00702 |
| MinimumReturn | -0.0164  |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010565394535660744
Validation loss = 0.008622058667242527
Validation loss = 0.00857201311737299
Validation loss = 0.008250957354903221
Validation loss = 0.009066612459719181
Validation loss = 0.008922601118683815
Validation loss = 0.00842153001576662
Validation loss = 0.008372033014893532
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007712464779615402
Validation loss = 0.0123408492654562
Validation loss = 0.008724620565772057
Validation loss = 0.01019459217786789
Validation loss = 0.009996185079216957
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009073769673705101
Validation loss = 0.010483184829354286
Validation loss = 0.009520168416202068
Validation loss = 0.01198126096278429
Validation loss = 0.008915608748793602
Validation loss = 0.012303068302571774
Validation loss = 0.010872332379221916
Validation loss = 0.010234696790575981
Validation loss = 0.010578719899058342
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008692201226949692
Validation loss = 0.008166225627064705
Validation loss = 0.008671533316373825
Validation loss = 0.008442455902695656
Validation loss = 0.008450107648968697
Validation loss = 0.008736592717468739
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010165895335376263
Validation loss = 0.010060850530862808
Validation loss = 0.009512199088931084
Validation loss = 0.014362229034304619
Validation loss = 0.01102311722934246
Validation loss = 0.011875903233885765
Validation loss = 0.009481986053287983
Validation loss = 0.00967942364513874
Validation loss = 0.009358094073832035
Validation loss = 0.009288915432989597
Validation loss = 0.008394075557589531
Validation loss = 0.008453668095171452
Validation loss = 0.009681783616542816
Validation loss = 0.010838866233825684
Validation loss = 0.007816415280103683
Validation loss = 0.00817143265157938
Validation loss = 0.008435665629804134
Validation loss = 0.008179775439202785
Validation loss = 0.00865108985453844
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00106  |
| Iteration     | 24        |
| MaximumReturn | -0.000581 |
| MinimumReturn | -0.00256  |
| TotalSamples  | 43316     |
-----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007861865684390068
Validation loss = 0.014576234854757786
Validation loss = 0.009584051556885242
Validation loss = 0.009333389811217785
Validation loss = 0.008792497217655182
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009668768383562565
Validation loss = 0.0082401093095541
Validation loss = 0.007500510197132826
Validation loss = 0.008290735073387623
Validation loss = 0.007956672459840775
Validation loss = 0.011594345793128014
Validation loss = 0.009472840465605259
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010447575710713863
Validation loss = 0.014891277998685837
Validation loss = 0.010807452723383904
Validation loss = 0.011147934012115002
Validation loss = 0.012071048840880394
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008492754772305489
Validation loss = 0.008659412153065205
Validation loss = 0.008276032283902168
Validation loss = 0.008585055358707905
Validation loss = 0.009659291245043278
Validation loss = 0.011043933220207691
Validation loss = 0.009832431562244892
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00927484966814518
Validation loss = 0.009403126314282417
Validation loss = 0.008589910343289375
Validation loss = 0.009600301273167133
Validation loss = 0.008136927150189877
Validation loss = 0.010334049351513386
Validation loss = 0.009546642191708088
Validation loss = 0.007424249313771725
Validation loss = 0.00971553847193718
Validation loss = 0.008014066144824028
Validation loss = 0.008267301134765148
Validation loss = 0.008329899050295353
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000995 |
| Iteration     | 25        |
| MaximumReturn | -0.000686 |
| MinimumReturn | -0.00149  |
| TotalSamples  | 44982     |
-----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009770854376256466
Validation loss = 0.009293845854699612
Validation loss = 0.011514094658195972
Validation loss = 0.010299121029675007
Validation loss = 0.009310654364526272
Validation loss = 0.008654027245938778
Validation loss = 0.00895343441516161
Validation loss = 0.008696758188307285
Validation loss = 0.010135803371667862
Validation loss = 0.010335852392017841
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009334869682788849
Validation loss = 0.008890003897249699
Validation loss = 0.008961301296949387
Validation loss = 0.00981878861784935
Validation loss = 0.008591278456151485
Validation loss = 0.0087251802906394
Validation loss = 0.015706030651926994
Validation loss = 0.009318236261606216
Validation loss = 0.007781553082168102
Validation loss = 0.00861485954374075
Validation loss = 0.008893031626939774
Validation loss = 0.011461586691439152
Validation loss = 0.011617491021752357
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010871096514165401
Validation loss = 0.009545357897877693
Validation loss = 0.01081080175936222
Validation loss = 0.009162127040326595
Validation loss = 0.009882498532533646
Validation loss = 0.010365133173763752
Validation loss = 0.009455215185880661
Validation loss = 0.01057132426649332
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009564371779561043
Validation loss = 0.009262993931770325
Validation loss = 0.008935828693211079
Validation loss = 0.00947663839906454
Validation loss = 0.009835167787969112
Validation loss = 0.011136082001030445
Validation loss = 0.008193064481019974
Validation loss = 0.007840766571462154
Validation loss = 0.007476378697901964
Validation loss = 0.007401890587061644
Validation loss = 0.008098564110696316
Validation loss = 0.00847314577549696
Validation loss = 0.008888130076229572
Validation loss = 0.007913071662187576
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00941338948905468
Validation loss = 0.008833610452711582
Validation loss = 0.009576652199029922
Validation loss = 0.008196392096579075
Validation loss = 0.007781165651977062
Validation loss = 0.00866855587810278
Validation loss = 0.008178473450243473
Validation loss = 0.009237135760486126
Validation loss = 0.009067820385098457
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000859 |
| Iteration     | 26        |
| MaximumReturn | -0.000659 |
| MinimumReturn | -0.0012   |
| TotalSamples  | 46648     |
-----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009922505356371403
Validation loss = 0.01279220636934042
Validation loss = 0.008694985881447792
Validation loss = 0.008830295875668526
Validation loss = 0.009570094756782055
Validation loss = 0.009610248729586601
Validation loss = 0.010334520600736141
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011654314585030079
Validation loss = 0.009144527837634087
Validation loss = 0.011907937936484814
Validation loss = 0.009942659176886082
Validation loss = 0.009120991453528404
Validation loss = 0.009273621253669262
Validation loss = 0.009628228843212128
Validation loss = 0.013388969004154205
Validation loss = 0.008997469209134579
Validation loss = 0.009137350134551525
Validation loss = 0.009420144371688366
Validation loss = 0.013069722801446915
Validation loss = 0.009322117082774639
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009986335411667824
Validation loss = 0.012766057625412941
Validation loss = 0.013737265020608902
Validation loss = 0.013843387365341187
Validation loss = 0.0110751548781991
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00796775333583355
Validation loss = 0.010552259162068367
Validation loss = 0.009095706045627594
Validation loss = 0.00866289995610714
Validation loss = 0.008555853739380836
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008561558090150356
Validation loss = 0.008544805459678173
Validation loss = 0.010053878650069237
Validation loss = 0.009837046265602112
Validation loss = 0.007936119101941586
Validation loss = 0.008358320221304893
Validation loss = 0.008108235895633698
Validation loss = 0.00884528923779726
Validation loss = 0.008589043281972408
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00215 |
| Iteration     | 27       |
| MaximumReturn | -0.00131 |
| MinimumReturn | -0.00382 |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009241512976586819
Validation loss = 0.009284001775085926
Validation loss = 0.00951447430998087
Validation loss = 0.008702748455107212
Validation loss = 0.008730066940188408
Validation loss = 0.008699934929609299
Validation loss = 0.009352686814963818
Validation loss = 0.00897165760397911
Validation loss = 0.009711109101772308
Validation loss = 0.008718802593648434
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009594070725142956
Validation loss = 0.00913670938462019
Validation loss = 0.008642700500786304
Validation loss = 0.00954299233853817
Validation loss = 0.008807764388620853
Validation loss = 0.012387607246637344
Validation loss = 0.009069596417248249
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009172572754323483
Validation loss = 0.010116427205502987
Validation loss = 0.008923662826418877
Validation loss = 0.012915282510221004
Validation loss = 0.009703055955469608
Validation loss = 0.010366082191467285
Validation loss = 0.0103842131793499
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00874109473079443
Validation loss = 0.00864205788820982
Validation loss = 0.007817942649126053
Validation loss = 0.009374549612402916
Validation loss = 0.008571562357246876
Validation loss = 0.0077872988767921925
Validation loss = 0.007703365292400122
Validation loss = 0.008402279578149319
Validation loss = 0.009533132426440716
Validation loss = 0.008454293943941593
Validation loss = 0.008277651853859425
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008517653681337833
Validation loss = 0.008281168527901173
Validation loss = 0.011056446470320225
Validation loss = 0.008455385453999043
Validation loss = 0.01188235729932785
Validation loss = 0.009140297770500183
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00103  |
| Iteration     | 28        |
| MaximumReturn | -0.000642 |
| MinimumReturn | -0.00376  |
| TotalSamples  | 49980     |
-----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008869985118508339
Validation loss = 0.009696081280708313
Validation loss = 0.011630872264504433
Validation loss = 0.008738085627555847
Validation loss = 0.009600915014743805
Validation loss = 0.008967440575361252
Validation loss = 0.013365012593567371
Validation loss = 0.010827774181962013
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009383453987538815
Validation loss = 0.009026129730045795
Validation loss = 0.008777459152042866
Validation loss = 0.010621624067425728
Validation loss = 0.009040533564984798
Validation loss = 0.008967221714556217
Validation loss = 0.00913164857774973
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009603749960660934
Validation loss = 0.009453067556023598
Validation loss = 0.00958587322384119
Validation loss = 0.009380590170621872
Validation loss = 0.009638821706175804
Validation loss = 0.009061482734978199
Validation loss = 0.009764842689037323
Validation loss = 0.009431712329387665
Validation loss = 0.009444905444979668
Validation loss = 0.008772904053330421
Validation loss = 0.010957872495055199
Validation loss = 0.009396604262292385
Validation loss = 0.009711113758385181
Validation loss = 0.010640650056302547
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008128629066050053
Validation loss = 0.010590530931949615
Validation loss = 0.008366052061319351
Validation loss = 0.008118479512631893
Validation loss = 0.008139832876622677
Validation loss = 0.008365001529455185
Validation loss = 0.0086720772087574
Validation loss = 0.008573022671043873
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00900216307491064
Validation loss = 0.00752034317702055
Validation loss = 0.007555684074759483
Validation loss = 0.008032306097447872
Validation loss = 0.008008998818695545
Validation loss = 0.007987989112734795
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000931 |
| Iteration     | 29        |
| MaximumReturn | -0.000623 |
| MinimumReturn | -0.00126  |
| TotalSamples  | 51646     |
-----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010514795780181885
Validation loss = 0.009635570459067822
Validation loss = 0.008788593113422394
Validation loss = 0.008780639618635178
Validation loss = 0.010116256773471832
Validation loss = 0.008711987175047398
Validation loss = 0.01328944694250822
Validation loss = 0.008707272820174694
Validation loss = 0.008913625031709671
Validation loss = 0.009630512446165085
Validation loss = 0.00920185074210167
Validation loss = 0.008887290954589844
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008426155894994736
Validation loss = 0.008728789165616035
Validation loss = 0.008358130231499672
Validation loss = 0.008712898939847946
Validation loss = 0.011090532876551151
Validation loss = 0.009007925167679787
Validation loss = 0.009208733215928078
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010530824773013592
Validation loss = 0.01112787239253521
Validation loss = 0.012043238617479801
Validation loss = 0.015525570139288902
Validation loss = 0.011705935932695866
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007597131188958883
Validation loss = 0.011194284074008465
Validation loss = 0.008250299841165543
Validation loss = 0.00978961493819952
Validation loss = 0.008867230266332626
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008263343013823032
Validation loss = 0.011384398676455021
Validation loss = 0.008483336307108402
Validation loss = 0.008725931867957115
Validation loss = 0.008414361625909805
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000869 |
| Iteration     | 30        |
| MaximumReturn | -0.000585 |
| MinimumReturn | -0.00153  |
| TotalSamples  | 53312     |
-----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01049974374473095
Validation loss = 0.010008711367845535
Validation loss = 0.012306248769164085
Validation loss = 0.008917933329939842
Validation loss = 0.008213328197598457
Validation loss = 0.009505064226686954
Validation loss = 0.009901639074087143
Validation loss = 0.00948481634259224
Validation loss = 0.008681794628500938
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00913511123508215
Validation loss = 0.009740974754095078
Validation loss = 0.0085284523665905
Validation loss = 0.009943650104105473
Validation loss = 0.009967436082661152
Validation loss = 0.00965963862836361
Validation loss = 0.010373711585998535
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01064795907586813
Validation loss = 0.010487924329936504
Validation loss = 0.010161438956856728
Validation loss = 0.010367080569267273
Validation loss = 0.009644378907978535
Validation loss = 0.011661439202725887
Validation loss = 0.009811339899897575
Validation loss = 0.009695216082036495
Validation loss = 0.009867061860859394
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007305904291570187
Validation loss = 0.009097388945519924
Validation loss = 0.010397111997008324
Validation loss = 0.00837203674018383
Validation loss = 0.008062202483415604
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0093122823163867
Validation loss = 0.00799576099961996
Validation loss = 0.008825927041471004
Validation loss = 0.008041969500482082
Validation loss = 0.00872394535690546
Validation loss = 0.009783810004591942
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00108  |
| Iteration     | 31        |
| MaximumReturn | -0.000833 |
| MinimumReturn | -0.0015   |
| TotalSamples  | 54978     |
-----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00862053968012333
Validation loss = 0.011481611989438534
Validation loss = 0.009685136377811432
Validation loss = 0.009332277812063694
Validation loss = 0.008380476385354996
Validation loss = 0.008788861334323883
Validation loss = 0.007978551089763641
Validation loss = 0.007916920818388462
Validation loss = 0.008622006513178349
Validation loss = 0.007859569042921066
Validation loss = 0.008934094570577145
Validation loss = 0.009493797086179256
Validation loss = 0.010131622664630413
Validation loss = 0.00844654906541109
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010876732878386974
Validation loss = 0.010205925442278385
Validation loss = 0.008771323598921299
Validation loss = 0.009553560055792332
Validation loss = 0.00858943909406662
Validation loss = 0.010323993861675262
Validation loss = 0.009154188446700573
Validation loss = 0.009208661504089832
Validation loss = 0.009086670354008675
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009782941080629826
Validation loss = 0.009954609908163548
Validation loss = 0.009627758525311947
Validation loss = 0.010782607831060886
Validation loss = 0.014228401705622673
Validation loss = 0.009630606509745121
Validation loss = 0.009815202094614506
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007940907031297684
Validation loss = 0.013074823655188084
Validation loss = 0.01003569271415472
Validation loss = 0.007720394060015678
Validation loss = 0.0076135313138365746
Validation loss = 0.007665527518838644
Validation loss = 0.007967966608703136
Validation loss = 0.009207416325807571
Validation loss = 0.010454040952026844
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011002524755895138
Validation loss = 0.011290532536804676
Validation loss = 0.01106838695704937
Validation loss = 0.0087013253942132
Validation loss = 0.010428119450807571
Validation loss = 0.008554060012102127
Validation loss = 0.009486018680036068
Validation loss = 0.00911952368915081
Validation loss = 0.011682850308716297
Validation loss = 0.010221903212368488
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000815 |
| Iteration     | 32        |
| MaximumReturn | -0.000633 |
| MinimumReturn | -0.00142  |
| TotalSamples  | 56644     |
-----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014238123781979084
Validation loss = 0.009174511767923832
Validation loss = 0.010125013999640942
Validation loss = 0.009226580150425434
Validation loss = 0.008368710987269878
Validation loss = 0.008584209717810154
Validation loss = 0.00926433689892292
Validation loss = 0.009034397080540657
Validation loss = 0.008764013648033142
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008455894887447357
Validation loss = 0.014949031174182892
Validation loss = 0.010604985058307648
Validation loss = 0.008277944289147854
Validation loss = 0.00813300721347332
Validation loss = 0.008799494244158268
Validation loss = 0.009614557027816772
Validation loss = 0.008593779988586903
Validation loss = 0.008274762891232967
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010199815034866333
Validation loss = 0.00945193599909544
Validation loss = 0.009053168818354607
Validation loss = 0.009782739914953709
Validation loss = 0.011425821110606194
Validation loss = 0.011291786096990108
Validation loss = 0.009070010855793953
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00962480716407299
Validation loss = 0.00786268524825573
Validation loss = 0.008295825682580471
Validation loss = 0.0083213085308671
Validation loss = 0.009556226432323456
Validation loss = 0.007769464049488306
Validation loss = 0.008532596752047539
Validation loss = 0.008586260490119457
Validation loss = 0.009740839712321758
Validation loss = 0.007955639623105526
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009731657803058624
Validation loss = 0.007936053909361362
Validation loss = 0.009081898257136345
Validation loss = 0.007997730746865273
Validation loss = 0.00883855577558279
Validation loss = 0.009096157737076283
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0237  |
| Iteration     | 33       |
| MaximumReturn | -0.0145  |
| MinimumReturn | -0.0426  |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008413651026785374
Validation loss = 0.00851974543184042
Validation loss = 0.009096315130591393
Validation loss = 0.00933209341019392
Validation loss = 0.009296417236328125
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007860234938561916
Validation loss = 0.008855941705405712
Validation loss = 0.008505321107804775
Validation loss = 0.008236953988671303
Validation loss = 0.00766731658950448
Validation loss = 0.013254029676318169
Validation loss = 0.009237931109964848
Validation loss = 0.009038243442773819
Validation loss = 0.008846892043948174
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010135364718735218
Validation loss = 0.010373876430094242
Validation loss = 0.01039600558578968
Validation loss = 0.009732728824019432
Validation loss = 0.00908567663282156
Validation loss = 0.009905886836349964
Validation loss = 0.012908740900456905
Validation loss = 0.012355165556073189
Validation loss = 0.009382524527609348
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01216553058475256
Validation loss = 0.010243808850646019
Validation loss = 0.008726578205823898
Validation loss = 0.008191940374672413
Validation loss = 0.008792925626039505
Validation loss = 0.008046683855354786
Validation loss = 0.008359039202332497
Validation loss = 0.00855918787419796
Validation loss = 0.0076869758777320385
Validation loss = 0.007717968430370092
Validation loss = 0.008781985379755497
Validation loss = 0.008168928325176239
Validation loss = 0.008037662133574486
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011183255352079868
Validation loss = 0.00849247258156538
Validation loss = 0.007917564362287521
Validation loss = 0.00806280318647623
Validation loss = 0.00790561456233263
Validation loss = 0.010324321687221527
Validation loss = 0.00786654558032751
Validation loss = 0.008194507099688053
Validation loss = 0.009141700342297554
Validation loss = 0.010164039209485054
Validation loss = 0.010747810825705528
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0011   |
| Iteration     | 34        |
| MaximumReturn | -0.000507 |
| MinimumReturn | -0.002    |
| TotalSamples  | 59976     |
-----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00956464372575283
Validation loss = 0.00947490893304348
Validation loss = 0.0083385668694973
Validation loss = 0.008749672211706638
Validation loss = 0.008223582059144974
Validation loss = 0.009095367044210434
Validation loss = 0.008086089976131916
Validation loss = 0.00973842479288578
Validation loss = 0.0084027498960495
Validation loss = 0.008745941333472729
Validation loss = 0.008795027621090412
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0097991693764925
Validation loss = 0.008562237955629826
Validation loss = 0.009703491814434528
Validation loss = 0.00869537889957428
Validation loss = 0.008491327986121178
Validation loss = 0.008690071292221546
Validation loss = 0.011468833312392235
Validation loss = 0.009953707456588745
Validation loss = 0.009032079018652439
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01006542332470417
Validation loss = 0.009556656703352928
Validation loss = 0.00921920221298933
Validation loss = 0.010711618699133396
Validation loss = 0.01106971874833107
Validation loss = 0.009379105642437935
Validation loss = 0.009525833651423454
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008358133025467396
Validation loss = 0.008467013016343117
Validation loss = 0.007489956449717283
Validation loss = 0.013682885095477104
Validation loss = 0.008099834434688091
Validation loss = 0.007643244694918394
Validation loss = 0.008992668241262436
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011019936762750149
Validation loss = 0.008494165726006031
Validation loss = 0.007987920194864273
Validation loss = 0.00817834585905075
Validation loss = 0.008186724036931992
Validation loss = 0.00963667593896389
Validation loss = 0.008132552728056908
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00195 |
| Iteration     | 35       |
| MaximumReturn | -0.00119 |
| MinimumReturn | -0.00274 |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010244892910122871
Validation loss = 0.008687373250722885
Validation loss = 0.009660826064646244
Validation loss = 0.010861257091164589
Validation loss = 0.00863740500062704
Validation loss = 0.008268782868981361
Validation loss = 0.008342564105987549
Validation loss = 0.020061086863279343
Validation loss = 0.008537020534276962
Validation loss = 0.008760306984186172
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009166891686618328
Validation loss = 0.00983028206974268
Validation loss = 0.011648457497358322
Validation loss = 0.009111561812460423
Validation loss = 0.010939785279333591
Validation loss = 0.008987879380583763
Validation loss = 0.00886685959994793
Validation loss = 0.009491246193647385
Validation loss = 0.008862060494720936
Validation loss = 0.008571596816182137
Validation loss = 0.015054209157824516
Validation loss = 0.009316401556134224
Validation loss = 0.008507591672241688
Validation loss = 0.008260768838226795
Validation loss = 0.009332419373095036
Validation loss = 0.008626578375697136
Validation loss = 0.0074189770966768265
Validation loss = 0.008513807319104671
Validation loss = 0.010345094837248325
Validation loss = 0.008234042674303055
Validation loss = 0.008660427294671535
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01009793858975172
Validation loss = 0.01017023716121912
Validation loss = 0.011013281531631947
Validation loss = 0.009084822610020638
Validation loss = 0.00997661892324686
Validation loss = 0.009310993365943432
Validation loss = 0.009254301898181438
Validation loss = 0.010625959374010563
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008319953456521034
Validation loss = 0.007445424795150757
Validation loss = 0.008520320057868958
Validation loss = 0.007716173306107521
Validation loss = 0.007913937792181969
Validation loss = 0.008031596429646015
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008156286552548409
Validation loss = 0.008550053462386131
Validation loss = 0.00819671805948019
Validation loss = 0.009065731428563595
Validation loss = 0.00890394113957882
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000946 |
| Iteration     | 36        |
| MaximumReturn | -0.000744 |
| MinimumReturn | -0.00134  |
| TotalSamples  | 63308     |
-----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0086276950314641
Validation loss = 0.008495387621223927
Validation loss = 0.009241177700459957
Validation loss = 0.008801157586276531
Validation loss = 0.009575877338647842
Validation loss = 0.008421780541539192
Validation loss = 0.010231207124888897
Validation loss = 0.008746328763663769
Validation loss = 0.00869823805987835
Validation loss = 0.009114406071603298
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007791962940245867
Validation loss = 0.007974719628691673
Validation loss = 0.00946560688316822
Validation loss = 0.009259683080017567
Validation loss = 0.009456368163228035
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010495911352336407
Validation loss = 0.009825176559388638
Validation loss = 0.009497230872511864
Validation loss = 0.00957508571445942
Validation loss = 0.010204073041677475
Validation loss = 0.012166153639554977
Validation loss = 0.010006397031247616
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01044305320829153
Validation loss = 0.009328077547252178
Validation loss = 0.00778645183891058
Validation loss = 0.007614118978381157
Validation loss = 0.007773601915687323
Validation loss = 0.008646857924759388
Validation loss = 0.008385207504034042
Validation loss = 0.007523579988628626
Validation loss = 0.009012638591229916
Validation loss = 0.00900881178677082
Validation loss = 0.008078854531049728
Validation loss = 0.011786576360464096
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008301026187837124
Validation loss = 0.008178371004760265
Validation loss = 0.008089739829301834
Validation loss = 0.008628414943814278
Validation loss = 0.013033256866037846
Validation loss = 0.010163667611777782
Validation loss = 0.008044595830142498
Validation loss = 0.008298832923173904
Validation loss = 0.008795540779829025
Validation loss = 0.008517617359757423
Validation loss = 0.008179157972335815
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00099  |
| Iteration     | 37        |
| MaximumReturn | -0.000564 |
| MinimumReturn | -0.00181  |
| TotalSamples  | 64974     |
-----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008610603399574757
Validation loss = 0.01039617508649826
Validation loss = 0.009427720680832863
Validation loss = 0.00843837484717369
Validation loss = 0.00818631611764431
Validation loss = 0.009452048689126968
Validation loss = 0.008741658180952072
Validation loss = 0.008385824039578438
Validation loss = 0.008101167157292366
Validation loss = 0.008661964908242226
Validation loss = 0.00809322390705347
Validation loss = 0.012990616261959076
Validation loss = 0.008246141485869884
Validation loss = 0.007981406524777412
Validation loss = 0.008573106490075588
Validation loss = 0.011461582034826279
Validation loss = 0.008722441270947456
Validation loss = 0.00857815332710743
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00902134831994772
Validation loss = 0.009876959025859833
Validation loss = 0.00947450753301382
Validation loss = 0.009873411618173122
Validation loss = 0.008838067762553692
Validation loss = 0.00931211095303297
Validation loss = 0.009335493668913841
Validation loss = 0.009079070761799812
Validation loss = 0.008552856743335724
Validation loss = 0.010155508294701576
Validation loss = 0.009947968646883965
Validation loss = 0.009745776653289795
Validation loss = 0.008987109176814556
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010298061184585094
Validation loss = 0.009340171702206135
Validation loss = 0.009254802018404007
Validation loss = 0.00941470731049776
Validation loss = 0.010571420192718506
Validation loss = 0.009309830144047737
Validation loss = 0.011490633711218834
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010284950956702232
Validation loss = 0.00897492840886116
Validation loss = 0.008332356810569763
Validation loss = 0.00810613390058279
Validation loss = 0.008677363395690918
Validation loss = 0.007250736467540264
Validation loss = 0.01355133205652237
Validation loss = 0.009294463321566582
Validation loss = 0.008892105892300606
Validation loss = 0.008201452903449535
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01065191999077797
Validation loss = 0.01116966363042593
Validation loss = 0.009754706174135208
Validation loss = 0.008535844273865223
Validation loss = 0.009013169445097446
Validation loss = 0.010428178124129772
Validation loss = 0.009673872962594032
Validation loss = 0.008895497769117355
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000903 |
| Iteration     | 38        |
| MaximumReturn | -0.000629 |
| MinimumReturn | -0.00142  |
| TotalSamples  | 66640     |
-----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00873768050223589
Validation loss = 0.014578543603420258
Validation loss = 0.008420959115028381
Validation loss = 0.008705735206604004
Validation loss = 0.008430722169578075
Validation loss = 0.00805050227791071
Validation loss = 0.008199959993362427
Validation loss = 0.008081319741904736
Validation loss = 0.010041479952633381
Validation loss = 0.009272309020161629
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009102599695324898
Validation loss = 0.011325434781610966
Validation loss = 0.009001878090202808
Validation loss = 0.008646451868116856
Validation loss = 0.00834737066179514
Validation loss = 0.009839455597102642
Validation loss = 0.00881873443722725
Validation loss = 0.008159123361110687
Validation loss = 0.009722680784761906
Validation loss = 0.00877166073769331
Validation loss = 0.009775283746421337
Validation loss = 0.00992991030216217
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009485575370490551
Validation loss = 0.009039411321282387
Validation loss = 0.009390346705913544
Validation loss = 0.009512186050415039
Validation loss = 0.010057465173304081
Validation loss = 0.009439820423722267
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008493263274431229
Validation loss = 0.007711843121796846
Validation loss = 0.007899346761405468
Validation loss = 0.008531336672604084
Validation loss = 0.007870710454881191
Validation loss = 0.008749227039515972
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008624948561191559
Validation loss = 0.00866007711738348
Validation loss = 0.008659692481160164
Validation loss = 0.009445687755942345
Validation loss = 0.009445358999073505
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000889 |
| Iteration     | 39        |
| MaximumReturn | -0.000551 |
| MinimumReturn | -0.00168  |
| TotalSamples  | 68306     |
-----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009494456462562084
Validation loss = 0.00838721264153719
Validation loss = 0.008541619405150414
Validation loss = 0.008468592539429665
Validation loss = 0.009373496286571026
Validation loss = 0.012872959487140179
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01066418644040823
Validation loss = 0.009893960319459438
Validation loss = 0.008880901150405407
Validation loss = 0.009191019460558891
Validation loss = 0.0082290293648839
Validation loss = 0.007990878075361252
Validation loss = 0.008638539351522923
Validation loss = 0.009120343253016472
Validation loss = 0.008901353925466537
Validation loss = 0.01129191741347313
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009249543771147728
Validation loss = 0.010848659090697765
Validation loss = 0.009685068391263485
Validation loss = 0.009368620812892914
Validation loss = 0.009959095157682896
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012624279595911503
Validation loss = 0.008587448857724667
Validation loss = 0.00965859368443489
Validation loss = 0.008539179340004921
Validation loss = 0.008176587522029877
Validation loss = 0.008303475566208363
Validation loss = 0.008286206983029842
Validation loss = 0.008733809925615788
Validation loss = 0.008876577951014042
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010453675873577595
Validation loss = 0.01632879115641117
Validation loss = 0.009426480159163475
Validation loss = 0.009275461547076702
Validation loss = 0.009371636435389519
Validation loss = 0.009216795675456524
Validation loss = 0.009652691893279552
Validation loss = 0.00941840186715126
Validation loss = 0.009264558553695679
Validation loss = 0.008299585431814194
Validation loss = 0.008480731397867203
Validation loss = 0.008874974213540554
Validation loss = 0.008827009238302708
Validation loss = 0.009377509355545044
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00101  |
| Iteration     | 40        |
| MaximumReturn | -0.000639 |
| MinimumReturn | -0.00212  |
| TotalSamples  | 69972     |
-----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010389349423348904
Validation loss = 0.008449564687907696
Validation loss = 0.008453032933175564
Validation loss = 0.008225303143262863
Validation loss = 0.008722676895558834
Validation loss = 0.008014853112399578
Validation loss = 0.008148974739015102
Validation loss = 0.010298640467226505
Validation loss = 0.008769091218709946
Validation loss = 0.009292629547417164
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008380892686545849
Validation loss = 0.009763358160853386
Validation loss = 0.008711522445082664
Validation loss = 0.009872912429273129
Validation loss = 0.008627540431916714
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009760427288711071
Validation loss = 0.010250324383378029
Validation loss = 0.009707567282021046
Validation loss = 0.012047776021063328
Validation loss = 0.010112776421010494
Validation loss = 0.009331271052360535
Validation loss = 0.009953371249139309
Validation loss = 0.01032809354364872
Validation loss = 0.009813074953854084
Validation loss = 0.009050990454852581
Validation loss = 0.009245666675269604
Validation loss = 0.009369652718305588
Validation loss = 0.009448125958442688
Validation loss = 0.01017303578555584
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008778363466262817
Validation loss = 0.008247616700828075
Validation loss = 0.009334505535662174
Validation loss = 0.00828732643276453
Validation loss = 0.008765550330281258
Validation loss = 0.008271286264061928
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008853442035615444
Validation loss = 0.008786852471530437
Validation loss = 0.010355189442634583
Validation loss = 0.008408110588788986
Validation loss = 0.010169997811317444
Validation loss = 0.009820702485740185
Validation loss = 0.009597718715667725
Validation loss = 0.008352348580956459
Validation loss = 0.009776681661605835
Validation loss = 0.00899962056428194
Validation loss = 0.008779065683484077
Validation loss = 0.009059901349246502
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00102  |
| Iteration     | 41        |
| MaximumReturn | -0.000612 |
| MinimumReturn | -0.00165  |
| TotalSamples  | 71638     |
-----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009525228291749954
Validation loss = 0.008533102460205555
Validation loss = 0.00934534426778555
Validation loss = 0.009798003360629082
Validation loss = 0.00880975928157568
Validation loss = 0.009595229290425777
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008511272259056568
Validation loss = 0.008972232230007648
Validation loss = 0.009102427400648594
Validation loss = 0.009085693396627903
Validation loss = 0.008669575676321983
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010228353552520275
Validation loss = 0.009512609802186489
Validation loss = 0.014598415233194828
Validation loss = 0.009811747819185257
Validation loss = 0.009061337448656559
Validation loss = 0.009436448104679585
Validation loss = 0.01096174493432045
Validation loss = 0.010592140257358551
Validation loss = 0.009751804172992706
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009138687513768673
Validation loss = 0.009120378643274307
Validation loss = 0.007719697430729866
Validation loss = 0.007813340052962303
Validation loss = 0.007838478311896324
Validation loss = 0.007766108959913254
Validation loss = 0.007990335114300251
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009344837628304958
Validation loss = 0.009660711511969566
Validation loss = 0.008806681260466576
Validation loss = 0.011094772256910801
Validation loss = 0.008878285996615887
Validation loss = 0.009347254410386086
Validation loss = 0.00962129607796669
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00101  |
| Iteration     | 42        |
| MaximumReturn | -0.000674 |
| MinimumReturn | -0.00159  |
| TotalSamples  | 73304     |
-----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009062889963388443
Validation loss = 0.008858543820679188
Validation loss = 0.008119705133140087
Validation loss = 0.008324199356138706
Validation loss = 0.0083333570510149
Validation loss = 0.00879658292979002
Validation loss = 0.009347780607640743
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012338067404925823
Validation loss = 0.00997786596417427
Validation loss = 0.00980597548186779
Validation loss = 0.009460215456783772
Validation loss = 0.009153135120868683
Validation loss = 0.009002204984426498
Validation loss = 0.008910305798053741
Validation loss = 0.009276476688683033
Validation loss = 0.008799076080322266
Validation loss = 0.009190085344016552
Validation loss = 0.01117902435362339
Validation loss = 0.009257986210286617
Validation loss = 0.009126503020524979
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009813178330659866
Validation loss = 0.009824933484196663
Validation loss = 0.010004212148487568
Validation loss = 0.009848836809396744
Validation loss = 0.011028864420950413
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009217781946063042
Validation loss = 0.008769722655415535
Validation loss = 0.008190084248781204
Validation loss = 0.011590017937123775
Validation loss = 0.008334232494235039
Validation loss = 0.008264722302556038
Validation loss = 0.008832641877233982
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0098368925973773
Validation loss = 0.009636137634515762
Validation loss = 0.009515199810266495
Validation loss = 0.009867708198726177
Validation loss = 0.011521201580762863
Validation loss = 0.009190947748720646
Validation loss = 0.010924060828983784
Validation loss = 0.009387923404574394
Validation loss = 0.009191753342747688
Validation loss = 0.00900574866682291
Validation loss = 0.009929561987519264
Validation loss = 0.009459196589887142
Validation loss = 0.009114880114793777
Validation loss = 0.010051933117210865
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.001    |
| Iteration     | 43        |
| MaximumReturn | -0.000686 |
| MinimumReturn | -0.00157  |
| TotalSamples  | 74970     |
-----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009167817421257496
Validation loss = 0.009084494784474373
Validation loss = 0.008481541648507118
Validation loss = 0.01001792959868908
Validation loss = 0.0093105873093009
Validation loss = 0.009531465359032154
Validation loss = 0.009452548809349537
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009465828537940979
Validation loss = 0.0088845519348979
Validation loss = 0.009320298209786415
Validation loss = 0.0088801933452487
Validation loss = 0.008179403841495514
Validation loss = 0.009653727523982525
Validation loss = 0.009591082111001015
Validation loss = 0.00928476732224226
Validation loss = 0.009479353204369545
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01013381127268076
Validation loss = 0.00991894956678152
Validation loss = 0.00987444818019867
Validation loss = 0.011556929908692837
Validation loss = 0.010199499316513538
Validation loss = 0.009498609229922295
Validation loss = 0.009367754682898521
Validation loss = 0.010229724459350109
Validation loss = 0.01104601752012968
Validation loss = 0.009601290337741375
Validation loss = 0.010015777312219143
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009879028424620628
Validation loss = 0.009404325857758522
Validation loss = 0.00859824102371931
Validation loss = 0.008937717415392399
Validation loss = 0.008657196536660194
Validation loss = 0.008144487626850605
Validation loss = 0.00779176177456975
Validation loss = 0.009719050489366055
Validation loss = 0.00831878837198019
Validation loss = 0.007885405793786049
Validation loss = 0.008601452223956585
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01005389355123043
Validation loss = 0.01946849562227726
Validation loss = 0.009867141023278236
Validation loss = 0.009431722573935986
Validation loss = 0.009255936369299889
Validation loss = 0.010820487514138222
Validation loss = 0.00980615895241499
Validation loss = 0.009491750039160252
Validation loss = 0.008331498131155968
Validation loss = 0.009323319420218468
Validation loss = 0.009027688764035702
Validation loss = 0.008980637416243553
Validation loss = 0.011289091780781746
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00116  |
| Iteration     | 44        |
| MaximumReturn | -0.000657 |
| MinimumReturn | -0.00226  |
| TotalSamples  | 76636     |
-----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011294886469841003
Validation loss = 0.010079297237098217
Validation loss = 0.009732045233249664
Validation loss = 0.009384281933307648
Validation loss = 0.008828915655612946
Validation loss = 0.009493730030953884
Validation loss = 0.011057061143219471
Validation loss = 0.009420921094715595
Validation loss = 0.008751709014177322
Validation loss = 0.008643974550068378
Validation loss = 0.009474330581724644
Validation loss = 0.008324803784489632
Validation loss = 0.00864310935139656
Validation loss = 0.009283669292926788
Validation loss = 0.008257314562797546
Validation loss = 0.00844354834407568
Validation loss = 0.01123043056577444
Validation loss = 0.008281577378511429
Validation loss = 0.009283069521188736
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010035877116024494
Validation loss = 0.00864375289529562
Validation loss = 0.01029458548873663
Validation loss = 0.008630563504993916
Validation loss = 0.008687401190400124
Validation loss = 0.008793022483587265
Validation loss = 0.011469623073935509
Validation loss = 0.008839682675898075
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010083325207233429
Validation loss = 0.015116636641323566
Validation loss = 0.011427616700530052
Validation loss = 0.01004698220640421
Validation loss = 0.0095748296007514
Validation loss = 0.010073680430650711
Validation loss = 0.012356461957097054
Validation loss = 0.00966651365160942
Validation loss = 0.010322549380362034
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008336658589541912
Validation loss = 0.008281591348350048
Validation loss = 0.008093730546534061
Validation loss = 0.010159206576645374
Validation loss = 0.008656942285597324
Validation loss = 0.008767780847847462
Validation loss = 0.008286458440124989
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010459174402058125
Validation loss = 0.009258337318897247
Validation loss = 0.009421922266483307
Validation loss = 0.009362556971609592
Validation loss = 0.009959377348423004
Validation loss = 0.010235876776278019
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00102  |
| Iteration     | 45        |
| MaximumReturn | -0.000632 |
| MinimumReturn | -0.00228  |
| TotalSamples  | 78302     |
-----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00926331989467144
Validation loss = 0.0109485425055027
Validation loss = 0.009366086684167385
Validation loss = 0.008495711721479893
Validation loss = 0.008762026205658913
Validation loss = 0.009158710949122906
Validation loss = 0.008698608726263046
Validation loss = 0.009069200605154037
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009612346068024635
Validation loss = 0.009739344008266926
Validation loss = 0.01099724043160677
Validation loss = 0.00948488898575306
Validation loss = 0.009462307207286358
Validation loss = 0.008723743259906769
Validation loss = 0.009713809937238693
Validation loss = 0.009146458469331264
Validation loss = 0.008395714685320854
Validation loss = 0.008661148138344288
Validation loss = 0.008454984985291958
Validation loss = 0.012049625627696514
Validation loss = 0.01015250850468874
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00974099151790142
Validation loss = 0.010913174599409103
Validation loss = 0.009856387041509151
Validation loss = 0.011408180929720402
Validation loss = 0.01017778366804123
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008570377714931965
Validation loss = 0.009325114078819752
Validation loss = 0.00898009818047285
Validation loss = 0.008256292901933193
Validation loss = 0.008456598035991192
Validation loss = 0.00797058455646038
Validation loss = 0.010931158438324928
Validation loss = 0.008235346525907516
Validation loss = 0.010057750158011913
Validation loss = 0.008309455588459969
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00866402592509985
Validation loss = 0.009649514220654964
Validation loss = 0.00990489311516285
Validation loss = 0.009330297820270061
Validation loss = 0.00931432656943798
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00123  |
| Iteration     | 46        |
| MaximumReturn | -0.000843 |
| MinimumReturn | -0.00229  |
| TotalSamples  | 79968     |
-----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010864345356822014
Validation loss = 0.01067652553319931
Validation loss = 0.009184466674923897
Validation loss = 0.008995605632662773
Validation loss = 0.009054402820765972
Validation loss = 0.00903732143342495
Validation loss = 0.008550980128347874
Validation loss = 0.009986321441829205
Validation loss = 0.008455500937998295
Validation loss = 0.008321619592607021
Validation loss = 0.008250960148870945
Validation loss = 0.008656071498990059
Validation loss = 0.008943048305809498
Validation loss = 0.010104159824550152
Validation loss = 0.009066127240657806
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009125995449721813
Validation loss = 0.00983387976884842
Validation loss = 0.008722439408302307
Validation loss = 0.010291959159076214
Validation loss = 0.009754826314747334
Validation loss = 0.009173410013318062
Validation loss = 0.00897101778537035
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010472536087036133
Validation loss = 0.01068940944969654
Validation loss = 0.010442811995744705
Validation loss = 0.010637698695063591
Validation loss = 0.0106899943202734
Validation loss = 0.011119967326521873
Validation loss = 0.01082570105791092
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008322013542056084
Validation loss = 0.009179060347378254
Validation loss = 0.00844042282551527
Validation loss = 0.009150393307209015
Validation loss = 0.008984418585896492
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008641444146633148
Validation loss = 0.009095536544919014
Validation loss = 0.009878640994429588
Validation loss = 0.009375259280204773
Validation loss = 0.009196268394589424
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000938 |
| Iteration     | 47        |
| MaximumReturn | -0.000595 |
| MinimumReturn | -0.00173  |
| TotalSamples  | 81634     |
-----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009105962701141834
Validation loss = 0.009736555628478527
Validation loss = 0.008570993319153786
Validation loss = 0.008601365610957146
Validation loss = 0.009850169532001019
Validation loss = 0.011082498356699944
Validation loss = 0.008457833901047707
Validation loss = 0.01016770489513874
Validation loss = 0.008864725939929485
Validation loss = 0.008648892864584923
Validation loss = 0.008771372959017754
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008844273164868355
Validation loss = 0.010052002035081387
Validation loss = 0.011168639175593853
Validation loss = 0.009067326784133911
Validation loss = 0.009367717429995537
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010386516340076923
Validation loss = 0.009865853004157543
Validation loss = 0.00986299104988575
Validation loss = 0.00996231846511364
Validation loss = 0.009707790799438953
Validation loss = 0.010453158058226109
Validation loss = 0.009891633875668049
Validation loss = 0.009889500215649605
Validation loss = 0.010916809551417828
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009237190708518028
Validation loss = 0.008401895873248577
Validation loss = 0.00888829119503498
Validation loss = 0.008948351256549358
Validation loss = 0.009859209880232811
Validation loss = 0.00846630148589611
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009194246493279934
Validation loss = 0.010215787217020988
Validation loss = 0.0101619316264987
Validation loss = 0.010907607153058052
Validation loss = 0.010359779000282288
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000893 |
| Iteration     | 48        |
| MaximumReturn | -0.000696 |
| MinimumReturn | -0.00116  |
| TotalSamples  | 83300     |
-----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008763682097196579
Validation loss = 0.008902916684746742
Validation loss = 0.009205828420817852
Validation loss = 0.010631633922457695
Validation loss = 0.008496935479342937
Validation loss = 0.008634784258902073
Validation loss = 0.009126327000558376
Validation loss = 0.00889652781188488
Validation loss = 0.009123972617089748
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0090286023914814
Validation loss = 0.008969615213572979
Validation loss = 0.008486974984407425
Validation loss = 0.009558888152241707
Validation loss = 0.008948022499680519
Validation loss = 0.00878329947590828
Validation loss = 0.008746959269046783
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012202229350805283
Validation loss = 0.009355472400784492
Validation loss = 0.009630143642425537
Validation loss = 0.009749760851264
Validation loss = 0.010392342694103718
Validation loss = 0.009991534985601902
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008604886010289192
Validation loss = 0.008307412266731262
Validation loss = 0.009070051833987236
Validation loss = 0.010153556242585182
Validation loss = 0.008958743885159492
Validation loss = 0.008741559460759163
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010316942818462849
Validation loss = 0.009766501374542713
Validation loss = 0.009329847060143948
Validation loss = 0.013274316675961018
Validation loss = 0.009806932881474495
Validation loss = 0.00956716202199459
Validation loss = 0.010003111325204372
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000962 |
| Iteration     | 49        |
| MaximumReturn | -0.000645 |
| MinimumReturn | -0.00147  |
| TotalSamples  | 84966     |
-----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008999700658023357
Validation loss = 0.009598495438694954
Validation loss = 0.008622311986982822
Validation loss = 0.008673148229718208
Validation loss = 0.008975006639957428
Validation loss = 0.009625290520489216
Validation loss = 0.009187879040837288
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008471631444990635
Validation loss = 0.008752645924687386
Validation loss = 0.014762086793780327
Validation loss = 0.009054424241185188
Validation loss = 0.0091055016964674
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009756306186318398
Validation loss = 0.010336015373468399
Validation loss = 0.010128135792911053
Validation loss = 0.010904593393206596
Validation loss = 0.010001396760344505
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008777030743658543
Validation loss = 0.008480089716613293
Validation loss = 0.009092486463487148
Validation loss = 0.010925340466201305
Validation loss = 0.00995195284485817
Validation loss = 0.009698753245174885
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009313033893704414
Validation loss = 0.00956642534583807
Validation loss = 0.009563173167407513
Validation loss = 0.013600443489849567
Validation loss = 0.00891608465462923
Validation loss = 0.009032338857650757
Validation loss = 0.00897256750613451
Validation loss = 0.014446333982050419
Validation loss = 0.008741467259824276
Validation loss = 0.010585793294012547
Validation loss = 0.009642237797379494
Validation loss = 0.00929967314004898
Validation loss = 0.009199339896440506
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00107 |
| Iteration     | 50       |
| MaximumReturn | -0.00075 |
| MinimumReturn | -0.00173 |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009008258581161499
Validation loss = 0.009041392244398594
Validation loss = 0.009427767246961594
Validation loss = 0.008379318751394749
Validation loss = 0.010045569390058517
Validation loss = 0.009171118028461933
Validation loss = 0.010251980274915695
Validation loss = 0.009901847690343857
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009038545191287994
Validation loss = 0.010102372616529465
Validation loss = 0.00952419638633728
Validation loss = 0.008498070761561394
Validation loss = 0.009482239373028278
Validation loss = 0.00882950983941555
Validation loss = 0.010912549681961536
Validation loss = 0.01148355845361948
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017729908227920532
Validation loss = 0.0107205118983984
Validation loss = 0.009868617169559002
Validation loss = 0.009782088920474052
Validation loss = 0.010937338694930077
Validation loss = 0.009526407346129417
Validation loss = 0.009818962775170803
Validation loss = 0.010860382579267025
Validation loss = 0.009866810403764248
Validation loss = 0.011987557634711266
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00971290934830904
Validation loss = 0.009055044502019882
Validation loss = 0.009618728421628475
Validation loss = 0.009639392606914043
Validation loss = 0.008646627888083458
Validation loss = 0.012873631902039051
Validation loss = 0.008726007305085659
Validation loss = 0.009003221057355404
Validation loss = 0.008868714794516563
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009390794672071934
Validation loss = 0.010968372225761414
Validation loss = 0.010023617185652256
Validation loss = 0.010586140677332878
Validation loss = 0.010188995860517025
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00265 |
| Iteration     | 51       |
| MaximumReturn | -0.00198 |
| MinimumReturn | -0.00445 |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010866652242839336
Validation loss = 0.009612301364541054
Validation loss = 0.009426918812096119
Validation loss = 0.009623794816434383
Validation loss = 0.009398073889315128
Validation loss = 0.009405567310750484
Validation loss = 0.009370345622301102
Validation loss = 0.008309772238135338
Validation loss = 0.009863474406301975
Validation loss = 0.009353510104119778
Validation loss = 0.008979622274637222
Validation loss = 0.009502451866865158
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010593991726636887
Validation loss = 0.00911504216492176
Validation loss = 0.008495413698256016
Validation loss = 0.008686280809342861
Validation loss = 0.008961121551692486
Validation loss = 0.011428529396653175
Validation loss = 0.00884975679218769
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01023850217461586
Validation loss = 0.013899159617722034
Validation loss = 0.010033841244876385
Validation loss = 0.010024171322584152
Validation loss = 0.010781016200780869
Validation loss = 0.01237327791750431
Validation loss = 0.010438368655741215
Validation loss = 0.009566714987158775
Validation loss = 0.008982791565358639
Validation loss = 0.0095652025192976
Validation loss = 0.009648654609918594
Validation loss = 0.01154033187776804
Validation loss = 0.009480305016040802
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008418123237788677
Validation loss = 0.009178939275443554
Validation loss = 0.008958404883742332
Validation loss = 0.01055134180933237
Validation loss = 0.0100287776440382
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009905721060931683
Validation loss = 0.010477988980710506
Validation loss = 0.00961274467408657
Validation loss = 0.010102479718625546
Validation loss = 0.009183123707771301
Validation loss = 0.010722116567194462
Validation loss = 0.009533386677503586
Validation loss = 0.00964457169175148
Validation loss = 0.010870776139199734
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0012   |
| Iteration     | 52        |
| MaximumReturn | -0.000696 |
| MinimumReturn | -0.00266  |
| TotalSamples  | 89964     |
-----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008919063955545425
Validation loss = 0.00922981183975935
Validation loss = 0.008984347805380821
Validation loss = 0.009568878449499607
Validation loss = 0.009316655807197094
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009058073163032532
Validation loss = 0.00885387510061264
Validation loss = 0.009046090766787529
Validation loss = 0.012037198059260845
Validation loss = 0.009859868325293064
Validation loss = 0.00935310497879982
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01012863777577877
Validation loss = 0.0102343475446105
Validation loss = 0.009915832430124283
Validation loss = 0.009969902224838734
Validation loss = 0.009760135784745216
Validation loss = 0.010998472571372986
Validation loss = 0.010276053100824356
Validation loss = 0.01028357446193695
Validation loss = 0.010461930185556412
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00923886802047491
Validation loss = 0.008919443003833294
Validation loss = 0.009521265514194965
Validation loss = 0.00910383090376854
Validation loss = 0.008792788721621037
Validation loss = 0.00871141254901886
Validation loss = 0.009071559645235538
Validation loss = 0.00923984032124281
Validation loss = 0.008962276391685009
Validation loss = 0.00978624913841486
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010077523067593575
Validation loss = 0.00958913005888462
Validation loss = 0.009286980144679546
Validation loss = 0.00985277071595192
Validation loss = 0.010914572514593601
Validation loss = 0.010393478907644749
Validation loss = 0.009981406852602959
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00113  |
| Iteration     | 53        |
| MaximumReturn | -0.000713 |
| MinimumReturn | -0.00203  |
| TotalSamples  | 91630     |
-----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00904341135174036
Validation loss = 0.010286244563758373
Validation loss = 0.009597333148121834
Validation loss = 0.009608319029211998
Validation loss = 0.009470895864069462
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008901298977434635
Validation loss = 0.00923838373273611
Validation loss = 0.009174831211566925
Validation loss = 0.0100305937230587
Validation loss = 0.008940931409597397
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009971569292247295
Validation loss = 0.009847964160144329
Validation loss = 0.009839886799454689
Validation loss = 0.010842391289770603
Validation loss = 0.011384197510778904
Validation loss = 0.01071060262620449
Validation loss = 0.010022047907114029
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009618585929274559
Validation loss = 0.013832162134349346
Validation loss = 0.009519021026790142
Validation loss = 0.008808753453195095
Validation loss = 0.009882919490337372
Validation loss = 0.010571131482720375
Validation loss = 0.009692429564893246
Validation loss = 0.009752159006893635
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009938315488398075
Validation loss = 0.010025511495769024
Validation loss = 0.009933548979461193
Validation loss = 0.010697947815060616
Validation loss = 0.010201235301792622
Validation loss = 0.010393787175416946
Validation loss = 0.009484331123530865
Validation loss = 0.00976339541375637
Validation loss = 0.010147344321012497
Validation loss = 0.009748635813593864
Validation loss = 0.00975271500647068
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00217 |
| Iteration     | 54       |
| MaximumReturn | -0.00123 |
| MinimumReturn | -0.00599 |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00940883718430996
Validation loss = 0.01342820841819048
Validation loss = 0.010116180405020714
Validation loss = 0.010411233641207218
Validation loss = 0.010122370906174183
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008600808680057526
Validation loss = 0.010451358743011951
Validation loss = 0.008864261209964752
Validation loss = 0.010143164545297623
Validation loss = 0.009369186125695705
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009799656458199024
Validation loss = 0.01009340863674879
Validation loss = 0.009846683591604233
Validation loss = 0.013252109289169312
Validation loss = 0.010111181996762753
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00864760484546423
Validation loss = 0.01073786523193121
Validation loss = 0.009522372856736183
Validation loss = 0.008882489986717701
Validation loss = 0.009225352667272091
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009948897175490856
Validation loss = 0.013317595236003399
Validation loss = 0.009960168972611427
Validation loss = 0.009910480119287968
Validation loss = 0.010107243433594704
Validation loss = 0.009869871661067009
Validation loss = 0.010339603759348392
Validation loss = 0.010311642661690712
Validation loss = 0.01044128742069006
Validation loss = 0.009985730051994324
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000998 |
| Iteration     | 55        |
| MaximumReturn | -0.000621 |
| MinimumReturn | -0.00183  |
| TotalSamples  | 94962     |
-----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009078663773834705
Validation loss = 0.010477803647518158
Validation loss = 0.009577102027833462
Validation loss = 0.009462932124733925
Validation loss = 0.010424354113638401
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009453955106437206
Validation loss = 0.01063983328640461
Validation loss = 0.009052072651684284
Validation loss = 0.009180358611047268
Validation loss = 0.009739301167428493
Validation loss = 0.009513214230537415
Validation loss = 0.010869618505239487
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010798057541251183
Validation loss = 0.010839767754077911
Validation loss = 0.010263894684612751
Validation loss = 0.01105016190558672
Validation loss = 0.010701170191168785
Validation loss = 0.01021425798535347
Validation loss = 0.010670335032045841
Validation loss = 0.01150804664939642
Validation loss = 0.010522251948714256
Validation loss = 0.009784460999071598
Validation loss = 0.00985963549464941
Validation loss = 0.00955377146601677
Validation loss = 0.00997130572795868
Validation loss = 0.009729024954140186
Validation loss = 0.010658263228833675
Validation loss = 0.009867597371339798
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009353232569992542
Validation loss = 0.009471733123064041
Validation loss = 0.009132126346230507
Validation loss = 0.010406806133687496
Validation loss = 0.009357585571706295
Validation loss = 0.009977724403142929
Validation loss = 0.009522927924990654
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00982892420142889
Validation loss = 0.009478181600570679
Validation loss = 0.00995111558586359
Validation loss = 0.012995698489248753
Validation loss = 0.01027812622487545
Validation loss = 0.01048843190073967
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00151 |
| Iteration     | 56       |
| MaximumReturn | -0.0012  |
| MinimumReturn | -0.00233 |
| TotalSamples  | 96628    |
----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010829384438693523
Validation loss = 0.009881348349153996
Validation loss = 0.0097179114818573
Validation loss = 0.00958340149372816
Validation loss = 0.009413238614797592
Validation loss = 0.01765693537890911
Validation loss = 0.009474239312112331
Validation loss = 0.013034321367740631
Validation loss = 0.00945738423615694
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008960873819887638
Validation loss = 0.008841750212013721
Validation loss = 0.009347316808998585
Validation loss = 0.009179603308439255
Validation loss = 0.010858729481697083
Validation loss = 0.01000383123755455
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015667036175727844
Validation loss = 0.01147370133548975
Validation loss = 0.010048090480268002
Validation loss = 0.009909321554005146
Validation loss = 0.010388658381998539
Validation loss = 0.009562712162733078
Validation loss = 0.009658746421337128
Validation loss = 0.009476684965193272
Validation loss = 0.011075158603489399
Validation loss = 0.009309764951467514
Validation loss = 0.010013899765908718
Validation loss = 0.014254624955356121
Validation loss = 0.00998793076723814
Validation loss = 0.010084208101034164
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009364839643239975
Validation loss = 0.00888131558895111
Validation loss = 0.009281915612518787
Validation loss = 0.010072262026369572
Validation loss = 0.009518909268081188
Validation loss = 0.009125438518822193
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010240929201245308
Validation loss = 0.010168113745748997
Validation loss = 0.009406785480678082
Validation loss = 0.009873609058558941
Validation loss = 0.010818589478731155
Validation loss = 0.01153686922043562
Validation loss = 0.009977507404983044
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00106  |
| Iteration     | 57        |
| MaximumReturn | -0.000643 |
| MinimumReturn | -0.00192  |
| TotalSamples  | 98294     |
-----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010404979810118675
Validation loss = 0.009280674159526825
Validation loss = 0.011839576996862888
Validation loss = 0.011482378467917442
Validation loss = 0.011234288103878498
Validation loss = 0.010472612455487251
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009488058276474476
Validation loss = 0.009916743263602257
Validation loss = 0.010992436669766903
Validation loss = 0.010329666547477245
Validation loss = 0.00946050975471735
Validation loss = 0.009204816073179245
Validation loss = 0.009465132839977741
Validation loss = 0.01098752673715353
Validation loss = 0.009810809046030045
Validation loss = 0.009048791602253914
Validation loss = 0.009341258555650711
Validation loss = 0.009430504404008389
Validation loss = 0.00958772748708725
Validation loss = 0.00959765911102295
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010288779623806477
Validation loss = 0.010044261813163757
Validation loss = 0.01078889425843954
Validation loss = 0.010221474803984165
Validation loss = 0.009798524901270866
Validation loss = 0.010112952440977097
Validation loss = 0.01064257137477398
Validation loss = 0.00985228456556797
Validation loss = 0.009627084247767925
Validation loss = 0.010320687666535378
Validation loss = 0.009563065133988857
Validation loss = 0.012640224769711494
Validation loss = 0.010472134687006474
Validation loss = 0.009510889649391174
Validation loss = 0.009833146817982197
Validation loss = 0.009801911190152168
Validation loss = 0.00958362128585577
Validation loss = 0.010220738127827644
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009567545726895332
Validation loss = 0.009100131690502167
Validation loss = 0.009649287909269333
Validation loss = 0.009804794564843178
Validation loss = 0.008899673819541931
Validation loss = 0.009636916220188141
Validation loss = 0.009320591576397419
Validation loss = 0.010528591461479664
Validation loss = 0.009346186183393002
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009761913679540157
Validation loss = 0.009790608659386635
Validation loss = 0.010120396502315998
Validation loss = 0.010412461124360561
Validation loss = 0.010733098722994328
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00113  |
| Iteration     | 58        |
| MaximumReturn | -0.000611 |
| MinimumReturn | -0.00285  |
| TotalSamples  | 99960     |
-----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01031256839632988
Validation loss = 0.010806713253259659
Validation loss = 0.011876178905367851
Validation loss = 0.010777056217193604
Validation loss = 0.009605840779840946
Validation loss = 0.009877339005470276
Validation loss = 0.009939468465745449
Validation loss = 0.009674180299043655
Validation loss = 0.01293526217341423
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008730944246053696
Validation loss = 0.011731812730431557
Validation loss = 0.009629962034523487
Validation loss = 0.009607609361410141
Validation loss = 0.00950688123703003
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01002806518226862
Validation loss = 0.017419181764125824
Validation loss = 0.010190257802605629
Validation loss = 0.009830713272094727
Validation loss = 0.010262562893331051
Validation loss = 0.00962219201028347
Validation loss = 0.010168330743908882
Validation loss = 0.0099873598664999
Validation loss = 0.009610517881810665
Validation loss = 0.009565761312842369
Validation loss = 0.00998864509165287
Validation loss = 0.010567980818450451
Validation loss = 0.010268095880746841
Validation loss = 0.010870962403714657
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010143667459487915
Validation loss = 0.01077717449516058
Validation loss = 0.00938753504306078
Validation loss = 0.00915924645960331
Validation loss = 0.009183667600154877
Validation loss = 0.008878510445356369
Validation loss = 0.008875180035829544
Validation loss = 0.009681999683380127
Validation loss = 0.009209906682372093
Validation loss = 0.009355205111205578
Validation loss = 0.00991725642234087
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010399553924798965
Validation loss = 0.009485316462814808
Validation loss = 0.010182440280914307
Validation loss = 0.009868665598332882
Validation loss = 0.010969383642077446
Validation loss = 0.01119590550661087
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000985 |
| Iteration     | 59        |
| MaximumReturn | -0.000703 |
| MinimumReturn | -0.00149  |
| TotalSamples  | 101626    |
-----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011908059008419514
Validation loss = 0.009768359363079071
Validation loss = 0.010150625370442867
Validation loss = 0.010526414960622787
Validation loss = 0.009378017857670784
Validation loss = 0.013156063854694366
Validation loss = 0.012169561348855495
Validation loss = 0.010535616427659988
Validation loss = 0.010634249076247215
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009742227382957935
Validation loss = 0.009394757449626923
Validation loss = 0.01123169157654047
Validation loss = 0.01078125275671482
Validation loss = 0.009672478772699833
Validation loss = 0.009849500842392445
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010418258607387543
Validation loss = 0.010851693339645863
Validation loss = 0.010969110764563084
Validation loss = 0.010525734163820744
Validation loss = 0.01121076475828886
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00946835521608591
Validation loss = 0.009437388740479946
Validation loss = 0.009470387361943722
Validation loss = 0.00925257708877325
Validation loss = 0.009837529622018337
Validation loss = 0.009681417606770992
Validation loss = 0.010088491253554821
Validation loss = 0.00945738609880209
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00964578427374363
Validation loss = 0.009897313080728054
Validation loss = 0.010909914039075375
Validation loss = 0.010596386156976223
Validation loss = 0.011030608788132668
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00102  |
| Iteration     | 60        |
| MaximumReturn | -0.000652 |
| MinimumReturn | -0.00216  |
| TotalSamples  | 103292    |
-----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010303054004907608
Validation loss = 0.010424908250570297
Validation loss = 0.010872570797801018
Validation loss = 0.00956276897341013
Validation loss = 0.011234746314585209
Validation loss = 0.009768839925527573
Validation loss = 0.009952216409146786
Validation loss = 0.010104102082550526
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0098115811124444
Validation loss = 0.010717061348259449
Validation loss = 0.009731908328831196
Validation loss = 0.010097933933138847
Validation loss = 0.010619455948472023
Validation loss = 0.009951340034604073
Validation loss = 0.009623081423342228
Validation loss = 0.010621340945363045
Validation loss = 0.009060332551598549
Validation loss = 0.009189625270664692
Validation loss = 0.009758644737303257
Validation loss = 0.009036615490913391
Validation loss = 0.009123050607740879
Validation loss = 0.009305180050432682
Validation loss = 0.013817918486893177
Validation loss = 0.010216943919658661
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009792016819119453
Validation loss = 0.010573436506092548
Validation loss = 0.009960537776350975
Validation loss = 0.010162971913814545
Validation loss = 0.009949509054422379
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010222080163657665
Validation loss = 0.009634862653911114
Validation loss = 0.00928532425314188
Validation loss = 0.00907912291586399
Validation loss = 0.009328288026154041
Validation loss = 0.008922003209590912
Validation loss = 0.008994359523057938
Validation loss = 0.010333693586289883
Validation loss = 0.009769911877810955
Validation loss = 0.009030159562826157
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012275809422135353
Validation loss = 0.010392474010586739
Validation loss = 0.010410896502435207
Validation loss = 0.010930404998362064
Validation loss = 0.013122545555233955
Validation loss = 0.010571036487817764
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00105  |
| Iteration     | 61        |
| MaximumReturn | -0.000637 |
| MinimumReturn | -0.00161  |
| TotalSamples  | 104958    |
-----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010779757983982563
Validation loss = 0.00961904413998127
Validation loss = 0.010105888359248638
Validation loss = 0.011211633682250977
Validation loss = 0.01033345889300108
Validation loss = 0.01000971719622612
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00988893210887909
Validation loss = 0.009591152891516685
Validation loss = 0.010798709467053413
Validation loss = 0.009893185459077358
Validation loss = 0.009843626990914345
Validation loss = 0.009636858478188515
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012624649330973625
Validation loss = 0.010853935033082962
Validation loss = 0.010055397637188435
Validation loss = 0.010659008286893368
Validation loss = 0.010511593893170357
Validation loss = 0.010143167339265347
Validation loss = 0.011243571527302265
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009424923919141293
Validation loss = 0.010110212489962578
Validation loss = 0.00987909734249115
Validation loss = 0.009907877072691917
Validation loss = 0.009264686144888401
Validation loss = 0.009632519446313381
Validation loss = 0.008863847702741623
Validation loss = 0.009145484305918217
Validation loss = 0.009389796294271946
Validation loss = 0.009122504852712154
Validation loss = 0.00981630478054285
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010060548782348633
Validation loss = 0.011130760423839092
Validation loss = 0.010822783224284649
Validation loss = 0.009850536473095417
Validation loss = 0.01049102284014225
Validation loss = 0.010175708681344986
Validation loss = 0.01021619513630867
Validation loss = 0.009805574081838131
Validation loss = 0.010302184149622917
Validation loss = 0.010139421559870243
Validation loss = 0.010362500324845314
Validation loss = 0.010187183506786823
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00106  |
| Iteration     | 62        |
| MaximumReturn | -0.000733 |
| MinimumReturn | -0.00215  |
| TotalSamples  | 106624    |
-----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01025772001594305
Validation loss = 0.012288673780858517
Validation loss = 0.010490037500858307
Validation loss = 0.010184706188738346
Validation loss = 0.010229391045868397
Validation loss = 0.010370482690632343
Validation loss = 0.010218141600489616
Validation loss = 0.010103859938681126
Validation loss = 0.010631284676492214
Validation loss = 0.009524728171527386
Validation loss = 0.010974906384944916
Validation loss = 0.009269043803215027
Validation loss = 0.009767717681825161
Validation loss = 0.010342620313167572
Validation loss = 0.010948793031275272
Validation loss = 0.009543637745082378
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009728891775012016
Validation loss = 0.010759520344436169
Validation loss = 0.00937294028699398
Validation loss = 0.013266879133880138
Validation loss = 0.009507198818027973
Validation loss = 0.008886990137398243
Validation loss = 0.011514377780258656
Validation loss = 0.009139960631728172
Validation loss = 0.00868260208517313
Validation loss = 0.008960125036537647
Validation loss = 0.009143231436610222
Validation loss = 0.00875755213201046
Validation loss = 0.008930386044085026
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01042809896171093
Validation loss = 0.010465357452630997
Validation loss = 0.010305006988346577
Validation loss = 0.012703071348369122
Validation loss = 0.009953377768397331
Validation loss = 0.010468208231031895
Validation loss = 0.013608123175799847
Validation loss = 0.010830861516296864
Validation loss = 0.009929082356393337
Validation loss = 0.01004927046597004
Validation loss = 0.009865007363259792
Validation loss = 0.010856429114937782
Validation loss = 0.010445981286466122
Validation loss = 0.010270535945892334
Validation loss = 0.01104607991874218
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010385234840214252
Validation loss = 0.009664858691394329
Validation loss = 0.016230791807174683
Validation loss = 0.009699061512947083
Validation loss = 0.00941451732069254
Validation loss = 0.009883522987365723
Validation loss = 0.010202097706496716
Validation loss = 0.010633212514221668
Validation loss = 0.00941742118448019
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010453473776578903
Validation loss = 0.010299704037606716
Validation loss = 0.009902767837047577
Validation loss = 0.010280122049152851
Validation loss = 0.010165355168282986
Validation loss = 0.011477590538561344
Validation loss = 0.010473314672708511
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000987 |
| Iteration     | 63        |
| MaximumReturn | -0.000616 |
| MinimumReturn | -0.00136  |
| TotalSamples  | 108290    |
-----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010108299553394318
Validation loss = 0.010693304240703583
Validation loss = 0.010422294028103352
Validation loss = 0.010333508253097534
Validation loss = 0.01012422889471054
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008799891918897629
Validation loss = 0.010245566256344318
Validation loss = 0.009426121599972248
Validation loss = 0.009554428979754448
Validation loss = 0.00955161266028881
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01099464762955904
Validation loss = 0.01093613263219595
Validation loss = 0.009983335621654987
Validation loss = 0.010312754660844803
Validation loss = 0.012048978358507156
Validation loss = 0.010990899056196213
Validation loss = 0.010309696197509766
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009452788159251213
Validation loss = 0.012985605746507645
Validation loss = 0.011669902130961418
Validation loss = 0.00984531082212925
Validation loss = 0.010636313818395138
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011374603025615215
Validation loss = 0.009997643530368805
Validation loss = 0.010236154310405254
Validation loss = 0.01025601476430893
Validation loss = 0.009938187897205353
Validation loss = 0.00963594764471054
Validation loss = 0.01152475643903017
Validation loss = 0.00958569161593914
Validation loss = 0.009855103679001331
Validation loss = 0.00969954114407301
Validation loss = 0.009879572317004204
Validation loss = 0.011505677364766598
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00106 |
| Iteration     | 64       |
| MaximumReturn | -0.00073 |
| MinimumReturn | -0.00292 |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009726500138640404
Validation loss = 0.011072053574025631
Validation loss = 0.010291996411979198
Validation loss = 0.010765478946268559
Validation loss = 0.011205715127289295
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009838382713496685
Validation loss = 0.010205501690506935
Validation loss = 0.009394031018018723
Validation loss = 0.00970036443322897
Validation loss = 0.009958445094525814
Validation loss = 0.010409670881927013
Validation loss = 0.009390363469719887
Validation loss = 0.009442059323191643
Validation loss = 0.010662351734936237
Validation loss = 0.010000014677643776
Validation loss = 0.009430870413780212
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01047466415911913
Validation loss = 0.010168816894292831
Validation loss = 0.010019036941230297
Validation loss = 0.010106976144015789
Validation loss = 0.010730346664786339
Validation loss = 0.01132017932832241
Validation loss = 0.010174952447414398
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01201976090669632
Validation loss = 0.009579476900398731
Validation loss = 0.009243136271834373
Validation loss = 0.01164672989398241
Validation loss = 0.011515396647155285
Validation loss = 0.00907287746667862
Validation loss = 0.008978434838354588
Validation loss = 0.011898930184543133
Validation loss = 0.009895723313093185
Validation loss = 0.009440964087843895
Validation loss = 0.009842398576438427
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011831910349428654
Validation loss = 0.011040654964745045
Validation loss = 0.010740607045590878
Validation loss = 0.009690753184258938
Validation loss = 0.009654788300395012
Validation loss = 0.01022421382367611
Validation loss = 0.010910576209425926
Validation loss = 0.011548303067684174
Validation loss = 0.010777611285448074
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00106  |
| Iteration     | 65        |
| MaximumReturn | -0.000635 |
| MinimumReturn | -0.00217  |
| TotalSamples  | 111622    |
-----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01096673309803009
Validation loss = 0.010241694748401642
Validation loss = 0.011206397786736488
Validation loss = 0.010102867148816586
Validation loss = 0.010002713650465012
Validation loss = 0.010293909348547459
Validation loss = 0.009900443255901337
Validation loss = 0.013046500273048878
Validation loss = 0.009871641173958778
Validation loss = 0.010413753800094128
Validation loss = 0.01198512502014637
Validation loss = 0.009974808432161808
Validation loss = 0.010771994479000568
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010266898199915886
Validation loss = 0.010963636450469494
Validation loss = 0.00933973491191864
Validation loss = 0.009164033457636833
Validation loss = 0.009734473191201687
Validation loss = 0.013325381092727184
Validation loss = 0.01032134611159563
Validation loss = 0.009600893594324589
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010786378756165504
Validation loss = 0.010451311245560646
Validation loss = 0.010190775617957115
Validation loss = 0.010453795082867146
Validation loss = 0.010718333534896374
Validation loss = 0.010626905597746372
Validation loss = 0.01145557127892971
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0096504557877779
Validation loss = 0.00977626908570528
Validation loss = 0.009745578281581402
Validation loss = 0.010004408657550812
Validation loss = 0.009906431660056114
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011741267517209053
Validation loss = 0.010694671422243118
Validation loss = 0.01056515984237194
Validation loss = 0.010336107574403286
Validation loss = 0.010543573647737503
Validation loss = 0.010413586162030697
Validation loss = 0.011106948368251324
Validation loss = 0.010051572695374489
Validation loss = 0.010402433574199677
Validation loss = 0.009942034259438515
Validation loss = 0.010969406925141811
Validation loss = 0.009979904629290104
Validation loss = 0.01123744249343872
Validation loss = 0.009987766854465008
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00116  |
| Iteration     | 66        |
| MaximumReturn | -0.000844 |
| MinimumReturn | -0.00179  |
| TotalSamples  | 113288    |
-----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011271046474575996
Validation loss = 0.010335060767829418
Validation loss = 0.01007892470806837
Validation loss = 0.012594178318977356
Validation loss = 0.010155798867344856
Validation loss = 0.010074843652546406
Validation loss = 0.010080011561512947
Validation loss = 0.0102304732427001
Validation loss = 0.010232960805296898
Validation loss = 0.011304489336907864
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009481419809162617
Validation loss = 0.009536090306937695
Validation loss = 0.00964448694139719
Validation loss = 0.009382317773997784
Validation loss = 0.010815588757395744
Validation loss = 0.008769319392740726
Validation loss = 0.009369063191115856
Validation loss = 0.00906766764819622
Validation loss = 0.009650038555264473
Validation loss = 0.009989107958972454
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013722949661314487
Validation loss = 0.013070298358798027
Validation loss = 0.01100204698741436
Validation loss = 0.010167022235691547
Validation loss = 0.011054499074816704
Validation loss = 0.009914522059261799
Validation loss = 0.010519585572183132
Validation loss = 0.01078814547508955
Validation loss = 0.010491104796528816
Validation loss = 0.010071797296404839
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009701983071863651
Validation loss = 0.009954571723937988
Validation loss = 0.011326620355248451
Validation loss = 0.01466196309775114
Validation loss = 0.00941513292491436
Validation loss = 0.009372672066092491
Validation loss = 0.009559853933751583
Validation loss = 0.009422327391803265
Validation loss = 0.010007308796048164
Validation loss = 0.010258163325488567
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010283063165843487
Validation loss = 0.010676831938326359
Validation loss = 0.01050508115440607
Validation loss = 0.010843378491699696
Validation loss = 0.009955466724932194
Validation loss = 0.010494709014892578
Validation loss = 0.010758691467344761
Validation loss = 0.012242830358445644
Validation loss = 0.01117621548473835
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00115  |
| Iteration     | 67        |
| MaximumReturn | -0.000591 |
| MinimumReturn | -0.00233  |
| TotalSamples  | 114954    |
-----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010218174196779728
Validation loss = 0.011155498214066029
Validation loss = 0.009833703748881817
Validation loss = 0.010426643304526806
Validation loss = 0.010147351771593094
Validation loss = 0.010119897313416004
Validation loss = 0.00970521755516529
Validation loss = 0.009975200518965721
Validation loss = 0.009668434038758278
Validation loss = 0.01005795318633318
Validation loss = 0.009740105830132961
Validation loss = 0.010020229034125805
Validation loss = 0.010040468536317348
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010073109529912472
Validation loss = 0.009752199985086918
Validation loss = 0.009574338793754578
Validation loss = 0.009682245552539825
Validation loss = 0.009592326357960701
Validation loss = 0.010389872826635838
Validation loss = 0.009292343631386757
Validation loss = 0.010064534842967987
Validation loss = 0.009103572927415371
Validation loss = 0.012331997975707054
Validation loss = 0.010627157986164093
Validation loss = 0.010022765956819057
Validation loss = 0.010695709846913815
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010064644739031792
Validation loss = 0.010237286798655987
Validation loss = 0.012437515892088413
Validation loss = 0.010699672624468803
Validation loss = 0.010012221522629261
Validation loss = 0.010068479925394058
Validation loss = 0.009962747804820538
Validation loss = 0.011245685629546642
Validation loss = 0.010893801227211952
Validation loss = 0.010042645037174225
Validation loss = 0.010699506849050522
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011049765162169933
Validation loss = 0.009945299476385117
Validation loss = 0.00965585745871067
Validation loss = 0.009573031216859818
Validation loss = 0.009563158266246319
Validation loss = 0.010671520605683327
Validation loss = 0.010105090215802193
Validation loss = 0.00996820256114006
Validation loss = 0.010337915271520615
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011378400027751923
Validation loss = 0.011161848902702332
Validation loss = 0.011150255799293518
Validation loss = 0.013382920995354652
Validation loss = 0.011578099802136421
Validation loss = 0.011514666490256786
Validation loss = 0.011167673394083977
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00095  |
| Iteration     | 68        |
| MaximumReturn | -0.000672 |
| MinimumReturn | -0.0012   |
| TotalSamples  | 116620    |
-----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009789576753973961
Validation loss = 0.0103308679535985
Validation loss = 0.010580052621662617
Validation loss = 0.01005928311496973
Validation loss = 0.010219093412160873
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009701834991574287
Validation loss = 0.011285074055194855
Validation loss = 0.009438220411539078
Validation loss = 0.009333212859928608
Validation loss = 0.009498821571469307
Validation loss = 0.010364364832639694
Validation loss = 0.010168700478971004
Validation loss = 0.009390161372721195
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01066537480801344
Validation loss = 0.012255631387233734
Validation loss = 0.01033942960202694
Validation loss = 0.009761807508766651
Validation loss = 0.01049042772501707
Validation loss = 0.010983136482536793
Validation loss = 0.00984299834817648
Validation loss = 0.010063624009490013
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009864718653261662
Validation loss = 0.009752152487635612
Validation loss = 0.010038061067461967
Validation loss = 0.009842727333307266
Validation loss = 0.009932569228112698
Validation loss = 0.009626735001802444
Validation loss = 0.009272142313420773
Validation loss = 0.009633132256567478
Validation loss = 0.009485323913395405
Validation loss = 0.010781965218484402
Validation loss = 0.009701368398964405
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011994649656116962
Validation loss = 0.010758096352219582
Validation loss = 0.012314261868596077
Validation loss = 0.0110786659643054
Validation loss = 0.010786413215100765
Validation loss = 0.01229217741638422
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00102  |
| Iteration     | 69        |
| MaximumReturn | -0.000765 |
| MinimumReturn | -0.0015   |
| TotalSamples  | 118286    |
-----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01085380557924509
Validation loss = 0.011652805842459202
Validation loss = 0.009928068146109581
Validation loss = 0.010242882184684277
Validation loss = 0.010005198419094086
Validation loss = 0.010317646898329258
Validation loss = 0.010631522163748741
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009995689615607262
Validation loss = 0.009404998272657394
Validation loss = 0.010420848615467548
Validation loss = 0.009151725098490715
Validation loss = 0.012654687277972698
Validation loss = 0.009482483379542828
Validation loss = 0.00962175615131855
Validation loss = 0.010676538571715355
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011738015338778496
Validation loss = 0.010755213908851147
Validation loss = 0.010430623777210712
Validation loss = 0.010058537125587463
Validation loss = 0.010107899084687233
Validation loss = 0.009693922474980354
Validation loss = 0.01010865904390812
Validation loss = 0.011217787861824036
Validation loss = 0.01073739305138588
Validation loss = 0.010451901704072952
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011236595921218395
Validation loss = 0.010445084422826767
Validation loss = 0.009579699486494064
Validation loss = 0.010309718549251556
Validation loss = 0.010117663070559502
Validation loss = 0.009762105531990528
Validation loss = 0.009540132246911526
Validation loss = 0.009609066881239414
Validation loss = 0.00952395424246788
Validation loss = 0.010447193868458271
Validation loss = 0.013090014457702637
Validation loss = 0.009065795689821243
Validation loss = 0.009943339973688126
Validation loss = 0.00954118650406599
Validation loss = 0.00945885106921196
Validation loss = 0.011445707641541958
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013272002339363098
Validation loss = 0.01060351263731718
Validation loss = 0.010835347697138786
Validation loss = 0.010882541537284851
Validation loss = 0.010451565496623516
Validation loss = 0.01117505319416523
Validation loss = 0.010326679795980453
Validation loss = 0.01017787680029869
Validation loss = 0.010526370257139206
Validation loss = 0.010595808736979961
Validation loss = 0.011074144393205643
Validation loss = 0.010357036255300045
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00102  |
| Iteration     | 70        |
| MaximumReturn | -0.000627 |
| MinimumReturn | -0.00176  |
| TotalSamples  | 119952    |
-----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011486965231597424
Validation loss = 0.010463256388902664
Validation loss = 0.01089547760784626
Validation loss = 0.010259914211928844
Validation loss = 0.010274941101670265
Validation loss = 0.010371745564043522
Validation loss = 0.009872407652437687
Validation loss = 0.010390926152467728
Validation loss = 0.010176566429436207
Validation loss = 0.011495236307382584
Validation loss = 0.009988699108362198
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010956582613289356
Validation loss = 0.010011903010308743
Validation loss = 0.009566709399223328
Validation loss = 0.009198997169733047
Validation loss = 0.009627568535506725
Validation loss = 0.010598733089864254
Validation loss = 0.009397834539413452
Validation loss = 0.009965155273675919
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010008266195654869
Validation loss = 0.010399376042187214
Validation loss = 0.01048070378601551
Validation loss = 0.010153609327971935
Validation loss = 0.009899715892970562
Validation loss = 0.009832511655986309
Validation loss = 0.010697360150516033
Validation loss = 0.010317789390683174
Validation loss = 0.010683777742087841
Validation loss = 0.010285037569701672
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010082579217851162
Validation loss = 0.009638671763241291
Validation loss = 0.010275988839566708
Validation loss = 0.010046947747468948
Validation loss = 0.010152214206755161
Validation loss = 0.009682257659733295
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010428551584482193
Validation loss = 0.01088362280279398
Validation loss = 0.010636144317686558
Validation loss = 0.010035206563770771
Validation loss = 0.012146908789873123
Validation loss = 0.009971754625439644
Validation loss = 0.011031554080545902
Validation loss = 0.012320911511778831
Validation loss = 0.010406888090074062
Validation loss = 0.010015049949288368
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000933 |
| Iteration     | 71        |
| MaximumReturn | -0.000698 |
| MinimumReturn | -0.00135  |
| TotalSamples  | 121618    |
-----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010292191058397293
Validation loss = 0.010272365063428879
Validation loss = 0.010457873344421387
Validation loss = 0.010886535979807377
Validation loss = 0.010521596297621727
Validation loss = 0.010406557470560074
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00939061027020216
Validation loss = 0.010068690404295921
Validation loss = 0.009773826226592064
Validation loss = 0.009937245398759842
Validation loss = 0.010841447860002518
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009517211467027664
Validation loss = 0.010167399421334267
Validation loss = 0.009688789024949074
Validation loss = 0.009898809716105461
Validation loss = 0.010111230425536633
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010244806297123432
Validation loss = 0.009534498676657677
Validation loss = 0.009750650264322758
Validation loss = 0.01157665066421032
Validation loss = 0.01141034159809351
Validation loss = 0.01048758253455162
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010753189213573933
Validation loss = 0.0115949222818017
Validation loss = 0.010051548480987549
Validation loss = 0.009939875453710556
Validation loss = 0.011003437452018261
Validation loss = 0.010463913902640343
Validation loss = 0.010507268831133842
Validation loss = 0.010639801621437073
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0022  |
| Iteration     | 72       |
| MaximumReturn | -0.00167 |
| MinimumReturn | -0.0032  |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010280760005116463
Validation loss = 0.00981562677770853
Validation loss = 0.009615462273359299
Validation loss = 0.011726204305887222
Validation loss = 0.010253111831843853
Validation loss = 0.009719050489366055
Validation loss = 0.010874387808144093
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009940402582287788
Validation loss = 0.00921204499900341
Validation loss = 0.010627355426549911
Validation loss = 0.009472029283642769
Validation loss = 0.009340262971818447
Validation loss = 0.010500782169401646
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010372347198426723
Validation loss = 0.009811865165829659
Validation loss = 0.010763179510831833
Validation loss = 0.009842896834015846
Validation loss = 0.011578023433685303
Validation loss = 0.010612757876515388
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011619641445577145
Validation loss = 0.010736909694969654
Validation loss = 0.01018537674099207
Validation loss = 0.009700930677354336
Validation loss = 0.00996365025639534
Validation loss = 0.010151460766792297
Validation loss = 0.010274374857544899
Validation loss = 0.010283895768225193
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010397731326520443
Validation loss = 0.010706886649131775
Validation loss = 0.010662419721484184
Validation loss = 0.01143391989171505
Validation loss = 0.01064036600291729
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00112 |
| Iteration     | 73       |
| MaximumReturn | -0.00072 |
| MinimumReturn | -0.00181 |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01088032964617014
Validation loss = 0.010971909388899803
Validation loss = 0.01191350445151329
Validation loss = 0.011866110377013683
Validation loss = 0.010090762749314308
Validation loss = 0.010941463522613049
Validation loss = 0.009288419969379902
Validation loss = 0.010385928675532341
Validation loss = 0.00999899860471487
Validation loss = 0.009989955462515354
Validation loss = 0.009968897327780724
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010058562271296978
Validation loss = 0.010218123905360699
Validation loss = 0.010852429084479809
Validation loss = 0.010009510442614555
Validation loss = 0.009574814699590206
Validation loss = 0.009898717515170574
Validation loss = 0.011436330154538155
Validation loss = 0.009168446063995361
Validation loss = 0.009036839939653873
Validation loss = 0.010594876483082771
Validation loss = 0.009310259483754635
Validation loss = 0.009784073568880558
Validation loss = 0.010540383867919445
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01048099622130394
Validation loss = 0.011874707415699959
Validation loss = 0.009940149262547493
Validation loss = 0.010876997373998165
Validation loss = 0.010138950310647488
Validation loss = 0.010466749779880047
Validation loss = 0.010365568101406097
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010433955118060112
Validation loss = 0.009829455986618996
Validation loss = 0.014089769683778286
Validation loss = 0.009609755128622055
Validation loss = 0.009756197221577168
Validation loss = 0.009627639316022396
Validation loss = 0.009707577526569366
Validation loss = 0.009529588744044304
Validation loss = 0.010170943103730679
Validation loss = 0.009918020106852055
Validation loss = 0.011689074337482452
Validation loss = 0.010181711986660957
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011430359445512295
Validation loss = 0.010381276719272137
Validation loss = 0.010309312492609024
Validation loss = 0.01064294297248125
Validation loss = 0.015595466829836369
Validation loss = 0.011268953792750835
Validation loss = 0.011280633509159088
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00115  |
| Iteration     | 74        |
| MaximumReturn | -0.000706 |
| MinimumReturn | -0.00181  |
| TotalSamples  | 126616    |
-----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009964341297745705
Validation loss = 0.010255591943860054
Validation loss = 0.010210542008280754
Validation loss = 0.010829878970980644
Validation loss = 0.01167239248752594
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009308668784797192
Validation loss = 0.009462513029575348
Validation loss = 0.009778043255209923
Validation loss = 0.0112087931483984
Validation loss = 0.009368964470922947
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010044996626675129
Validation loss = 0.010581118986010551
Validation loss = 0.010796616785228252
Validation loss = 0.010052433237433434
Validation loss = 0.0100238136947155
Validation loss = 0.010130845010280609
Validation loss = 0.01142340712249279
Validation loss = 0.010792335495352745
Validation loss = 0.010184061713516712
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009801788255572319
Validation loss = 0.009641644544899464
Validation loss = 0.010061103850603104
Validation loss = 0.009648621082305908
Validation loss = 0.010235209949314594
Validation loss = 0.010061953216791153
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012872083112597466
Validation loss = 0.01288133580237627
Validation loss = 0.010674509219825268
Validation loss = 0.01070247683674097
Validation loss = 0.012175656855106354
Validation loss = 0.010198579169809818
Validation loss = 0.013264928944408894
Validation loss = 0.0127655528485775
Validation loss = 0.010435914620757103
Validation loss = 0.010584796778857708
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00107  |
| Iteration     | 75        |
| MaximumReturn | -0.000763 |
| MinimumReturn | -0.00168  |
| TotalSamples  | 128282    |
-----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009860465303063393
Validation loss = 0.010347402654588223
Validation loss = 0.012449913658201694
Validation loss = 0.011404367163777351
Validation loss = 0.01003924198448658
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009442521259188652
Validation loss = 0.010222548618912697
Validation loss = 0.010149477049708366
Validation loss = 0.009074552915990353
Validation loss = 0.009486510418355465
Validation loss = 0.009640835225582123
Validation loss = 0.00970475934445858
Validation loss = 0.01116953045129776
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010434933006763458
Validation loss = 0.010144165717065334
Validation loss = 0.010409208945930004
Validation loss = 0.010394337587058544
Validation loss = 0.010024415329098701
Validation loss = 0.010541603900492191
Validation loss = 0.01059648022055626
Validation loss = 0.009939304552972317
Validation loss = 0.010063815861940384
Validation loss = 0.010228855535387993
Validation loss = 0.010233189910650253
Validation loss = 0.010238256305456161
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009883740916848183
Validation loss = 0.009766495786607265
Validation loss = 0.01011741068214178
Validation loss = 0.010155094787478447
Validation loss = 0.010353228077292442
Validation loss = 0.0099038016051054
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009607761166989803
Validation loss = 0.010226495563983917
Validation loss = 0.010200928896665573
Validation loss = 0.015363196842372417
Validation loss = 0.011372421868145466
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00114  |
| Iteration     | 76        |
| MaximumReturn | -0.000829 |
| MinimumReturn | -0.0018   |
| TotalSamples  | 129948    |
-----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009919200092554092
Validation loss = 0.010982112027704716
Validation loss = 0.012189052999019623
Validation loss = 0.009804515168070793
Validation loss = 0.011319328099489212
Validation loss = 0.010431413538753986
Validation loss = 0.010036990977823734
Validation loss = 0.018762381747364998
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009723290801048279
Validation loss = 0.009449644014239311
Validation loss = 0.016872666776180267
Validation loss = 0.009715981781482697
Validation loss = 0.009449990466237068
Validation loss = 0.010561255738139153
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010082939639687538
Validation loss = 0.009920324198901653
Validation loss = 0.010241379030048847
Validation loss = 0.010924625210464
Validation loss = 0.010830632410943508
Validation loss = 0.010566109791398048
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010033778846263885
Validation loss = 0.0099591463804245
Validation loss = 0.009828638285398483
Validation loss = 0.00985591672360897
Validation loss = 0.010390075854957104
Validation loss = 0.010022401809692383
Validation loss = 0.010168079286813736
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011123919859528542
Validation loss = 0.011410344392061234
Validation loss = 0.010847003199160099
Validation loss = 0.011961178854107857
Validation loss = 0.01305184606462717
Validation loss = 0.010678228922188282
Validation loss = 0.011655546724796295
Validation loss = 0.010255666449666023
Validation loss = 0.010668872855603695
Validation loss = 0.010456669144332409
Validation loss = 0.009797186590731144
Validation loss = 0.011009195819497108
Validation loss = 0.011108089238405228
Validation loss = 0.010077839717268944
Validation loss = 0.010820180177688599
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00133  |
| Iteration     | 77        |
| MaximumReturn | -0.000888 |
| MinimumReturn | -0.0022   |
| TotalSamples  | 131614    |
-----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01074405200779438
Validation loss = 0.01020786538720131
Validation loss = 0.0101298363879323
Validation loss = 0.009925113059580326
Validation loss = 0.010754559189081192
Validation loss = 0.010276407934725285
Validation loss = 0.01062476821243763
Validation loss = 0.010093876160681248
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010704675689339638
Validation loss = 0.009579531848430634
Validation loss = 0.00969095341861248
Validation loss = 0.00965186022222042
Validation loss = 0.009654685854911804
Validation loss = 0.009897916577756405
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010316349565982819
Validation loss = 0.01206751074641943
Validation loss = 0.010291000828146935
Validation loss = 0.010863948613405228
Validation loss = 0.010031212121248245
Validation loss = 0.010217899456620216
Validation loss = 0.014340942725539207
Validation loss = 0.010594917461276054
Validation loss = 0.010597633197903633
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01074381172657013
Validation loss = 0.01166319940239191
Validation loss = 0.012924278154969215
Validation loss = 0.010087224654853344
Validation loss = 0.009787502698600292
Validation loss = 0.01007269136607647
Validation loss = 0.00997841265052557
Validation loss = 0.010672042146325111
Validation loss = 0.011102195829153061
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010676244273781776
Validation loss = 0.010876188986003399
Validation loss = 0.020103750750422478
Validation loss = 0.010693123564124107
Validation loss = 0.009594789706170559
Validation loss = 0.010701498948037624
Validation loss = 0.009997059591114521
Validation loss = 0.01137461792677641
Validation loss = 0.009900477714836597
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00106  |
| Iteration     | 78        |
| MaximumReturn | -0.000571 |
| MinimumReturn | -0.00159  |
| TotalSamples  | 133280    |
-----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010619904845952988
Validation loss = 0.010166004300117493
Validation loss = 0.009785115718841553
Validation loss = 0.009667460806667805
Validation loss = 0.0095224529504776
Validation loss = 0.009790364652872086
Validation loss = 0.010213309898972511
Validation loss = 0.010010848753154278
Validation loss = 0.010213463567197323
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010834741406142712
Validation loss = 0.009577130898833275
Validation loss = 0.010115040466189384
Validation loss = 0.00938392337411642
Validation loss = 0.009146139025688171
Validation loss = 0.010602287016808987
Validation loss = 0.010293468832969666
Validation loss = 0.009993682615458965
Validation loss = 0.009471225552260876
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010472733527421951
Validation loss = 0.010836025699973106
Validation loss = 0.01043190248310566
Validation loss = 0.01017947867512703
Validation loss = 0.010369151830673218
Validation loss = 0.010361013002693653
Validation loss = 0.010493387468159199
Validation loss = 0.01154472678899765
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010894996114075184
Validation loss = 0.010403946042060852
Validation loss = 0.010949318297207355
Validation loss = 0.010947877541184425
Validation loss = 0.01127123273909092
Validation loss = 0.010706856846809387
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010672228410840034
Validation loss = 0.01160452701151371
Validation loss = 0.010462621226906776
Validation loss = 0.010940100997686386
Validation loss = 0.010325661860406399
Validation loss = 0.011152633465826511
Validation loss = 0.010217334143817425
Validation loss = 0.010028481483459473
Validation loss = 0.009957395493984222
Validation loss = 0.010393195785582066
Validation loss = 0.010046317242085934
Validation loss = 0.010315902531147003
Validation loss = 0.010391028597950935
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00107  |
| Iteration     | 79        |
| MaximumReturn | -0.000832 |
| MinimumReturn | -0.00161  |
| TotalSamples  | 134946    |
-----------------------------
