Logging to experiments/invertedPendulum/nov1/w350e3_seed2431
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7400563359260559
Validation loss = 0.3818690776824951
Validation loss = 0.3386020064353943
Validation loss = 0.32678189873695374
Validation loss = 0.3212313950061798
Validation loss = 0.29936453700065613
Validation loss = 0.27863869071006775
Validation loss = 0.2602151036262512
Validation loss = 0.24548622965812683
Validation loss = 0.240572988986969
Validation loss = 0.23840896785259247
Validation loss = 0.21743811666965485
Validation loss = 0.23754888772964478
Validation loss = 0.2120107263326645
Validation loss = 0.19267414510250092
Validation loss = 0.18505990505218506
Validation loss = 0.20334626734256744
Validation loss = 0.1741473376750946
Validation loss = 0.19298583269119263
Validation loss = 0.19558505713939667
Validation loss = 0.17419712245464325
Validation loss = 0.16572634875774384
Validation loss = 0.15999063849449158
Validation loss = 0.16252432763576508
Validation loss = 0.16587211191654205
Validation loss = 0.16242150962352753
Validation loss = 0.14088304340839386
Validation loss = 0.14913058280944824
Validation loss = 0.13328760862350464
Validation loss = 0.12766806781291962
Validation loss = 0.13357220590114594
Validation loss = 0.14532361924648285
Validation loss = 0.12931440770626068
Validation loss = 0.11504975706338882
Validation loss = 0.1276005208492279
Validation loss = 0.11388301104307175
Validation loss = 0.09592965245246887
Validation loss = 0.1085750088095665
Validation loss = 0.0964120551943779
Validation loss = 0.09040578454732895
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.745286226272583
Validation loss = 0.39452385902404785
Validation loss = 0.3506927192211151
Validation loss = 0.32295793294906616
Validation loss = 0.31047624349594116
Validation loss = 0.2903604805469513
Validation loss = 0.2769559919834137
Validation loss = 0.26069870591163635
Validation loss = 0.24866236746311188
Validation loss = 0.23470737040042877
Validation loss = 0.23148661851882935
Validation loss = 0.21510766446590424
Validation loss = 0.21453642845153809
Validation loss = 0.21052131056785583
Validation loss = 0.19762587547302246
Validation loss = 0.19091321527957916
Validation loss = 0.18090443313121796
Validation loss = 0.17643193900585175
Validation loss = 0.1968630999326706
Validation loss = 0.16557180881500244
Validation loss = 0.14769546687602997
Validation loss = 0.14040151238441467
Validation loss = 0.1506335437297821
Validation loss = 0.12917619943618774
Validation loss = 0.14005449414253235
Validation loss = 0.13958528637886047
Validation loss = 0.14084619283676147
Validation loss = 0.1197827085852623
Validation loss = 0.1091921254992485
Validation loss = 0.10750427842140198
Validation loss = 0.10051940381526947
Validation loss = 0.09093856811523438
Validation loss = 0.10864277929067612
Validation loss = 0.09516473859548569
Validation loss = 0.09164642542600632
Validation loss = 0.09408172219991684
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7389129996299744
Validation loss = 0.44738635420799255
Validation loss = 0.3815486431121826
Validation loss = 0.3384726941585541
Validation loss = 0.32249754667282104
Validation loss = 0.3247546851634979
Validation loss = 0.3105899691581726
Validation loss = 0.28118762373924255
Validation loss = 0.26978328824043274
Validation loss = 0.26762035489082336
Validation loss = 0.24237918853759766
Validation loss = 0.22599969804286957
Validation loss = 0.2277594357728958
Validation loss = 0.21817688643932343
Validation loss = 0.2103220671415329
Validation loss = 0.20670512318611145
Validation loss = 0.20783939957618713
Validation loss = 0.19704197347164154
Validation loss = 0.1903424710035324
Validation loss = 0.2090776264667511
Validation loss = 0.2029835730791092
Validation loss = 0.19167464971542358
Validation loss = 0.16685408353805542
Validation loss = 0.1590258926153183
Validation loss = 0.15505769848823547
Validation loss = 0.17098233103752136
Validation loss = 0.16261710226535797
Validation loss = 0.16206619143486023
Validation loss = 0.15291431546211243
Validation loss = 0.13578365743160248
Validation loss = 0.14254704117774963
Validation loss = 0.1568114459514618
Validation loss = 0.1256871521472931
Validation loss = 0.1328532099723816
Validation loss = 0.11636515706777573
Validation loss = 0.10181217640638351
Validation loss = 0.10440083593130112
Validation loss = 0.10889706760644913
Validation loss = 0.09926991909742355
Validation loss = 0.08889733254909515
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7367522716522217
Validation loss = 0.3773338496685028
Validation loss = 0.35945478081703186
Validation loss = 0.34062835574150085
Validation loss = 0.3110189139842987
Validation loss = 0.30998340249061584
Validation loss = 0.2783084511756897
Validation loss = 0.26486659049987793
Validation loss = 0.25961077213287354
Validation loss = 0.24051202833652496
Validation loss = 0.2358173280954361
Validation loss = 0.22569523751735687
Validation loss = 0.21541014313697815
Validation loss = 0.2068920135498047
Validation loss = 0.20173020660877228
Validation loss = 0.2076224982738495
Validation loss = 0.20437254011631012
Validation loss = 0.18969085812568665
Validation loss = 0.1838475912809372
Validation loss = 0.18828162550926208
Validation loss = 0.18537795543670654
Validation loss = 0.18979023396968842
Validation loss = 0.17479462921619415
Validation loss = 0.15494093298912048
Validation loss = 0.14512208104133606
Validation loss = 0.12779706716537476
Validation loss = 0.12684203684329987
Validation loss = 0.11186566203832626
Validation loss = 0.13177898526191711
Validation loss = 0.11707454174757004
Validation loss = 0.10572340339422226
Validation loss = 0.10549569875001907
Validation loss = 0.10631509870290756
Validation loss = 0.10810061544179916
Validation loss = 0.09298155456781387
Validation loss = 0.0986742377281189
Validation loss = 0.1092550978064537
Validation loss = 0.11875995248556137
Validation loss = 0.10643939673900604
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7722506523132324
Validation loss = 0.41088253259658813
Validation loss = 0.3480004072189331
Validation loss = 0.3234347999095917
Validation loss = 0.31589800119400024
Validation loss = 0.2966455817222595
Validation loss = 0.27872583270072937
Validation loss = 0.2635500729084015
Validation loss = 0.24440571665763855
Validation loss = 0.2321372628211975
Validation loss = 0.22370953857898712
Validation loss = 0.22945713996887207
Validation loss = 0.2127922624349594
Validation loss = 0.21115563809871674
Validation loss = 0.2136998027563095
Validation loss = 0.20674584805965424
Validation loss = 0.2058628350496292
Validation loss = 0.19665227830410004
Validation loss = 0.18962660431861877
Validation loss = 0.16257427632808685
Validation loss = 0.15274712443351746
Validation loss = 0.1557087004184723
Validation loss = 0.15235117077827454
Validation loss = 0.14043723046779633
Validation loss = 0.1309591829776764
Validation loss = 0.13666179776191711
Validation loss = 0.1196780651807785
Validation loss = 0.11684161424636841
Validation loss = 0.12021363526582718
Validation loss = 0.1273387223482132
Validation loss = 0.11187796294689178
Validation loss = 0.10845176875591278
Validation loss = 0.10815233737230301
Validation loss = 0.10503021627664566
Validation loss = 0.1243458092212677
Validation loss = 0.11436857283115387
Validation loss = 0.09596369415521622
Validation loss = 0.0981854572892189
Validation loss = 0.09858735650777817
Validation loss = 0.10595215857028961
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0435  |
| Iteration     | 0        |
| MaximumReturn | -0.0271  |
| MinimumReturn | -0.0886  |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.30769091844558716
Validation loss = 0.148160919547081
Validation loss = 0.11591564118862152
Validation loss = 0.10175447911024094
Validation loss = 0.09637518227100372
Validation loss = 0.09268228709697723
Validation loss = 0.08132120966911316
Validation loss = 0.08633030951023102
Validation loss = 0.07871340215206146
Validation loss = 0.07740087807178497
Validation loss = 0.06517177820205688
Validation loss = 0.06453476846218109
Validation loss = 0.06208342686295509
Validation loss = 0.07018717378377914
Validation loss = 0.06277110427618027
Validation loss = 0.06309773772954941
Validation loss = 0.06176936998963356
Validation loss = 0.06148176267743111
Validation loss = 0.05972937121987343
Validation loss = 0.06261896342039108
Validation loss = 0.05698010325431824
Validation loss = 0.05479903519153595
Validation loss = 0.049039989709854126
Validation loss = 0.05095390975475311
Validation loss = 0.0692523941397667
Validation loss = 0.065949447453022
Validation loss = 0.06629317253828049
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2872503101825714
Validation loss = 0.14076018333435059
Validation loss = 0.1055605486035347
Validation loss = 0.09067783504724503
Validation loss = 0.09071353077888489
Validation loss = 0.08090031892061234
Validation loss = 0.07615383714437485
Validation loss = 0.07499504089355469
Validation loss = 0.07823614031076431
Validation loss = 0.06980734318494797
Validation loss = 0.06627465784549713
Validation loss = 0.07564984261989594
Validation loss = 0.056608010083436966
Validation loss = 0.06326855719089508
Validation loss = 0.05694112926721573
Validation loss = 0.0628686472773552
Validation loss = 0.05875416100025177
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3290022015571594
Validation loss = 0.16470612585544586
Validation loss = 0.1411014199256897
Validation loss = 0.12011303752660751
Validation loss = 0.10897378623485565
Validation loss = 0.10595224052667618
Validation loss = 0.09715203195810318
Validation loss = 0.09101487696170807
Validation loss = 0.08234985917806625
Validation loss = 0.081931471824646
Validation loss = 0.08119909465312958
Validation loss = 0.08143353462219238
Validation loss = 0.07047655433416367
Validation loss = 0.06840814650058746
Validation loss = 0.06991458684206009
Validation loss = 0.07499328255653381
Validation loss = 0.05778409540653229
Validation loss = 0.07262180745601654
Validation loss = 0.05946613475680351
Validation loss = 0.0703626200556755
Validation loss = 0.06555335968732834
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2836684286594391
Validation loss = 0.12794221937656403
Validation loss = 0.09561565518379211
Validation loss = 0.08536691218614578
Validation loss = 0.07896333932876587
Validation loss = 0.07073631882667542
Validation loss = 0.06981212645769119
Validation loss = 0.06573223322629929
Validation loss = 0.060761481523513794
Validation loss = 0.06269253045320511
Validation loss = 0.05721477419137955
Validation loss = 0.05948584899306297
Validation loss = 0.0554451048374176
Validation loss = 0.05787549540400505
Validation loss = 0.062291838228702545
Validation loss = 0.059617798775434494
Validation loss = 0.052236177027225494
Validation loss = 0.057012755423784256
Validation loss = 0.05686088651418686
Validation loss = 0.05864659696817398
Validation loss = 0.05498165264725685
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3149992823600769
Validation loss = 0.16277095675468445
Validation loss = 0.13628582656383514
Validation loss = 0.1273108571767807
Validation loss = 0.10421915352344513
Validation loss = 0.09381117671728134
Validation loss = 0.09504054486751556
Validation loss = 0.0998455360531807
Validation loss = 0.081741563975811
Validation loss = 0.07922844588756561
Validation loss = 0.07186485826969147
Validation loss = 0.0704481303691864
Validation loss = 0.07306389510631561
Validation loss = 0.06465378403663635
Validation loss = 0.06524144858121872
Validation loss = 0.07292788475751877
Validation loss = 0.05728362128138542
Validation loss = 0.05891166999936104
Validation loss = 0.05468396842479706
Validation loss = 0.05967016518115997
Validation loss = 0.052689578384160995
Validation loss = 0.05193595588207245
Validation loss = 0.05797365680336952
Validation loss = 0.05123763531446457
Validation loss = 0.05623779073357582
Validation loss = 0.06155670806765556
Validation loss = 0.06056980788707733
Validation loss = 0.05496992543339729
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00127  |
| Iteration     | 1         |
| MaximumReturn | -0.000853 |
| MinimumReturn | -0.00156  |
| TotalSamples  | 4998      |
-----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11087553948163986
Validation loss = 0.07929062098264694
Validation loss = 0.07009192556142807
Validation loss = 0.05911655351519585
Validation loss = 0.050815291702747345
Validation loss = 0.05268058553338051
Validation loss = 0.056863199919462204
Validation loss = 0.046358879655599594
Validation loss = 0.049048397690057755
Validation loss = 0.043723855167627335
Validation loss = 0.04205325245857239
Validation loss = 0.03821950405836105
Validation loss = 0.04269443824887276
Validation loss = 0.04754534363746643
Validation loss = 0.04436274245381355
Validation loss = 0.03694292902946472
Validation loss = 0.03509410470724106
Validation loss = 0.03798285871744156
Validation loss = 0.039343178272247314
Validation loss = 0.04646722972393036
Validation loss = 0.04675867781043053
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09940096735954285
Validation loss = 0.06438548862934113
Validation loss = 0.053695376962423325
Validation loss = 0.05684460699558258
Validation loss = 0.044209808111190796
Validation loss = 0.05207401141524315
Validation loss = 0.0489807054400444
Validation loss = 0.04396368935704231
Validation loss = 0.04668520763516426
Validation loss = 0.04279808700084686
Validation loss = 0.045912228524684906
Validation loss = 0.0542341023683548
Validation loss = 0.04077567905187607
Validation loss = 0.03863132745027542
Validation loss = 0.04044659063220024
Validation loss = 0.04412977397441864
Validation loss = 0.04409908503293991
Validation loss = 0.04087932035326958
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09979903697967529
Validation loss = 0.07018880546092987
Validation loss = 0.07089845091104507
Validation loss = 0.060566067695617676
Validation loss = 0.05849441513419151
Validation loss = 0.05285107344388962
Validation loss = 0.05100550875067711
Validation loss = 0.05195838212966919
Validation loss = 0.04937967658042908
Validation loss = 0.047009725123643875
Validation loss = 0.05016537383198738
Validation loss = 0.04500903561711311
Validation loss = 0.047328390181064606
Validation loss = 0.0492766797542572
Validation loss = 0.05416915565729141
Validation loss = 0.045358262956142426
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09838483482599258
Validation loss = 0.05727556347846985
Validation loss = 0.05474628508090973
Validation loss = 0.05692548677325249
Validation loss = 0.048728033900260925
Validation loss = 0.049460895359516144
Validation loss = 0.04830186814069748
Validation loss = 0.040231771767139435
Validation loss = 0.0468563437461853
Validation loss = 0.05290456488728523
Validation loss = 0.04869157820940018
Validation loss = 0.04196421429514885
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08879664540290833
Validation loss = 0.05497867614030838
Validation loss = 0.045302532613277435
Validation loss = 0.04615260288119316
Validation loss = 0.047948017716407776
Validation loss = 0.04335770010948181
Validation loss = 0.04865992069244385
Validation loss = 0.04424164444208145
Validation loss = 0.049599789083004
Validation loss = 0.04693526774644852
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000847 |
| Iteration     | 2         |
| MaximumReturn | -0.000511 |
| MinimumReturn | -0.00126  |
| TotalSamples  | 6664      |
-----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06602223217487335
Validation loss = 0.03857323154807091
Validation loss = 0.028798803687095642
Validation loss = 0.039180003106594086
Validation loss = 0.029955759644508362
Validation loss = 0.034116726368665695
Validation loss = 0.0260471198707819
Validation loss = 0.02899700403213501
Validation loss = 0.025340089574456215
Validation loss = 0.02419360727071762
Validation loss = 0.026063119992613792
Validation loss = 0.03502921387553215
Validation loss = 0.02641328237950802
Validation loss = 0.0303693488240242
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07097671926021576
Validation loss = 0.044464293867349625
Validation loss = 0.03897402063012123
Validation loss = 0.03607097640633583
Validation loss = 0.034796446561813354
Validation loss = 0.03067520260810852
Validation loss = 0.030406437814235687
Validation loss = 0.03572121262550354
Validation loss = 0.028656871989369392
Validation loss = 0.03137192130088806
Validation loss = 0.03204206004738808
Validation loss = 0.03237740695476532
Validation loss = 0.03571828827261925
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.058231648057699203
Validation loss = 0.037044353783130646
Validation loss = 0.03620287775993347
Validation loss = 0.030761661008000374
Validation loss = 0.034102439880371094
Validation loss = 0.03261031210422516
Validation loss = 0.030831756070256233
Validation loss = 0.029416820034384727
Validation loss = 0.031616583466529846
Validation loss = 0.03548186272382736
Validation loss = 0.032373975962400436
Validation loss = 0.028525754809379578
Validation loss = 0.031028447672724724
Validation loss = 0.031916916370391846
Validation loss = 0.030096061527729034
Validation loss = 0.030172569677233696
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05364793911576271
Validation loss = 0.038858573883771896
Validation loss = 0.029710449278354645
Validation loss = 0.029652975499629974
Validation loss = 0.03393559902906418
Validation loss = 0.033073026686906815
Validation loss = 0.028845027089118958
Validation loss = 0.030766313895583153
Validation loss = 0.029180139303207397
Validation loss = 0.027545107528567314
Validation loss = 0.04067109897732735
Validation loss = 0.030730515718460083
Validation loss = 0.03246570751070976
Validation loss = 0.03343087062239647
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06119028851389885
Validation loss = 0.04463677480816841
Validation loss = 0.04328704997897148
Validation loss = 0.04007429629564285
Validation loss = 0.0356164388358593
Validation loss = 0.0388944186270237
Validation loss = 0.031422894448041916
Validation loss = 0.031231505796313286
Validation loss = 0.035677388310432434
Validation loss = 0.03535165637731552
Validation loss = 0.030799292027950287
Validation loss = 0.02678762935101986
Validation loss = 0.03542415052652359
Validation loss = 0.035386983305215836
Validation loss = 0.04256678745150566
Validation loss = 0.03023625910282135
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00135  |
| Iteration     | 3         |
| MaximumReturn | -0.000912 |
| MinimumReturn | -0.00198  |
| TotalSamples  | 8330      |
-----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.039987679570913315
Validation loss = 0.024623028934001923
Validation loss = 0.022787779569625854
Validation loss = 0.02210998348891735
Validation loss = 0.02530961111187935
Validation loss = 0.021163813769817352
Validation loss = 0.024494821205735207
Validation loss = 0.020932001993060112
Validation loss = 0.021965449675917625
Validation loss = 0.02183304727077484
Validation loss = 0.018871478736400604
Validation loss = 0.021603884175419807
Validation loss = 0.021605215966701508
Validation loss = 0.019689330831170082
Validation loss = 0.02402540296316147
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03216756135225296
Validation loss = 0.03294530510902405
Validation loss = 0.02997506409883499
Validation loss = 0.02556733600795269
Validation loss = 0.025278689339756966
Validation loss = 0.03459325432777405
Validation loss = 0.02608889900147915
Validation loss = 0.025676775723695755
Validation loss = 0.021292420104146004
Validation loss = 0.02697565034031868
Validation loss = 0.022595975548028946
Validation loss = 0.035755354911088943
Validation loss = 0.02446800470352173
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.031659770756959915
Validation loss = 0.030739160254597664
Validation loss = 0.02663908153772354
Validation loss = 0.023910580202937126
Validation loss = 0.03300866857171059
Validation loss = 0.025916336104273796
Validation loss = 0.027622472494840622
Validation loss = 0.029023777693510056
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03255622833967209
Validation loss = 0.023086564615368843
Validation loss = 0.025013498961925507
Validation loss = 0.02245427295565605
Validation loss = 0.02523360773921013
Validation loss = 0.029875870794057846
Validation loss = 0.02567012794315815
Validation loss = 0.022950250655412674
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.030946610495448112
Validation loss = 0.02858831360936165
Validation loss = 0.02138480916619301
Validation loss = 0.024978667497634888
Validation loss = 0.023580949753522873
Validation loss = 0.02332550473511219
Validation loss = 0.02273826114833355
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000992 |
| Iteration     | 4         |
| MaximumReturn | -0.000639 |
| MinimumReturn | -0.0014   |
| TotalSamples  | 9996      |
-----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02698705717921257
Validation loss = 0.018266409635543823
Validation loss = 0.01750142127275467
Validation loss = 0.021424874663352966
Validation loss = 0.018140316009521484
Validation loss = 0.02050214260816574
Validation loss = 0.017530225217342377
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03719368577003479
Validation loss = 0.024858199059963226
Validation loss = 0.019792277365922928
Validation loss = 0.02063077874481678
Validation loss = 0.021363520994782448
Validation loss = 0.0199333094060421
Validation loss = 0.018000837415456772
Validation loss = 0.018737344071269035
Validation loss = 0.03966190665960312
Validation loss = 0.024249810725450516
Validation loss = 0.018542032688856125
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02688750997185707
Validation loss = 0.02275536209344864
Validation loss = 0.027146443724632263
Validation loss = 0.028622448444366455
Validation loss = 0.02087520807981491
Validation loss = 0.020859600976109505
Validation loss = 0.020509477704763412
Validation loss = 0.01870737597346306
Validation loss = 0.018286006525158882
Validation loss = 0.019551318138837814
Validation loss = 0.02001587301492691
Validation loss = 0.02261011302471161
Validation loss = 0.02207433432340622
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.029214227572083473
Validation loss = 0.03148243576288223
Validation loss = 0.02097458206117153
Validation loss = 0.02208365686237812
Validation loss = 0.01969943195581436
Validation loss = 0.017425771802663803
Validation loss = 0.030459126457571983
Validation loss = 0.02174607664346695
Validation loss = 0.02167104184627533
Validation loss = 0.019911905750632286
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02368319034576416
Validation loss = 0.020105265080928802
Validation loss = 0.02331274002790451
Validation loss = 0.02256757579743862
Validation loss = 0.02308296039700508
Validation loss = 0.022672433406114578
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00165 |
| Iteration     | 5        |
| MaximumReturn | -0.00112 |
| MinimumReturn | -0.00235 |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.027563074603676796
Validation loss = 0.023697761818766594
Validation loss = 0.019601142033934593
Validation loss = 0.019937340170145035
Validation loss = 0.025958439335227013
Validation loss = 0.028503526002168655
Validation loss = 0.021384861320257187
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021746378391981125
Validation loss = 0.01891806162893772
Validation loss = 0.02339990809559822
Validation loss = 0.031802404671907425
Validation loss = 0.022172849625349045
Validation loss = 0.02218671515583992
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.037174291908741
Validation loss = 0.02409176528453827
Validation loss = 0.02165963314473629
Validation loss = 0.021841708570718765
Validation loss = 0.024896781891584396
Validation loss = 0.020498842000961304
Validation loss = 0.022989924997091293
Validation loss = 0.026148134842514992
Validation loss = 0.0198807455599308
Validation loss = 0.020810352638363838
Validation loss = 0.020543616265058517
Validation loss = 0.02292836830019951
Validation loss = 0.02151085063815117
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.030749347060918808
Validation loss = 0.022426268085837364
Validation loss = 0.019960781559348106
Validation loss = 0.01839188113808632
Validation loss = 0.020520145073533058
Validation loss = 0.02200842648744583
Validation loss = 0.021191388368606567
Validation loss = 0.025921974331140518
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.044462598860263824
Validation loss = 0.02528047002851963
Validation loss = 0.02225370891392231
Validation loss = 0.026118719950318336
Validation loss = 0.025245586410164833
Validation loss = 0.02164359949529171
Validation loss = 0.02397231198847294
Validation loss = 0.02452986314892769
Validation loss = 0.020589694380760193
Validation loss = 0.02337312512099743
Validation loss = 0.02071600779891014
Validation loss = 0.0194571353495121
Validation loss = 0.01929289475083351
Validation loss = 0.01820574887096882
Validation loss = 0.026606133207678795
Validation loss = 0.02139001153409481
Validation loss = 0.018948424607515335
Validation loss = 0.01910487189888954
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00508 |
| Iteration     | 6        |
| MaximumReturn | -0.00353 |
| MinimumReturn | -0.00691 |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01992008648812771
Validation loss = 0.015764081850647926
Validation loss = 0.017990753054618835
Validation loss = 0.016354037448763847
Validation loss = 0.014565450139343739
Validation loss = 0.014076236635446548
Validation loss = 0.013998251408338547
Validation loss = 0.017725300043821335
Validation loss = 0.020016739144921303
Validation loss = 0.016772136092185974
Validation loss = 0.014972108416259289
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020750274881720543
Validation loss = 0.035875432193279266
Validation loss = 0.01931965909898281
Validation loss = 0.019634705036878586
Validation loss = 0.021652214229106903
Validation loss = 0.02042321301996708
Validation loss = 0.015644580125808716
Validation loss = 0.020398585125803947
Validation loss = 0.01466726791113615
Validation loss = 0.019884169101715088
Validation loss = 0.02059207670390606
Validation loss = 0.014715581201016903
Validation loss = 0.016832862049341202
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015314311720430851
Validation loss = 0.014504480175673962
Validation loss = 0.018360795453190804
Validation loss = 0.012976043857634068
Validation loss = 0.015140600502490997
Validation loss = 0.01689734123647213
Validation loss = 0.018066322430968285
Validation loss = 0.0173878725618124
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019495688378810883
Validation loss = 0.0160007756203413
Validation loss = 0.022140974178910255
Validation loss = 0.015911655500531197
Validation loss = 0.015153564512729645
Validation loss = 0.020022941753268242
Validation loss = 0.015904437750577927
Validation loss = 0.013211183249950409
Validation loss = 0.015380793251097202
Validation loss = 0.01522047072649002
Validation loss = 0.018598642200231552
Validation loss = 0.021233005449175835
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017519747838377953
Validation loss = 0.01814798265695572
Validation loss = 0.015356659889221191
Validation loss = 0.016177037730813026
Validation loss = 0.022056488320231438
Validation loss = 0.02050144597887993
Validation loss = 0.016592785716056824
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00114  |
| Iteration     | 7         |
| MaximumReturn | -0.000717 |
| MinimumReturn | -0.00204  |
| TotalSamples  | 14994     |
-----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014828121289610863
Validation loss = 0.014630376361310482
Validation loss = 0.014816024340689182
Validation loss = 0.015272892080247402
Validation loss = 0.015364580787718296
Validation loss = 0.016150999814271927
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019721705466508865
Validation loss = 0.016191991046071053
Validation loss = 0.014059559442102909
Validation loss = 0.013982346281409264
Validation loss = 0.012717478908598423
Validation loss = 0.012495494447648525
Validation loss = 0.0152266975492239
Validation loss = 0.022868884727358818
Validation loss = 0.01629800535738468
Validation loss = 0.013524363748729229
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018670467659831047
Validation loss = 0.014827148988842964
Validation loss = 0.01839860901236534
Validation loss = 0.015073306858539581
Validation loss = 0.01309569925069809
Validation loss = 0.014054602012038231
Validation loss = 0.013288954272866249
Validation loss = 0.021692026406526566
Validation loss = 0.026956917718052864
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025118201971054077
Validation loss = 0.016249416396021843
Validation loss = 0.015710104256868362
Validation loss = 0.014352717436850071
Validation loss = 0.023358499631285667
Validation loss = 0.018593188375234604
Validation loss = 0.01507547963410616
Validation loss = 0.014971917495131493
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016444120556116104
Validation loss = 0.017587127164006233
Validation loss = 0.015375657007098198
Validation loss = 0.02135993354022503
Validation loss = 0.021529462188482285
Validation loss = 0.020159948617219925
Validation loss = 0.016075260937213898
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00119  |
| Iteration     | 8         |
| MaximumReturn | -0.000674 |
| MinimumReturn | -0.00259  |
| TotalSamples  | 16660     |
-----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014628836885094643
Validation loss = 0.015499292872846127
Validation loss = 0.013465655967593193
Validation loss = 0.013994166627526283
Validation loss = 0.017757007852196693
Validation loss = 0.012258199974894524
Validation loss = 0.012277095578610897
Validation loss = 0.011440029367804527
Validation loss = 0.011033574119210243
Validation loss = 0.017265476286411285
Validation loss = 0.016773942857980728
Validation loss = 0.014994518831372261
Validation loss = 0.013932689093053341
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016968777403235435
Validation loss = 0.017892245203256607
Validation loss = 0.01750688999891281
Validation loss = 0.018439119681715965
Validation loss = 0.018366068601608276
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015078732743859291
Validation loss = 0.01580146700143814
Validation loss = 0.015411335974931717
Validation loss = 0.017263077199459076
Validation loss = 0.01365495566278696
Validation loss = 0.014681797474622726
Validation loss = 0.014375987462699413
Validation loss = 0.013662444427609444
Validation loss = 0.012079145759344101
Validation loss = 0.017583105713129044
Validation loss = 0.013781907968223095
Validation loss = 0.012821514159440994
Validation loss = 0.012334742583334446
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01387573592364788
Validation loss = 0.015476231463253498
Validation loss = 0.0131660345941782
Validation loss = 0.017230400815606117
Validation loss = 0.014989005401730537
Validation loss = 0.012413853779435158
Validation loss = 0.015966124832630157
Validation loss = 0.01176627166569233
Validation loss = 0.013280377723276615
Validation loss = 0.01612795889377594
Validation loss = 0.017159486189484596
Validation loss = 0.019199080765247345
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014474142342805862
Validation loss = 0.018152667209506035
Validation loss = 0.014272852800786495
Validation loss = 0.014978669583797455
Validation loss = 0.017373383045196533
Validation loss = 0.013046453706920147
Validation loss = 0.01713501289486885
Validation loss = 0.018637290224432945
Validation loss = 0.01686173863708973
Validation loss = 0.01737406849861145
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000833 |
| Iteration     | 9         |
| MaximumReturn | -0.000641 |
| MinimumReturn | -0.00114  |
| TotalSamples  | 18326     |
-----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011115667410194874
Validation loss = 0.012240851297974586
Validation loss = 0.016162285581231117
Validation loss = 0.01574859954416752
Validation loss = 0.023194899782538414
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01574825495481491
Validation loss = 0.012162481434643269
Validation loss = 0.013165019452571869
Validation loss = 0.012104795314371586
Validation loss = 0.013571579940617085
Validation loss = 0.010685677640140057
Validation loss = 0.011306587606668472
Validation loss = 0.015010591596364975
Validation loss = 0.017583519220352173
Validation loss = 0.016238011419773102
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013029862195253372
Validation loss = 0.01602071151137352
Validation loss = 0.016253968700766563
Validation loss = 0.022004133090376854
Validation loss = 0.012860535643994808
Validation loss = 0.01270374283194542
Validation loss = 0.01516707893460989
Validation loss = 0.01451369933784008
Validation loss = 0.016055144369602203
Validation loss = 0.016731804236769676
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022207889705896378
Validation loss = 0.014287961646914482
Validation loss = 0.019999487325549126
Validation loss = 0.017096158117055893
Validation loss = 0.018059201538562775
Validation loss = 0.012999560683965683
Validation loss = 0.02247045189142227
Validation loss = 0.017040761187672615
Validation loss = 0.014910821802914143
Validation loss = 0.014655283652245998
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017437361180782318
Validation loss = 0.012958195060491562
Validation loss = 0.014920837245881557
Validation loss = 0.01839866116642952
Validation loss = 0.016721609979867935
Validation loss = 0.015892649069428444
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00154 |
| Iteration     | 10       |
| MaximumReturn | -0.001   |
| MinimumReturn | -0.0026  |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020279644057154655
Validation loss = 0.01449788361787796
Validation loss = 0.014789186418056488
Validation loss = 0.015494768507778645
Validation loss = 0.013002646155655384
Validation loss = 0.011791868135333061
Validation loss = 0.010502757504582405
Validation loss = 0.011577891185879707
Validation loss = 0.010926758870482445
Validation loss = 0.016942469403147697
Validation loss = 0.010810794308781624
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01721813902258873
Validation loss = 0.01759805716574192
Validation loss = 0.016444824635982513
Validation loss = 0.0154896704480052
Validation loss = 0.013864669017493725
Validation loss = 0.011869487352669239
Validation loss = 0.012236080132424831
Validation loss = 0.013373109512031078
Validation loss = 0.016809122636914253
Validation loss = 0.01430460624396801
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020871935412287712
Validation loss = 0.014231057837605476
Validation loss = 0.012099510058760643
Validation loss = 0.012680530548095703
Validation loss = 0.013549688272178173
Validation loss = 0.011987159959971905
Validation loss = 0.014780390076339245
Validation loss = 0.012866703793406487
Validation loss = 0.011587971821427345
Validation loss = 0.011339832097291946
Validation loss = 0.009644116275012493
Validation loss = 0.011589378118515015
Validation loss = 0.010886898264288902
Validation loss = 0.012258678674697876
Validation loss = 0.012326685711741447
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012155466713011265
Validation loss = 0.014680838212370872
Validation loss = 0.015823472291231155
Validation loss = 0.015487877652049065
Validation loss = 0.019792580977082253
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01375791896134615
Validation loss = 0.014692108146846294
Validation loss = 0.01659408211708069
Validation loss = 0.01097602304071188
Validation loss = 0.012522146105766296
Validation loss = 0.015647219493985176
Validation loss = 0.015025530941784382
Validation loss = 0.011790579184889793
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00133  |
| Iteration     | 11        |
| MaximumReturn | -0.000835 |
| MinimumReturn | -0.00213  |
| TotalSamples  | 21658     |
-----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014085454866290092
Validation loss = 0.015706690028309822
Validation loss = 0.013981853611767292
Validation loss = 0.01348944567143917
Validation loss = 0.014265438541769981
Validation loss = 0.019970860332250595
Validation loss = 0.015019448474049568
Validation loss = 0.011395500041544437
Validation loss = 0.010482210665941238
Validation loss = 0.012123323045670986
Validation loss = 0.013777563348412514
Validation loss = 0.01240510679781437
Validation loss = 0.015241874381899834
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016887899488210678
Validation loss = 0.015371079556643963
Validation loss = 0.013932284899055958
Validation loss = 0.016640018671751022
Validation loss = 0.013569921255111694
Validation loss = 0.011596733704209328
Validation loss = 0.017110249027609825
Validation loss = 0.012634234502911568
Validation loss = 0.011796833015978336
Validation loss = 0.012735839001834393
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013202449306845665
Validation loss = 0.01276564784348011
Validation loss = 0.012211916968226433
Validation loss = 0.011506861075758934
Validation loss = 0.014895414002239704
Validation loss = 0.01783148944377899
Validation loss = 0.011327102780342102
Validation loss = 0.012168650515377522
Validation loss = 0.014674976468086243
Validation loss = 0.016834530979394913
Validation loss = 0.012002591975033283
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017291514202952385
Validation loss = 0.01323370449244976
Validation loss = 0.014693109318614006
Validation loss = 0.014630287885665894
Validation loss = 0.014149993658065796
Validation loss = 0.01589593105018139
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012129675596952438
Validation loss = 0.015234969556331635
Validation loss = 0.01675662025809288
Validation loss = 0.013453875668346882
Validation loss = 0.015079163014888763
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000811 |
| Iteration     | 12        |
| MaximumReturn | -0.000563 |
| MinimumReturn | -0.00108  |
| TotalSamples  | 23324     |
-----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01692310720682144
Validation loss = 0.01190072949975729
Validation loss = 0.021345991641283035
Validation loss = 0.017644116654992104
Validation loss = 0.012555718421936035
Validation loss = 0.01391626801341772
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011656475253403187
Validation loss = 0.019060513004660606
Validation loss = 0.012492331676185131
Validation loss = 0.013016194105148315
Validation loss = 0.01698995754122734
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012491955421864986
Validation loss = 0.014024480246007442
Validation loss = 0.013956349343061447
Validation loss = 0.00967490952461958
Validation loss = 0.012404478155076504
Validation loss = 0.012803700752556324
Validation loss = 0.011672905646264553
Validation loss = 0.012111660093069077
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014775242656469345
Validation loss = 0.016829797998070717
Validation loss = 0.011990398168563843
Validation loss = 0.011822222732007504
Validation loss = 0.014271042309701443
Validation loss = 0.013311387039721012
Validation loss = 0.012945991940796375
Validation loss = 0.013496657833456993
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020009223371744156
Validation loss = 0.0141574926674366
Validation loss = 0.01617124117910862
Validation loss = 0.011963405646383762
Validation loss = 0.011085983365774155
Validation loss = 0.014856213703751564
Validation loss = 0.010840903967618942
Validation loss = 0.01118709146976471
Validation loss = 0.01102445088326931
Validation loss = 0.015733717009425163
Validation loss = 0.01751566492021084
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000822 |
| Iteration     | 13        |
| MaximumReturn | -0.000469 |
| MinimumReturn | -0.00126  |
| TotalSamples  | 24990     |
-----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01327664777636528
Validation loss = 0.014117366634309292
Validation loss = 0.01401948556303978
Validation loss = 0.016386834904551506
Validation loss = 0.011972739361226559
Validation loss = 0.011658783070743084
Validation loss = 0.011926359497010708
Validation loss = 0.011107973754405975
Validation loss = 0.009676055051386356
Validation loss = 0.011107604950666428
Validation loss = 0.022237485274672508
Validation loss = 0.014612797647714615
Validation loss = 0.013984243385493755
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014320876449346542
Validation loss = 0.017536098137497902
Validation loss = 0.01683085784316063
Validation loss = 0.015949582681059837
Validation loss = 0.014109100215137005
Validation loss = 0.017998265102505684
Validation loss = 0.012402839958667755
Validation loss = 0.012470287270843983
Validation loss = 0.012063778005540371
Validation loss = 0.01131463423371315
Validation loss = 0.012292386032640934
Validation loss = 0.013512271456420422
Validation loss = 0.012976541183888912
Validation loss = 0.012958458624780178
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01163217332214117
Validation loss = 0.01495687011629343
Validation loss = 0.011328395456075668
Validation loss = 0.010392935015261173
Validation loss = 0.010181370191276073
Validation loss = 0.009900837205350399
Validation loss = 0.011052929796278477
Validation loss = 0.017822710797190666
Validation loss = 0.011722780764102936
Validation loss = 0.01302510779350996
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011424187570810318
Validation loss = 0.014401677995920181
Validation loss = 0.013595682568848133
Validation loss = 0.016696704551577568
Validation loss = 0.013670532964169979
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017215542495250702
Validation loss = 0.017089303582906723
Validation loss = 0.013718515634536743
Validation loss = 0.012170299887657166
Validation loss = 0.017499638721346855
Validation loss = 0.013241427950561047
Validation loss = 0.015220441855490208
Validation loss = 0.0118867801502347
Validation loss = 0.01330652367323637
Validation loss = 0.01091242115944624
Validation loss = 0.011776513420045376
Validation loss = 0.011289714835584164
Validation loss = 0.010849197395145893
Validation loss = 0.012455839663743973
Validation loss = 0.01299250591546297
Validation loss = 0.0109703429043293
Validation loss = 0.010841219685971737
Validation loss = 0.011599383316934109
Validation loss = 0.013963009230792522
Validation loss = 0.01268486212939024
Validation loss = 0.01274300366640091
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000812 |
| Iteration     | 14        |
| MaximumReturn | -0.000505 |
| MinimumReturn | -0.00113  |
| TotalSamples  | 26656     |
-----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014015616849064827
Validation loss = 0.012422621250152588
Validation loss = 0.01103445515036583
Validation loss = 0.013246467337012291
Validation loss = 0.018438924103975296
Validation loss = 0.011946514248847961
Validation loss = 0.012371773831546307
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013274936936795712
Validation loss = 0.013282782398164272
Validation loss = 0.012295251712203026
Validation loss = 0.017319364473223686
Validation loss = 0.014527445659041405
Validation loss = 0.014933811500668526
Validation loss = 0.017763616517186165
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012256125919520855
Validation loss = 0.01617695763707161
Validation loss = 0.010065598413348198
Validation loss = 0.01331518404185772
Validation loss = 0.01405203714966774
Validation loss = 0.010863179340958595
Validation loss = 0.012741317972540855
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012582574039697647
Validation loss = 0.014250203035771847
Validation loss = 0.016665341332554817
Validation loss = 0.019136328250169754
Validation loss = 0.012804171070456505
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014488800428807735
Validation loss = 0.012991347350180149
Validation loss = 0.014977569691836834
Validation loss = 0.013739640824496746
Validation loss = 0.015166198834776878
Validation loss = 0.011826799251139164
Validation loss = 0.012391663156449795
Validation loss = 0.012112491764128208
Validation loss = 0.012854321859776974
Validation loss = 0.009954733774065971
Validation loss = 0.010852089151740074
Validation loss = 0.012732385657727718
Validation loss = 0.011882131919264793
Validation loss = 0.016086362302303314
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000853 |
| Iteration     | 15        |
| MaximumReturn | -0.000596 |
| MinimumReturn | -0.00113  |
| TotalSamples  | 28322     |
-----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017201388254761696
Validation loss = 0.012456907890737057
Validation loss = 0.012612809427082539
Validation loss = 0.014641265384852886
Validation loss = 0.013546674512326717
Validation loss = 0.012111343443393707
Validation loss = 0.010994059033691883
Validation loss = 0.013426654040813446
Validation loss = 0.012499796226620674
Validation loss = 0.010296917520463467
Validation loss = 0.011069430969655514
Validation loss = 0.010525519028306007
Validation loss = 0.014157597906887531
Validation loss = 0.013981638476252556
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02002805657684803
Validation loss = 0.013799535110592842
Validation loss = 0.013779671862721443
Validation loss = 0.01563255861401558
Validation loss = 0.016051845625042915
Validation loss = 0.01457278709858656
Validation loss = 0.014034577645361423
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0199379064142704
Validation loss = 0.011483952403068542
Validation loss = 0.01367570273578167
Validation loss = 0.00931527279317379
Validation loss = 0.010626500472426414
Validation loss = 0.010624215938150883
Validation loss = 0.009337692521512508
Validation loss = 0.010704608634114265
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011009185574948788
Validation loss = 0.012518428266048431
Validation loss = 0.011808947660028934
Validation loss = 0.01226587314158678
Validation loss = 0.011100813746452332
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017491960898041725
Validation loss = 0.013840842060744762
Validation loss = 0.012875069864094257
Validation loss = 0.01206711120903492
Validation loss = 0.01219802163541317
Validation loss = 0.013215051963925362
Validation loss = 0.012494847178459167
Validation loss = 0.010356716811656952
Validation loss = 0.011298462748527527
Validation loss = 0.01087206695228815
Validation loss = 0.010853623040020466
Validation loss = 0.011069267988204956
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000772 |
| Iteration     | 16        |
| MaximumReturn | -0.000585 |
| MinimumReturn | -0.0012   |
| TotalSamples  | 29988     |
-----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014588644728064537
Validation loss = 0.011206473223865032
Validation loss = 0.010881984606385231
Validation loss = 0.009672809392213821
Validation loss = 0.013976669870316982
Validation loss = 0.014216432347893715
Validation loss = 0.014755010604858398
Validation loss = 0.01118695642799139
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011189814656972885
Validation loss = 0.014146450906991959
Validation loss = 0.011795121245086193
Validation loss = 0.010067637078464031
Validation loss = 0.01201237179338932
Validation loss = 0.01320186909288168
Validation loss = 0.015420857816934586
Validation loss = 0.01351708173751831
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016824543476104736
Validation loss = 0.018162021413445473
Validation loss = 0.01234606746584177
Validation loss = 0.016734205186367035
Validation loss = 0.013942989520728588
Validation loss = 0.0125862006098032
Validation loss = 0.010872012004256248
Validation loss = 0.008898153901100159
Validation loss = 0.01081411074846983
Validation loss = 0.009428536519408226
Validation loss = 0.009153849445283413
Validation loss = 0.01149075012654066
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01574777252972126
Validation loss = 0.016572559252381325
Validation loss = 0.011833268217742443
Validation loss = 0.01378081925213337
Validation loss = 0.013495072722434998
Validation loss = 0.01920490525662899
Validation loss = 0.012524341233074665
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01746710017323494
Validation loss = 0.013697857037186623
Validation loss = 0.015685638412833214
Validation loss = 0.015516366809606552
Validation loss = 0.012069526128470898
Validation loss = 0.012834146618843079
Validation loss = 0.011297360062599182
Validation loss = 0.010916706174612045
Validation loss = 0.0111938351765275
Validation loss = 0.011610105633735657
Validation loss = 0.010855374857783318
Validation loss = 0.013662793673574924
Validation loss = 0.010151327587664127
Validation loss = 0.010565943084657192
Validation loss = 0.014398057945072651
Validation loss = 0.011430812999606133
Validation loss = 0.01050748210400343
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000838 |
| Iteration     | 17        |
| MaximumReturn | -0.000624 |
| MinimumReturn | -0.00134  |
| TotalSamples  | 31654     |
-----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014625073410570621
Validation loss = 0.012323279865086079
Validation loss = 0.013775642029941082
Validation loss = 0.01359277032315731
Validation loss = 0.018255800008773804
Validation loss = 0.014689946547150612
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014474708586931229
Validation loss = 0.015081883408129215
Validation loss = 0.015607431530952454
Validation loss = 0.01320499088615179
Validation loss = 0.016823749989271164
Validation loss = 0.017391042783856392
Validation loss = 0.017494870349764824
Validation loss = 0.015453115105628967
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011158705689013004
Validation loss = 0.017728421837091446
Validation loss = 0.01815926469862461
Validation loss = 0.017435966059565544
Validation loss = 0.011859704740345478
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012392015196383
Validation loss = 0.010752031579613686
Validation loss = 0.012802033685147762
Validation loss = 0.012358379550278187
Validation loss = 0.015672307461500168
Validation loss = 0.011540706269443035
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011525393463671207
Validation loss = 0.012920104898512363
Validation loss = 0.01680777408182621
Validation loss = 0.01628822460770607
Validation loss = 0.012421468272805214
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000833 |
| Iteration     | 18        |
| MaximumReturn | -0.000535 |
| MinimumReturn | -0.00129  |
| TotalSamples  | 33320     |
-----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016662487760186195
Validation loss = 0.011994507163763046
Validation loss = 0.014006621204316616
Validation loss = 0.012599623762071133
Validation loss = 0.009413306601345539
Validation loss = 0.011279508471488953
Validation loss = 0.010599612258374691
Validation loss = 0.011704156175255775
Validation loss = 0.019189035519957542
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01494669821113348
Validation loss = 0.01365636195987463
Validation loss = 0.012305504642426968
Validation loss = 0.012832773849368095
Validation loss = 0.01233440451323986
Validation loss = 0.01302839070558548
Validation loss = 0.012023070827126503
Validation loss = 0.018117472529411316
Validation loss = 0.014322425238788128
Validation loss = 0.015750354155898094
Validation loss = 0.0158966314047575
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013562901876866817
Validation loss = 0.014580337330698967
Validation loss = 0.01356237381696701
Validation loss = 0.015017475001513958
Validation loss = 0.011874856427311897
Validation loss = 0.011552233248949051
Validation loss = 0.012719789519906044
Validation loss = 0.012650989927351475
Validation loss = 0.010859408415853977
Validation loss = 0.011429321020841599
Validation loss = 0.015523592941462994
Validation loss = 0.022795282304286957
Validation loss = 0.013117357157170773
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013815375976264477
Validation loss = 0.019765641540288925
Validation loss = 0.013023415580391884
Validation loss = 0.014476644806563854
Validation loss = 0.018174249678850174
Validation loss = 0.013372266665101051
Validation loss = 0.013264281675219536
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012934808619320393
Validation loss = 0.012907522730529308
Validation loss = 0.013205715455114841
Validation loss = 0.012006337754428387
Validation loss = 0.012318659573793411
Validation loss = 0.015153526328504086
Validation loss = 0.010406158864498138
Validation loss = 0.011778845451772213
Validation loss = 0.010020003654062748
Validation loss = 0.01621672511100769
Validation loss = 0.013641648925840855
Validation loss = 0.010961503721773624
Validation loss = 0.00991133600473404
Validation loss = 0.01019557286053896
Validation loss = 0.010982644744217396
Validation loss = 0.010016107931733131
Validation loss = 0.009530897252261639
Validation loss = 0.011699159629642963
Validation loss = 0.01433784794062376
Validation loss = 0.011328356340527534
Validation loss = 0.010606850497424603
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000814 |
| Iteration     | 19        |
| MaximumReturn | -0.000613 |
| MinimumReturn | -0.00112  |
| TotalSamples  | 34986     |
-----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.019633395597338676
Validation loss = 0.012298540212213993
Validation loss = 0.012161172926425934
Validation loss = 0.011126196943223476
Validation loss = 0.01034481544047594
Validation loss = 0.010119973681867123
Validation loss = 0.013642128556966782
Validation loss = 0.014931116253137589
Validation loss = 0.013409602455794811
Validation loss = 0.010795580223202705
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012407180853188038
Validation loss = 0.011078815907239914
Validation loss = 0.014647304080426693
Validation loss = 0.012682721950113773
Validation loss = 0.010960408486425877
Validation loss = 0.010713394731283188
Validation loss = 0.013678601942956448
Validation loss = 0.011756026186048985
Validation loss = 0.01614193618297577
Validation loss = 0.01288085151463747
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013483458198606968
Validation loss = 0.01801251247525215
Validation loss = 0.01393925305455923
Validation loss = 0.010863480158150196
Validation loss = 0.010897325351834297
Validation loss = 0.012182928621768951
Validation loss = 0.012407270260155201
Validation loss = 0.012649201788008213
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014121221378445625
Validation loss = 0.015583802945911884
Validation loss = 0.013061597011983395
Validation loss = 0.012372408993542194
Validation loss = 0.011280657723546028
Validation loss = 0.014119083993136883
Validation loss = 0.012158926576375961
Validation loss = 0.01251993142068386
Validation loss = 0.014258437789976597
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01017737202346325
Validation loss = 0.01788236014544964
Validation loss = 0.010414164513349533
Validation loss = 0.010659253224730492
Validation loss = 0.010154093615710735
Validation loss = 0.013919149525463581
Validation loss = 0.009985331445932388
Validation loss = 0.012347254902124405
Validation loss = 0.011596028693020344
Validation loss = 0.010067882016301155
Validation loss = 0.009675403125584126
Validation loss = 0.013782146386802197
Validation loss = 0.010939226485788822
Validation loss = 0.010968286544084549
Validation loss = 0.008984974585473537
Validation loss = 0.009415507316589355
Validation loss = 0.011380571871995926
Validation loss = 0.013057260774075985
Validation loss = 0.010282997041940689
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000888 |
| Iteration     | 20        |
| MaximumReturn | -0.000635 |
| MinimumReturn | -0.00182  |
| TotalSamples  | 36652     |
-----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009949881583452225
Validation loss = 0.009447568096220493
Validation loss = 0.010038431733846664
Validation loss = 0.011862138286232948
Validation loss = 0.010231423191726208
Validation loss = 0.010070065036416054
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010960133746266365
Validation loss = 0.01118030771613121
Validation loss = 0.012302270159125328
Validation loss = 0.011353198438882828
Validation loss = 0.011710402555763721
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011838709004223347
Validation loss = 0.012434498406946659
Validation loss = 0.01487677451223135
Validation loss = 0.014133001677691936
Validation loss = 0.011561473831534386
Validation loss = 0.012225735932588577
Validation loss = 0.01216200552880764
Validation loss = 0.010986139997839928
Validation loss = 0.010616470128297806
Validation loss = 0.01169571466743946
Validation loss = 0.012100635096430779
Validation loss = 0.011972496286034584
Validation loss = 0.014955537393689156
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016036324203014374
Validation loss = 0.011894765309989452
Validation loss = 0.01255591306835413
Validation loss = 0.013059948571026325
Validation loss = 0.01576846092939377
Validation loss = 0.01554169226437807
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01213074754923582
Validation loss = 0.009686394594609737
Validation loss = 0.012413893826305866
Validation loss = 0.012964990921318531
Validation loss = 0.011732109822332859
Validation loss = 0.014325855299830437
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000807 |
| Iteration     | 21        |
| MaximumReturn | -0.000498 |
| MinimumReturn | -0.0018   |
| TotalSamples  | 38318     |
-----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015726281329989433
Validation loss = 0.018513264134526253
Validation loss = 0.014115246012806892
Validation loss = 0.011177930980920792
Validation loss = 0.010978619568049908
Validation loss = 0.011865138076245785
Validation loss = 0.009665102697908878
Validation loss = 0.013102608732879162
Validation loss = 0.01016940176486969
Validation loss = 0.009998663328588009
Validation loss = 0.010038902051746845
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013699708506464958
Validation loss = 0.01150380726903677
Validation loss = 0.013100563548505306
Validation loss = 0.014153365977108479
Validation loss = 0.012694609351456165
Validation loss = 0.011462404392659664
Validation loss = 0.011371245607733727
Validation loss = 0.010831884108483791
Validation loss = 0.014009690843522549
Validation loss = 0.015070619992911816
Validation loss = 0.012836943380534649
Validation loss = 0.01141788437962532
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014776598662137985
Validation loss = 0.012338276952505112
Validation loss = 0.012348020449280739
Validation loss = 0.012380060739815235
Validation loss = 0.010995102114975452
Validation loss = 0.015664462000131607
Validation loss = 0.014307340607047081
Validation loss = 0.013644433580338955
Validation loss = 0.011511390097439289
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016063326969742775
Validation loss = 0.012424609623849392
Validation loss = 0.010665610432624817
Validation loss = 0.017629312351346016
Validation loss = 0.011953866109251976
Validation loss = 0.013354468159377575
Validation loss = 0.010299275629222393
Validation loss = 0.01137900073081255
Validation loss = 0.010498123243451118
Validation loss = 0.018367575481534004
Validation loss = 0.017682814970612526
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013437459245324135
Validation loss = 0.011260922998189926
Validation loss = 0.012203413061797619
Validation loss = 0.01035868376493454
Validation loss = 0.009918121621012688
Validation loss = 0.009852361865341663
Validation loss = 0.014497715048491955
Validation loss = 0.010192817077040672
Validation loss = 0.009419578127563
Validation loss = 0.009603690356016159
Validation loss = 0.013590139336884022
Validation loss = 0.009738782420754433
Validation loss = 0.010024437680840492
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000797 |
| Iteration     | 22        |
| MaximumReturn | -0.00055  |
| MinimumReturn | -0.001    |
| TotalSamples  | 39984     |
-----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009287928231060505
Validation loss = 0.011884133331477642
Validation loss = 0.009323807433247566
Validation loss = 0.009422725066542625
Validation loss = 0.01008719764649868
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01156899519264698
Validation loss = 0.011325073428452015
Validation loss = 0.011512971483170986
Validation loss = 0.014114230871200562
Validation loss = 0.01023974921554327
Validation loss = 0.009618057869374752
Validation loss = 0.010500969365239143
Validation loss = 0.011032218113541603
Validation loss = 0.016995131969451904
Validation loss = 0.012194289825856686
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012823328375816345
Validation loss = 0.011803669854998589
Validation loss = 0.010956184938549995
Validation loss = 0.013148029334843159
Validation loss = 0.014749722555279732
Validation loss = 0.011914804577827454
Validation loss = 0.012375295162200928
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016576284542679787
Validation loss = 0.011838260106742382
Validation loss = 0.013021433725953102
Validation loss = 0.009684004820883274
Validation loss = 0.014250176958739758
Validation loss = 0.012096461839973927
Validation loss = 0.010662047192454338
Validation loss = 0.011920491233468056
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011657489463686943
Validation loss = 0.011540481820702553
Validation loss = 0.010510963387787342
Validation loss = 0.010808746330440044
Validation loss = 0.010297693312168121
Validation loss = 0.011412488296627998
Validation loss = 0.011971775442361832
Validation loss = 0.009602280333638191
Validation loss = 0.009063595905900002
Validation loss = 0.009988328441977501
Validation loss = 0.01043763943016529
Validation loss = 0.008709379471838474
Validation loss = 0.008709138259291649
Validation loss = 0.015977928414940834
Validation loss = 0.009306607767939568
Validation loss = 0.009576334618031979
Validation loss = 0.008868744596838951
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000804 |
| Iteration     | 23        |
| MaximumReturn | -0.000602 |
| MinimumReturn | -0.00102  |
| TotalSamples  | 41650     |
-----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01028423197567463
Validation loss = 0.01122617069631815
Validation loss = 0.015422100201249123
Validation loss = 0.014330079779028893
Validation loss = 0.009556038305163383
Validation loss = 0.013937518000602722
Validation loss = 0.013344815000891685
Validation loss = 0.010658991523087025
Validation loss = 0.010167362168431282
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01385509967803955
Validation loss = 0.014250566251575947
Validation loss = 0.012418573722243309
Validation loss = 0.010572517290711403
Validation loss = 0.009802652522921562
Validation loss = 0.010835984721779823
Validation loss = 0.010666941292583942
Validation loss = 0.01196819543838501
Validation loss = 0.014428664930164814
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010699356906116009
Validation loss = 0.01280544139444828
Validation loss = 0.012589046731591225
Validation loss = 0.011125287972390652
Validation loss = 0.02000018209218979
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011454898864030838
Validation loss = 0.015607617795467377
Validation loss = 0.009860316291451454
Validation loss = 0.011372162029147148
Validation loss = 0.011693605221807957
Validation loss = 0.019282016903162003
Validation loss = 0.01625138707458973
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009179001674056053
Validation loss = 0.01001418475061655
Validation loss = 0.017234213650226593
Validation loss = 0.016627049073576927
Validation loss = 0.012112731114029884
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000841 |
| Iteration     | 24        |
| MaximumReturn | -0.000546 |
| MinimumReturn | -0.00162  |
| TotalSamples  | 43316     |
-----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009331240318715572
Validation loss = 0.008918710984289646
Validation loss = 0.010170779190957546
Validation loss = 0.012576970271766186
Validation loss = 0.010469062253832817
Validation loss = 0.009994511492550373
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014458169229328632
Validation loss = 0.010390187613666058
Validation loss = 0.01598328910768032
Validation loss = 0.0123039111495018
Validation loss = 0.016690218821167946
Validation loss = 0.010626590810716152
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014689509756863117
Validation loss = 0.012747188098728657
Validation loss = 0.016250822693109512
Validation loss = 0.01075162272900343
Validation loss = 0.011578638106584549
Validation loss = 0.011630336754024029
Validation loss = 0.011466778814792633
Validation loss = 0.0102092819288373
Validation loss = 0.010887944139540195
Validation loss = 0.014505881816148758
Validation loss = 0.01255832239985466
Validation loss = 0.012805106118321419
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012825598008930683
Validation loss = 0.01889168657362461
Validation loss = 0.01494153868407011
Validation loss = 0.0137360580265522
Validation loss = 0.013878541067242622
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012479124590754509
Validation loss = 0.012330947443842888
Validation loss = 0.012318327091634274
Validation loss = 0.011460455134510994
Validation loss = 0.011798767372965813
Validation loss = 0.009937143884599209
Validation loss = 0.009197668172419071
Validation loss = 0.009391441009938717
Validation loss = 0.01234591007232666
Validation loss = 0.009720919653773308
Validation loss = 0.009497204795479774
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000826 |
| Iteration     | 25        |
| MaximumReturn | -0.000675 |
| MinimumReturn | -0.00105  |
| TotalSamples  | 44982     |
-----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008136770687997341
Validation loss = 0.014341378584504128
Validation loss = 0.010848660953342915
Validation loss = 0.009187781251966953
Validation loss = 0.01019897311925888
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011598408222198486
Validation loss = 0.014014413580298424
Validation loss = 0.010124793276190758
Validation loss = 0.01125967875123024
Validation loss = 0.012245400808751583
Validation loss = 0.011530030518770218
Validation loss = 0.01072931382805109
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015423726290464401
Validation loss = 0.013724258169531822
Validation loss = 0.012971347197890282
Validation loss = 0.010803557001054287
Validation loss = 0.012058570981025696
Validation loss = 0.015722522512078285
Validation loss = 0.017321644350886345
Validation loss = 0.011781987734138966
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011953204870223999
Validation loss = 0.012057533487677574
Validation loss = 0.011056724935770035
Validation loss = 0.01351942215114832
Validation loss = 0.012264982797205448
Validation loss = 0.010220213793218136
Validation loss = 0.011654134839773178
Validation loss = 0.01728878542780876
Validation loss = 0.01722618192434311
Validation loss = 0.014687002636492252
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010073595680296421
Validation loss = 0.01695959083735943
Validation loss = 0.010434060357511044
Validation loss = 0.01217101514339447
Validation loss = 0.0086900619789958
Validation loss = 0.009700130671262741
Validation loss = 0.010048282332718372
Validation loss = 0.010051988996565342
Validation loss = 0.01435257401317358
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000776 |
| Iteration     | 26        |
| MaximumReturn | -0.00051  |
| MinimumReturn | -0.00103  |
| TotalSamples  | 46648     |
-----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009679851122200489
Validation loss = 0.014900668524205685
Validation loss = 0.013256493955850601
Validation loss = 0.010258477181196213
Validation loss = 0.013890744186937809
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010362132452428341
Validation loss = 0.01018762681633234
Validation loss = 0.010319082997739315
Validation loss = 0.012179043143987656
Validation loss = 0.010455341078341007
Validation loss = 0.015297496691346169
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012083983980119228
Validation loss = 0.012231511063873768
Validation loss = 0.01224435679614544
Validation loss = 0.010803605429828167
Validation loss = 0.012057704851031303
Validation loss = 0.011744647286832333
Validation loss = 0.016404036432504654
Validation loss = 0.01078938227146864
Validation loss = 0.01135899219661951
Validation loss = 0.011814269237220287
Validation loss = 0.01181164663285017
Validation loss = 0.011234961450099945
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020630981773138046
Validation loss = 0.013958418741822243
Validation loss = 0.011516793631017208
Validation loss = 0.010399122722446918
Validation loss = 0.013216872699558735
Validation loss = 0.013461096212267876
Validation loss = 0.013972085900604725
Validation loss = 0.012040329165756702
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01296813040971756
Validation loss = 0.009995943866670132
Validation loss = 0.009383222088217735
Validation loss = 0.00994382705539465
Validation loss = 0.008228350430727005
Validation loss = 0.010671733878552914
Validation loss = 0.012271314859390259
Validation loss = 0.01114671677350998
Validation loss = 0.011702444404363632
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000853 |
| Iteration     | 27        |
| MaximumReturn | -0.000519 |
| MinimumReturn | -0.00172  |
| TotalSamples  | 48314     |
-----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009654123336076736
Validation loss = 0.009094730950891972
Validation loss = 0.011868834495544434
Validation loss = 0.00884290598332882
Validation loss = 0.010949880816042423
Validation loss = 0.012245476245880127
Validation loss = 0.010639070533216
Validation loss = 0.011261976324021816
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013533848337829113
Validation loss = 0.010349844582378864
Validation loss = 0.009352502413094044
Validation loss = 0.013126567006111145
Validation loss = 0.009432495571672916
Validation loss = 0.012279212474822998
Validation loss = 0.010784782469272614
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011857770383358002
Validation loss = 0.01367945596575737
Validation loss = 0.012962207198143005
Validation loss = 0.011331123299896717
Validation loss = 0.011924316175282001
Validation loss = 0.009443349204957485
Validation loss = 0.011359412223100662
Validation loss = 0.01416321936994791
Validation loss = 0.01236825343221426
Validation loss = 0.011226127855479717
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016988353803753853
Validation loss = 0.010697546415030956
Validation loss = 0.012152454815804958
Validation loss = 0.009877920150756836
Validation loss = 0.009755447506904602
Validation loss = 0.010923276655375957
Validation loss = 0.009635013528168201
Validation loss = 0.011247586458921432
Validation loss = 0.01550073828548193
Validation loss = 0.013260145671665668
Validation loss = 0.010798255912959576
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011556378565728664
Validation loss = 0.009922203607857227
Validation loss = 0.010351990349590778
Validation loss = 0.011401738971471786
Validation loss = 0.010225663892924786
Validation loss = 0.009530243463814259
Validation loss = 0.009140283800661564
Validation loss = 0.009205569513142109
Validation loss = 0.010857484303414822
Validation loss = 0.012418870814144611
Validation loss = 0.009057519026100636
Validation loss = 0.014588776975870132
Validation loss = 0.009059018455445766
Validation loss = 0.010527379810810089
Validation loss = 0.01033313013613224
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000787 |
| Iteration     | 28        |
| MaximumReturn | -0.000552 |
| MinimumReturn | -0.000995 |
| TotalSamples  | 49980     |
-----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011592000722885132
Validation loss = 0.015704447403550148
Validation loss = 0.008516103960573673
Validation loss = 0.009868616238236427
Validation loss = 0.009525801986455917
Validation loss = 0.015757368877530098
Validation loss = 0.008156043477356434
Validation loss = 0.01281723752617836
Validation loss = 0.009263797663152218
Validation loss = 0.008432851172983646
Validation loss = 0.008567122742533684
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011722346767783165
Validation loss = 0.010988045483827591
Validation loss = 0.009935282170772552
Validation loss = 0.02083813212811947
Validation loss = 0.009942472912371159
Validation loss = 0.014431391842663288
Validation loss = 0.012314689345657825
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011198259890079498
Validation loss = 0.014778656885027885
Validation loss = 0.010876626707613468
Validation loss = 0.014993663877248764
Validation loss = 0.01079945545643568
Validation loss = 0.011517192237079144
Validation loss = 0.011216354556381702
Validation loss = 0.009725410491228104
Validation loss = 0.01173966284841299
Validation loss = 0.011938142590224743
Validation loss = 0.011355038732290268
Validation loss = 0.010742469690740108
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013261140324175358
Validation loss = 0.010158175602555275
Validation loss = 0.010046537034213543
Validation loss = 0.010051318444311619
Validation loss = 0.011029953137040138
Validation loss = 0.01627899706363678
Validation loss = 0.00974595919251442
Validation loss = 0.011599446646869183
Validation loss = 0.011927410028874874
Validation loss = 0.011154362000524998
Validation loss = 0.009418639354407787
Validation loss = 0.009325660765171051
Validation loss = 0.01506788283586502
Validation loss = 0.0114598348736763
Validation loss = 0.010470693930983543
Validation loss = 0.016390040516853333
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008952651172876358
Validation loss = 0.010446491651237011
Validation loss = 0.008406711742281914
Validation loss = 0.013541889376938343
Validation loss = 0.011776413768529892
Validation loss = 0.01251276209950447
Validation loss = 0.009668046608567238
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000824 |
| Iteration     | 29        |
| MaximumReturn | -0.000625 |
| MinimumReturn | -0.00136  |
| TotalSamples  | 51646     |
-----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008935506455600262
Validation loss = 0.010125751607120037
Validation loss = 0.008654730394482613
Validation loss = 0.011662551201879978
Validation loss = 0.009502843953669071
Validation loss = 0.009209892712533474
Validation loss = 0.010572714731097221
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013931948691606522
Validation loss = 0.010904626920819283
Validation loss = 0.009787428192794323
Validation loss = 0.010874943807721138
Validation loss = 0.010453547351062298
Validation loss = 0.014214341528713703
Validation loss = 0.010651580058038235
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011300666257739067
Validation loss = 0.011786084622144699
Validation loss = 0.010242738761007786
Validation loss = 0.010854329913854599
Validation loss = 0.019542254507541656
Validation loss = 0.011579846031963825
Validation loss = 0.016534022986888885
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01382190827280283
Validation loss = 0.012273822911083698
Validation loss = 0.011477494612336159
Validation loss = 0.01104828156530857
Validation loss = 0.010124033316969872
Validation loss = 0.011128007434308529
Validation loss = 0.01596027798950672
Validation loss = 0.012880322523415089
Validation loss = 0.010502723045647144
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009473541751503944
Validation loss = 0.010829854756593704
Validation loss = 0.008718686178326607
Validation loss = 0.009639035910367966
Validation loss = 0.008541671559214592
Validation loss = 0.012996036559343338
Validation loss = 0.013147126883268356
Validation loss = 0.010646846145391464
Validation loss = 0.013230648823082447
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000766 |
| Iteration     | 30        |
| MaximumReturn | -0.000548 |
| MinimumReturn | -0.00105  |
| TotalSamples  | 53312     |
-----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013777703046798706
Validation loss = 0.013460060581564903
Validation loss = 0.010223724879324436
Validation loss = 0.010108815506100655
Validation loss = 0.00943436287343502
Validation loss = 0.010426927357912064
Validation loss = 0.011595035903155804
Validation loss = 0.012131036259233952
Validation loss = 0.008638503029942513
Validation loss = 0.008849122561514378
Validation loss = 0.013678720220923424
Validation loss = 0.014002003706991673
Validation loss = 0.010603389702737331
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009872464463114738
Validation loss = 0.011556467041373253
Validation loss = 0.009801157750189304
Validation loss = 0.011958150193095207
Validation loss = 0.012700900435447693
Validation loss = 0.010094186291098595
Validation loss = 0.009602485224604607
Validation loss = 0.010141711682081223
Validation loss = 0.010950193740427494
Validation loss = 0.01569491997361183
Validation loss = 0.011780031956732273
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015815511345863342
Validation loss = 0.012416651472449303
Validation loss = 0.01346509251743555
Validation loss = 0.011251071467995644
Validation loss = 0.01582101359963417
Validation loss = 0.010795233771204948
Validation loss = 0.011470734141767025
Validation loss = 0.011308814398944378
Validation loss = 0.0115475719794631
Validation loss = 0.010595187544822693
Validation loss = 0.01332816295325756
Validation loss = 0.013716582208871841
Validation loss = 0.010276232846081257
Validation loss = 0.009921414777636528
Validation loss = 0.0123885627835989
Validation loss = 0.014691393822431564
Validation loss = 0.011082598939538002
Validation loss = 0.010038793087005615
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010928764939308167
Validation loss = 0.01110765803605318
Validation loss = 0.0099380724132061
Validation loss = 0.0100813964381814
Validation loss = 0.009403448551893234
Validation loss = 0.010972512885928154
Validation loss = 0.009705073200166225
Validation loss = 0.012810244224965572
Validation loss = 0.012782166711986065
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013334270566701889
Validation loss = 0.011099289171397686
Validation loss = 0.011876781471073627
Validation loss = 0.013006137683987617
Validation loss = 0.010583526454865932
Validation loss = 0.010993918403983116
Validation loss = 0.009807073511183262
Validation loss = 0.010634180158376694
Validation loss = 0.009581816382706165
Validation loss = 0.010871076956391335
Validation loss = 0.009406549856066704
Validation loss = 0.010727412067353725
Validation loss = 0.012874742969870567
Validation loss = 0.009037106297910213
Validation loss = 0.012855554930865765
Validation loss = 0.012728849425911903
Validation loss = 0.009598582051694393
Validation loss = 0.009257848374545574
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000828 |
| Iteration     | 31        |
| MaximumReturn | -0.000603 |
| MinimumReturn | -0.00132  |
| TotalSamples  | 54978     |
-----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010164881125092506
Validation loss = 0.014679823070764542
Validation loss = 0.010196310468018055
Validation loss = 0.009857629425823689
Validation loss = 0.00823649950325489
Validation loss = 0.010018263012170792
Validation loss = 0.011452748440206051
Validation loss = 0.0093458853662014
Validation loss = 0.01091490313410759
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010610884055495262
Validation loss = 0.011130882427096367
Validation loss = 0.01210650335997343
Validation loss = 0.014540071599185467
Validation loss = 0.010572904720902443
Validation loss = 0.010332489386200905
Validation loss = 0.010060776956379414
Validation loss = 0.010363048873841763
Validation loss = 0.014760489575564861
Validation loss = 0.00967345479875803
Validation loss = 0.01015279907733202
Validation loss = 0.009903643280267715
Validation loss = 0.016814790666103363
Validation loss = 0.009750413708388805
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009672815911471844
Validation loss = 0.011985472403466702
Validation loss = 0.00976753979921341
Validation loss = 0.011510866694152355
Validation loss = 0.010643190704286098
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009333939291536808
Validation loss = 0.012913008220493793
Validation loss = 0.011550570838153362
Validation loss = 0.011169874109327793
Validation loss = 0.010858472436666489
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00965905375778675
Validation loss = 0.012647081166505814
Validation loss = 0.010377752594649792
Validation loss = 0.010801583528518677
Validation loss = 0.010635780170559883
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000861 |
| Iteration     | 32        |
| MaximumReturn | -0.000664 |
| MinimumReturn | -0.00103  |
| TotalSamples  | 56644     |
-----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011364377103745937
Validation loss = 0.010136597789824009
Validation loss = 0.012378113344311714
Validation loss = 0.00967609416693449
Validation loss = 0.014103918336331844
Validation loss = 0.01043909601867199
Validation loss = 0.010761463083326817
Validation loss = 0.009341097436845303
Validation loss = 0.009816132485866547
Validation loss = 0.009054026566445827
Validation loss = 0.011852794326841831
Validation loss = 0.00824357196688652
Validation loss = 0.00835429597645998
Validation loss = 0.010305384173989296
Validation loss = 0.010813234373927116
Validation loss = 0.012324965558946133
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009481011889874935
Validation loss = 0.009922646917402744
Validation loss = 0.009314394555985928
Validation loss = 0.012439857237040997
Validation loss = 0.010428836569190025
Validation loss = 0.009534528478980064
Validation loss = 0.009867074899375439
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011597112752497196
Validation loss = 0.011102939955890179
Validation loss = 0.010985116474330425
Validation loss = 0.010115805082023144
Validation loss = 0.012540684081614017
Validation loss = 0.01416725479066372
Validation loss = 0.01155488658696413
Validation loss = 0.01131755206733942
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011105840094387531
Validation loss = 0.01736910082399845
Validation loss = 0.012998206540942192
Validation loss = 0.011769003234803677
Validation loss = 0.010798702947795391
Validation loss = 0.012272366322577
Validation loss = 0.012131421826779842
Validation loss = 0.014318573288619518
Validation loss = 0.010707872919738293
Validation loss = 0.012042677029967308
Validation loss = 0.011231809854507446
Validation loss = 0.012042834423482418
Validation loss = 0.011948500759899616
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015022207982838154
Validation loss = 0.009427986107766628
Validation loss = 0.008794167079031467
Validation loss = 0.011108681559562683
Validation loss = 0.0096943574026227
Validation loss = 0.008920641615986824
Validation loss = 0.009409129619598389
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000826 |
| Iteration     | 33        |
| MaximumReturn | -0.000582 |
| MinimumReturn | -0.00112  |
| TotalSamples  | 58310     |
-----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008375321514904499
Validation loss = 0.01190183311700821
Validation loss = 0.012531774118542671
Validation loss = 0.009544005617499352
Validation loss = 0.009672022424638271
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009973333217203617
Validation loss = 0.009256498888134956
Validation loss = 0.012042584829032421
Validation loss = 0.008718477562069893
Validation loss = 0.010090850293636322
Validation loss = 0.011216223239898682
Validation loss = 0.010033884085714817
Validation loss = 0.008974496275186539
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010999668389558792
Validation loss = 0.012298762798309326
Validation loss = 0.011653261259198189
Validation loss = 0.010657274164259434
Validation loss = 0.011090273037552834
Validation loss = 0.011171482503414154
Validation loss = 0.010829132050275803
Validation loss = 0.010664109140634537
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011595245450735092
Validation loss = 0.012082716450095177
Validation loss = 0.01167351845651865
Validation loss = 0.010474048554897308
Validation loss = 0.009782066568732262
Validation loss = 0.010690546594560146
Validation loss = 0.010326320305466652
Validation loss = 0.014819932170212269
Validation loss = 0.012150358408689499
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009314811788499355
Validation loss = 0.011071248911321163
Validation loss = 0.01080216746777296
Validation loss = 0.009936172515153885
Validation loss = 0.013302695006132126
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000792 |
| Iteration     | 34        |
| MaximumReturn | -0.000567 |
| MinimumReturn | -0.00114  |
| TotalSamples  | 59976     |
-----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012519440613687038
Validation loss = 0.0106035266071558
Validation loss = 0.009372364729642868
Validation loss = 0.009479625150561333
Validation loss = 0.00990780908614397
Validation loss = 0.01567428559064865
Validation loss = 0.009160139597952366
Validation loss = 0.00941870640963316
Validation loss = 0.009134582243859768
Validation loss = 0.009034554474055767
Validation loss = 0.009203331544995308
Validation loss = 0.011117184534668922
Validation loss = 0.009275092743337154
Validation loss = 0.009637197479605675
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009236754849553108
Validation loss = 0.009213034063577652
Validation loss = 0.012752469629049301
Validation loss = 0.01122576929628849
Validation loss = 0.00945335254073143
Validation loss = 0.011807430535554886
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00937220361083746
Validation loss = 0.014839496463537216
Validation loss = 0.015803320333361626
Validation loss = 0.012338760308921337
Validation loss = 0.011214830912649632
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013342497870326042
Validation loss = 0.00969040859490633
Validation loss = 0.01019764319062233
Validation loss = 0.009663757868111134
Validation loss = 0.009386835619807243
Validation loss = 0.013894212432205677
Validation loss = 0.011088068597018719
Validation loss = 0.00973777286708355
Validation loss = 0.01205651555210352
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01233438216149807
Validation loss = 0.010116631165146828
Validation loss = 0.009154918603599072
Validation loss = 0.012271870858967304
Validation loss = 0.013408581726253033
Validation loss = 0.011979549191892147
Validation loss = 0.009779265150427818
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000839 |
| Iteration     | 35        |
| MaximumReturn | -0.000475 |
| MinimumReturn | -0.00131  |
| TotalSamples  | 61642     |
-----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009652425535023212
Validation loss = 0.011233621276915073
Validation loss = 0.011521900072693825
Validation loss = 0.010638361796736717
Validation loss = 0.008847302757203579
Validation loss = 0.009427896700799465
Validation loss = 0.009561881422996521
Validation loss = 0.012686735950410366
Validation loss = 0.009005485102534294
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01209383737295866
Validation loss = 0.009869718924164772
Validation loss = 0.009826545603573322
Validation loss = 0.010588026605546474
Validation loss = 0.011954737827181816
Validation loss = 0.011033948510885239
Validation loss = 0.010128525085747242
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010906347073614597
Validation loss = 0.014571904204785824
Validation loss = 0.012287396006286144
Validation loss = 0.011191928759217262
Validation loss = 0.010265367105603218
Validation loss = 0.010449442081153393
Validation loss = 0.010928281582891941
Validation loss = 0.014472885057330132
Validation loss = 0.010260889306664467
Validation loss = 0.012050991877913475
Validation loss = 0.012094799429178238
Validation loss = 0.012240048497915268
Validation loss = 0.011434974148869514
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01134624145925045
Validation loss = 0.011748133227229118
Validation loss = 0.011195624247193336
Validation loss = 0.010754064656794071
Validation loss = 0.011776567436754704
Validation loss = 0.010601562447845936
Validation loss = 0.01191452145576477
Validation loss = 0.010442826896905899
Validation loss = 0.013942893594503403
Validation loss = 0.011098911054432392
Validation loss = 0.01188065204769373
Validation loss = 0.014698910526931286
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01025279238820076
Validation loss = 0.010712217539548874
Validation loss = 0.010661007836461067
Validation loss = 0.010429156944155693
Validation loss = 0.010644596070051193
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000864 |
| Iteration     | 36        |
| MaximumReturn | -0.00062  |
| MinimumReturn | -0.0014   |
| TotalSamples  | 63308     |
-----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009060810320079327
Validation loss = 0.009095306508243084
Validation loss = 0.008297906257212162
Validation loss = 0.010363009758293629
Validation loss = 0.011326105333864689
Validation loss = 0.009886820800602436
Validation loss = 0.008849621750414371
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00983408559113741
Validation loss = 0.010064810514450073
Validation loss = 0.01143890991806984
Validation loss = 0.009635815396904945
Validation loss = 0.011138547211885452
Validation loss = 0.012578644789755344
Validation loss = 0.014051234349608421
Validation loss = 0.009119834750890732
Validation loss = 0.01726679317653179
Validation loss = 0.010109216906130314
Validation loss = 0.010890753008425236
Validation loss = 0.01143235806375742
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00959544163197279
Validation loss = 0.009853166528046131
Validation loss = 0.011663633398711681
Validation loss = 0.010720365680754185
Validation loss = 0.01204745378345251
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013932864181697369
Validation loss = 0.012264489196240902
Validation loss = 0.010530036874115467
Validation loss = 0.01208068709820509
Validation loss = 0.012253186665475368
Validation loss = 0.01118189375847578
Validation loss = 0.012443061918020248
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009910350665450096
Validation loss = 0.016347697004675865
Validation loss = 0.01052009779959917
Validation loss = 0.010190803557634354
Validation loss = 0.009549890644848347
Validation loss = 0.013781356625258923
Validation loss = 0.009611064568161964
Validation loss = 0.011530888266861439
Validation loss = 0.01083455141633749
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000836 |
| Iteration     | 37        |
| MaximumReturn | -0.000631 |
| MinimumReturn | -0.00106  |
| TotalSamples  | 64974     |
-----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012816679663956165
Validation loss = 0.010659366846084595
Validation loss = 0.010640373453497887
Validation loss = 0.010659879073500633
Validation loss = 0.011241808533668518
Validation loss = 0.009095805697143078
Validation loss = 0.008759531192481518
Validation loss = 0.00918183010071516
Validation loss = 0.008887559175491333
Validation loss = 0.010955600999295712
Validation loss = 0.011142934672534466
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012807255610823631
Validation loss = 0.010326018556952477
Validation loss = 0.01019335724413395
Validation loss = 0.01088759396225214
Validation loss = 0.010107557289302349
Validation loss = 0.009724599309265614
Validation loss = 0.01064997911453247
Validation loss = 0.009525715373456478
Validation loss = 0.008895060047507286
Validation loss = 0.012778887525200844
Validation loss = 0.011467350646853447
Validation loss = 0.009881695732474327
Validation loss = 0.009223699569702148
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010610214434564114
Validation loss = 0.013224603608250618
Validation loss = 0.01103922538459301
Validation loss = 0.010037525556981564
Validation loss = 0.010295070707798004
Validation loss = 0.012904951348900795
Validation loss = 0.011536749079823494
Validation loss = 0.010159377939999104
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012183514423668385
Validation loss = 0.012299465015530586
Validation loss = 0.010675119236111641
Validation loss = 0.010244187898933887
Validation loss = 0.010485620237886906
Validation loss = 0.015188301913440228
Validation loss = 0.014325269497931004
Validation loss = 0.012244138866662979
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009768372401595116
Validation loss = 0.009188500232994556
Validation loss = 0.012269061990082264
Validation loss = 0.008840936236083508
Validation loss = 0.01179356686770916
Validation loss = 0.01005445048213005
Validation loss = 0.00919392891228199
Validation loss = 0.009523574262857437
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000767 |
| Iteration     | 38        |
| MaximumReturn | -0.000583 |
| MinimumReturn | -0.0011   |
| TotalSamples  | 66640     |
-----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009156710468232632
Validation loss = 0.008711363188922405
Validation loss = 0.008290573954582214
Validation loss = 0.013300708495080471
Validation loss = 0.014419482089579105
Validation loss = 0.008786678314208984
Validation loss = 0.008817935362458229
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010097682476043701
Validation loss = 0.014402136206626892
Validation loss = 0.01085386611521244
Validation loss = 0.009771504439413548
Validation loss = 0.00997049268335104
Validation loss = 0.01082694437354803
Validation loss = 0.011104961857199669
Validation loss = 0.01123636681586504
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011864610016345978
Validation loss = 0.010532650165259838
Validation loss = 0.010973865166306496
Validation loss = 0.017028657719492912
Validation loss = 0.011332156136631966
Validation loss = 0.01323949359357357
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010410795919597149
Validation loss = 0.0141668152064085
Validation loss = 0.01952747255563736
Validation loss = 0.010637511499226093
Validation loss = 0.012242071330547333
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009589734487235546
Validation loss = 0.01187862642109394
Validation loss = 0.011407953687012196
Validation loss = 0.014554677531123161
Validation loss = 0.011526374146342278
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000856 |
| Iteration     | 39        |
| MaximumReturn | -0.000585 |
| MinimumReturn | -0.00125  |
| TotalSamples  | 68306     |
-----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009224455803632736
Validation loss = 0.008452259004116058
Validation loss = 0.009117715060710907
Validation loss = 0.009106594137847424
Validation loss = 0.011862270534038544
Validation loss = 0.00896264985203743
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013747231103479862
Validation loss = 0.013870293274521828
Validation loss = 0.011041839607059956
Validation loss = 0.01211854163557291
Validation loss = 0.009881800040602684
Validation loss = 0.009938381612300873
Validation loss = 0.009692213498055935
Validation loss = 0.0101505471393466
Validation loss = 0.009904053062200546
Validation loss = 0.009999659843742847
Validation loss = 0.017105313017964363
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012871484272181988
Validation loss = 0.013052535243332386
Validation loss = 0.012856878340244293
Validation loss = 0.011108986102044582
Validation loss = 0.010581942275166512
Validation loss = 0.009841478429734707
Validation loss = 0.011372020468115807
Validation loss = 0.01769794151186943
Validation loss = 0.013181096874177456
Validation loss = 0.009988958947360516
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011259024031460285
Validation loss = 0.009981244802474976
Validation loss = 0.011553443968296051
Validation loss = 0.011971056461334229
Validation loss = 0.009948940947651863
Validation loss = 0.011640110053122044
Validation loss = 0.011801482178270817
Validation loss = 0.01023185532540083
Validation loss = 0.012469150125980377
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010358696803450584
Validation loss = 0.010105651803314686
Validation loss = 0.016527889296412468
Validation loss = 0.010783948935568333
Validation loss = 0.010316863656044006
Validation loss = 0.01192526239901781
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000878 |
| Iteration     | 40        |
| MaximumReturn | -0.000558 |
| MinimumReturn | -0.00131  |
| TotalSamples  | 69972     |
-----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008557120338082314
Validation loss = 0.0120396688580513
Validation loss = 0.008612437173724174
Validation loss = 0.009264351800084114
Validation loss = 0.00866189505904913
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011737666092813015
Validation loss = 0.009844794869422913
Validation loss = 0.009626534767448902
Validation loss = 0.01013051625341177
Validation loss = 0.009430753998458385
Validation loss = 0.009112128987908363
Validation loss = 0.014918559230864048
Validation loss = 0.009831948205828667
Validation loss = 0.012151271104812622
Validation loss = 0.010730454698204994
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010102516040205956
Validation loss = 0.011181699112057686
Validation loss = 0.010896778665482998
Validation loss = 0.013119122944772243
Validation loss = 0.01017764862626791
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01118688564747572
Validation loss = 0.014735739678144455
Validation loss = 0.011638800613582134
Validation loss = 0.012618562206625938
Validation loss = 0.01154404692351818
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010707542300224304
Validation loss = 0.01157205831259489
Validation loss = 0.011416718363761902
Validation loss = 0.018341556191444397
Validation loss = 0.011630089022219181
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000782 |
| Iteration     | 41        |
| MaximumReturn | -0.000557 |
| MinimumReturn | -0.00101  |
| TotalSamples  | 71638     |
-----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009031429886817932
Validation loss = 0.011884504929184914
Validation loss = 0.011020542122423649
Validation loss = 0.009502300061285496
Validation loss = 0.010515278205275536
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010571385733783245
Validation loss = 0.009873848408460617
Validation loss = 0.014775208197534084
Validation loss = 0.009780976921319962
Validation loss = 0.010720127262175083
Validation loss = 0.009276545606553555
Validation loss = 0.010010032914578915
Validation loss = 0.01320179458707571
Validation loss = 0.013019288890063763
Validation loss = 0.011397852562367916
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010387352667748928
Validation loss = 0.010519236326217651
Validation loss = 0.010847816243767738
Validation loss = 0.010483265854418278
Validation loss = 0.011872689239680767
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009951298125088215
Validation loss = 0.009554506279528141
Validation loss = 0.009916817769408226
Validation loss = 0.013272864744067192
Validation loss = 0.013935203664004803
Validation loss = 0.010476011782884598
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01174687035381794
Validation loss = 0.01004925649613142
Validation loss = 0.012141210027039051
Validation loss = 0.011944886296987534
Validation loss = 0.010641519911587238
Validation loss = 0.010861316695809364
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000824 |
| Iteration     | 42        |
| MaximumReturn | -0.00052  |
| MinimumReturn | -0.00107  |
| TotalSamples  | 73304     |
-----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010365106165409088
Validation loss = 0.014004303142428398
Validation loss = 0.008531463332474232
Validation loss = 0.008918438106775284
Validation loss = 0.008724276907742023
Validation loss = 0.008755000308156013
Validation loss = 0.010960583575069904
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011527441442012787
Validation loss = 0.014119661413133144
Validation loss = 0.00939770694822073
Validation loss = 0.014379718340933323
Validation loss = 0.009594907984137535
Validation loss = 0.011857541278004646
Validation loss = 0.011142558418214321
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012897588312625885
Validation loss = 0.010837439447641373
Validation loss = 0.010190214030444622
Validation loss = 0.01466322410851717
Validation loss = 0.01129587646573782
Validation loss = 0.011052442714571953
Validation loss = 0.011305650696158409
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011251574382185936
Validation loss = 0.010389291681349277
Validation loss = 0.011331360787153244
Validation loss = 0.009929724968969822
Validation loss = 0.01206044852733612
Validation loss = 0.013210906647145748
Validation loss = 0.010787036269903183
Validation loss = 0.010583190247416496
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01224359218031168
Validation loss = 0.009469489566981792
Validation loss = 0.010150551795959473
Validation loss = 0.011867445893585682
Validation loss = 0.010887373238801956
Validation loss = 0.010285791009664536
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000785 |
| Iteration     | 43        |
| MaximumReturn | -0.000601 |
| MinimumReturn | -0.00117  |
| TotalSamples  | 74970     |
-----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010279595851898193
Validation loss = 0.00968188140541315
Validation loss = 0.009416776709258556
Validation loss = 0.00916450098156929
Validation loss = 0.013308623805642128
Validation loss = 0.008398926816880703
Validation loss = 0.008169826120138168
Validation loss = 0.013254028744995594
Validation loss = 0.008508212864398956
Validation loss = 0.008567482233047485
Validation loss = 0.00869178306311369
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01019310113042593
Validation loss = 0.009566538967192173
Validation loss = 0.008892250247299671
Validation loss = 0.014534917660057545
Validation loss = 0.010269513353705406
Validation loss = 0.00994100421667099
Validation loss = 0.009286557324230671
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013329141773283482
Validation loss = 0.010100850835442543
Validation loss = 0.010545758530497551
Validation loss = 0.01002937275916338
Validation loss = 0.010198424570262432
Validation loss = 0.011532696895301342
Validation loss = 0.015470617450773716
Validation loss = 0.01323995366692543
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012873390689492226
Validation loss = 0.013037653639912605
Validation loss = 0.011253057047724724
Validation loss = 0.010419532656669617
Validation loss = 0.010172502137720585
Validation loss = 0.01033247821033001
Validation loss = 0.008978957310318947
Validation loss = 0.014813587069511414
Validation loss = 0.00970203336328268
Validation loss = 0.009943774901330471
Validation loss = 0.009801975451409817
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011603674851357937
Validation loss = 0.01092055905610323
Validation loss = 0.012056495994329453
Validation loss = 0.010160907171666622
Validation loss = 0.010827389545738697
Validation loss = 0.013515648432075977
Validation loss = 0.013375668786466122
Validation loss = 0.010387164540588856
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000851 |
| Iteration     | 44        |
| MaximumReturn | -0.000583 |
| MinimumReturn | -0.00107  |
| TotalSamples  | 76636     |
-----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008753064088523388
Validation loss = 0.008527166210114956
Validation loss = 0.010615693405270576
Validation loss = 0.009153644554316998
Validation loss = 0.007707234937697649
Validation loss = 0.009239472448825836
Validation loss = 0.009577847085893154
Validation loss = 0.008620010688900948
Validation loss = 0.009566599503159523
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009965707547962666
Validation loss = 0.009635875932872295
Validation loss = 0.012134318239986897
Validation loss = 0.009763669222593307
Validation loss = 0.009569038636982441
Validation loss = 0.01202408131211996
Validation loss = 0.010844610631465912
Validation loss = 0.013059769757091999
Validation loss = 0.009369876235723495
Validation loss = 0.010428040288388729
Validation loss = 0.009525294415652752
Validation loss = 0.011148197576403618
Validation loss = 0.01004707533866167
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011080014519393444
Validation loss = 0.010822650976479053
Validation loss = 0.010030217468738556
Validation loss = 0.012299731373786926
Validation loss = 0.012402701191604137
Validation loss = 0.010065470822155476
Validation loss = 0.010443367063999176
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010024500079452991
Validation loss = 0.015404080040752888
Validation loss = 0.010747648775577545
Validation loss = 0.009389128535985947
Validation loss = 0.009471738710999489
Validation loss = 0.011351623572409153
Validation loss = 0.009521514177322388
Validation loss = 0.010095320641994476
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010482130572199821
Validation loss = 0.010532031767070293
Validation loss = 0.01209502387791872
Validation loss = 0.01082236785441637
Validation loss = 0.010180996730923653
Validation loss = 0.00955508928745985
Validation loss = 0.009336285293102264
Validation loss = 0.009181474335491657
Validation loss = 0.01119075994938612
Validation loss = 0.011846582405269146
Validation loss = 0.010666601359844208
Validation loss = 0.010640060529112816
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000793 |
| Iteration     | 45        |
| MaximumReturn | -0.000539 |
| MinimumReturn | -0.00108  |
| TotalSamples  | 78302     |
-----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008498595096170902
Validation loss = 0.010092956013977528
Validation loss = 0.009027570486068726
Validation loss = 0.008300397545099258
Validation loss = 0.00850857887417078
Validation loss = 0.00937352329492569
Validation loss = 0.009534469805657864
Validation loss = 0.008376467041671276
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011277398094534874
Validation loss = 0.011107414029538631
Validation loss = 0.00952146016061306
Validation loss = 0.009570891037583351
Validation loss = 0.008814369328320026
Validation loss = 0.018851974979043007
Validation loss = 0.008952570147812366
Validation loss = 0.01264727022498846
Validation loss = 0.009658492170274258
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010103763081133366
Validation loss = 0.010494743473827839
Validation loss = 0.013330802321434021
Validation loss = 0.01015473436564207
Validation loss = 0.009319298900663853
Validation loss = 0.010511389002203941
Validation loss = 0.011116470210254192
Validation loss = 0.010427603498101234
Validation loss = 0.01346491277217865
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00970802828669548
Validation loss = 0.008946423418819904
Validation loss = 0.012089666910469532
Validation loss = 0.010327913798391819
Validation loss = 0.010871217586100101
Validation loss = 0.01037334743887186
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01026788167655468
Validation loss = 0.012205272912979126
Validation loss = 0.010051354765892029
Validation loss = 0.009862189181149006
Validation loss = 0.011093729175627232
Validation loss = 0.011944572441279888
Validation loss = 0.0098459143191576
Validation loss = 0.009602404199540615
Validation loss = 0.01417957991361618
Validation loss = 0.008667836897075176
Validation loss = 0.010413740761578083
Validation loss = 0.010027673095464706
Validation loss = 0.009348928928375244
Validation loss = 0.009847513400018215
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000807 |
| Iteration     | 46        |
| MaximumReturn | -0.000606 |
| MinimumReturn | -0.00111  |
| TotalSamples  | 79968     |
-----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008260203525424004
Validation loss = 0.009773737750947475
Validation loss = 0.009079030714929104
Validation loss = 0.00818831566721201
Validation loss = 0.00831088237464428
Validation loss = 0.00831204280257225
Validation loss = 0.00835810974240303
Validation loss = 0.008501401171088219
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009237168356776237
Validation loss = 0.00912985298782587
Validation loss = 0.010502241551876068
Validation loss = 0.01001817174255848
Validation loss = 0.009443214163184166
Validation loss = 0.009433812461793423
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01192200742661953
Validation loss = 0.010577305220067501
Validation loss = 0.010785488411784172
Validation loss = 0.010631456971168518
Validation loss = 0.01112331636250019
Validation loss = 0.009776106104254723
Validation loss = 0.012450123205780983
Validation loss = 0.011279726400971413
Validation loss = 0.013699853792786598
Validation loss = 0.009770874865353107
Validation loss = 0.010071126744151115
Validation loss = 0.009634682908654213
Validation loss = 0.009822449646890163
Validation loss = 0.01085628941655159
Validation loss = 0.010336318984627724
Validation loss = 0.011338181793689728
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010331820696592331
Validation loss = 0.01193323452025652
Validation loss = 0.013447055593132973
Validation loss = 0.012638231739401817
Validation loss = 0.01078020315617323
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011200823821127415
Validation loss = 0.010983224026858807
Validation loss = 0.01099806372076273
Validation loss = 0.009371846914291382
Validation loss = 0.009857190772891045
Validation loss = 0.008945612236857414
Validation loss = 0.017715416848659515
Validation loss = 0.0107400082051754
Validation loss = 0.008746137842535973
Validation loss = 0.010457615368068218
Validation loss = 0.010172705166041851
Validation loss = 0.01021269429475069
Validation loss = 0.011508530005812645
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000865 |
| Iteration     | 47        |
| MaximumReturn | -0.000633 |
| MinimumReturn | -0.00141  |
| TotalSamples  | 81634     |
-----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009444224648177624
Validation loss = 0.008205803111195564
Validation loss = 0.012453616596758366
Validation loss = 0.008891655132174492
Validation loss = 0.008284350857138634
Validation loss = 0.008916178718209267
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011579359881579876
Validation loss = 0.009357921779155731
Validation loss = 0.009622573852539062
Validation loss = 0.009814768098294735
Validation loss = 0.010146766901016235
Validation loss = 0.009709933772683144
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010615723207592964
Validation loss = 0.012079130858182907
Validation loss = 0.011793490499258041
Validation loss = 0.010386144742369652
Validation loss = 0.016034681349992752
Validation loss = 0.010592587292194366
Validation loss = 0.01000974141061306
Validation loss = 0.011261708103120327
Validation loss = 0.01208213996142149
Validation loss = 0.014098778367042542
Validation loss = 0.01353173702955246
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011435222811996937
Validation loss = 0.012560036964714527
Validation loss = 0.01077260822057724
Validation loss = 0.011857307516038418
Validation loss = 0.011958805844187737
Validation loss = 0.010355946607887745
Validation loss = 0.012055875733494759
Validation loss = 0.011153468862175941
Validation loss = 0.012964166700839996
Validation loss = 0.011031062342226505
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014970737509429455
Validation loss = 0.010303303599357605
Validation loss = 0.008887670934200287
Validation loss = 0.011249924078583717
Validation loss = 0.010379059240221977
Validation loss = 0.009122004732489586
Validation loss = 0.010089273564517498
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000793 |
| Iteration     | 48        |
| MaximumReturn | -0.000476 |
| MinimumReturn | -0.00102  |
| TotalSamples  | 83300     |
-----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007978573441505432
Validation loss = 0.014235260896384716
Validation loss = 0.007857928052544594
Validation loss = 0.012235688045620918
Validation loss = 0.009621080942451954
Validation loss = 0.008154632523655891
Validation loss = 0.010194718837738037
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00943896733224392
Validation loss = 0.013293877243995667
Validation loss = 0.013041951693594456
Validation loss = 0.009870165027678013
Validation loss = 0.011537888087332249
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012075761333107948
Validation loss = 0.012274651788175106
Validation loss = 0.011935791932046413
Validation loss = 0.010670505464076996
Validation loss = 0.010681431740522385
Validation loss = 0.009922289289534092
Validation loss = 0.010630706325173378
Validation loss = 0.010383211076259613
Validation loss = 0.011818711645901203
Validation loss = 0.011205632239580154
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012164746411144733
Validation loss = 0.010680120438337326
Validation loss = 0.011663646437227726
Validation loss = 0.012726468034088612
Validation loss = 0.00957252737134695
Validation loss = 0.010247630998492241
Validation loss = 0.010453182272613049
Validation loss = 0.010819521732628345
Validation loss = 0.009767805226147175
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010473040863871574
Validation loss = 0.010119060054421425
Validation loss = 0.011074838228523731
Validation loss = 0.009637506678700447
Validation loss = 0.014038399793207645
Validation loss = 0.013113109394907951
Validation loss = 0.011377756483852863
Validation loss = 0.009649744257330894
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00088  |
| Iteration     | 49        |
| MaximumReturn | -0.000594 |
| MinimumReturn | -0.00151  |
| TotalSamples  | 84966     |
-----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011874891817569733
Validation loss = 0.009019320830702782
Validation loss = 0.00861121341586113
Validation loss = 0.008256380446255207
Validation loss = 0.008498508483171463
Validation loss = 0.009249089285731316
Validation loss = 0.008247481659054756
Validation loss = 0.010849167592823505
Validation loss = 0.008884098380804062
Validation loss = 0.01129566878080368
Validation loss = 0.007804755121469498
Validation loss = 0.007877906784415245
Validation loss = 0.010587403550744057
Validation loss = 0.009568331763148308
Validation loss = 0.008490235544741154
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009454076178371906
Validation loss = 0.009691104292869568
Validation loss = 0.013142019510269165
Validation loss = 0.009749344550073147
Validation loss = 0.009675094857811928
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011250792071223259
Validation loss = 0.01033265981823206
Validation loss = 0.015901125967502594
Validation loss = 0.010781923308968544
Validation loss = 0.014039968140423298
Validation loss = 0.010410677641630173
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01456985343247652
Validation loss = 0.01055707037448883
Validation loss = 0.010305319912731647
Validation loss = 0.010655932128429413
Validation loss = 0.013460059650242329
Validation loss = 0.013778943568468094
Validation loss = 0.010090850293636322
Validation loss = 0.012059085071086884
Validation loss = 0.011954153887927532
Validation loss = 0.009680748917162418
Validation loss = 0.009773320518434048
Validation loss = 0.009633858688175678
Validation loss = 0.011012600734829903
Validation loss = 0.010827546007931232
Validation loss = 0.012020484544336796
Validation loss = 0.017164573073387146
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00930254627019167
Validation loss = 0.009186639450490475
Validation loss = 0.009452722035348415
Validation loss = 0.010243241675198078
Validation loss = 0.00979811791330576
Validation loss = 0.010966219939291477
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000821 |
| Iteration     | 50        |
| MaximumReturn | -0.000628 |
| MinimumReturn | -0.00111  |
| TotalSamples  | 86632     |
-----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0094860615208745
Validation loss = 0.009574898518621922
Validation loss = 0.008557446300983429
Validation loss = 0.009088809601962566
Validation loss = 0.01086992397904396
Validation loss = 0.009371060878038406
Validation loss = 0.00892596784979105
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009892097674310207
Validation loss = 0.011299503967165947
Validation loss = 0.009392005391418934
Validation loss = 0.011316321790218353
Validation loss = 0.008656454272568226
Validation loss = 0.011709398590028286
Validation loss = 0.010400003753602505
Validation loss = 0.010151318274438381
Validation loss = 0.010437922552227974
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0103506650775671
Validation loss = 0.010602687485516071
Validation loss = 0.013001770712435246
Validation loss = 0.009889067150652409
Validation loss = 0.010461481288075447
Validation loss = 0.01007409580051899
Validation loss = 0.01012754812836647
Validation loss = 0.009744384326040745
Validation loss = 0.010041152127087116
Validation loss = 0.011795355938374996
Validation loss = 0.012775178998708725
Validation loss = 0.010611230507493019
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016494672745466232
Validation loss = 0.01100471243262291
Validation loss = 0.013374033384025097
Validation loss = 0.009673438034951687
Validation loss = 0.009671301580965519
Validation loss = 0.010284327901899815
Validation loss = 0.00919344648718834
Validation loss = 0.0124690355733037
Validation loss = 0.009314745664596558
Validation loss = 0.008599448017776012
Validation loss = 0.01280911173671484
Validation loss = 0.009223305620253086
Validation loss = 0.0106881782412529
Validation loss = 0.013143573887646198
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009878134354948997
Validation loss = 0.009622505865991116
Validation loss = 0.008114214986562729
Validation loss = 0.013039257377386093
Validation loss = 0.009698254987597466
Validation loss = 0.00993965845555067
Validation loss = 0.00981114525347948
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00084  |
| Iteration     | 51        |
| MaximumReturn | -0.000633 |
| MinimumReturn | -0.00116  |
| TotalSamples  | 88298     |
-----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011358950287103653
Validation loss = 0.0114672239869833
Validation loss = 0.011789407581090927
Validation loss = 0.009403861127793789
Validation loss = 0.01364186406135559
Validation loss = 0.009332198649644852
Validation loss = 0.009024822153151035
Validation loss = 0.007696501445025206
Validation loss = 0.011011689901351929
Validation loss = 0.012039312161505222
Validation loss = 0.010183687321841717
Validation loss = 0.01619783788919449
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009500392712652683
Validation loss = 0.012220538221299648
Validation loss = 0.010837068781256676
Validation loss = 0.010158644989132881
Validation loss = 0.009180646389722824
Validation loss = 0.012577864341437817
Validation loss = 0.01287040114402771
Validation loss = 0.009735227562487125
Validation loss = 0.010339612141251564
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010491990484297276
Validation loss = 0.012263053096830845
Validation loss = 0.010252955369651318
Validation loss = 0.015118272043764591
Validation loss = 0.011531125754117966
Validation loss = 0.010228646919131279
Validation loss = 0.010727598331868649
Validation loss = 0.010766054503619671
Validation loss = 0.010763074271380901
Validation loss = 0.0099759167060256
Validation loss = 0.01040534395724535
Validation loss = 0.011879724450409412
Validation loss = 0.010476237162947655
Validation loss = 0.011515986174345016
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014521575532853603
Validation loss = 0.010743229649960995
Validation loss = 0.011202903464436531
Validation loss = 0.015082847326993942
Validation loss = 0.008821384981274605
Validation loss = 0.013533785939216614
Validation loss = 0.01129394955933094
Validation loss = 0.01068707462400198
Validation loss = 0.009602289646863937
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00941244326531887
Validation loss = 0.007995057851076126
Validation loss = 0.009175555780529976
Validation loss = 0.009919474832713604
Validation loss = 0.009065750055015087
Validation loss = 0.009029564447700977
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000849 |
| Iteration     | 52        |
| MaximumReturn | -0.000551 |
| MinimumReturn | -0.00145  |
| TotalSamples  | 89964     |
-----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011194031685590744
Validation loss = 0.00891341082751751
Validation loss = 0.009481437504291534
Validation loss = 0.00912274606525898
Validation loss = 0.011596824042499065
Validation loss = 0.009219331666827202
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010880150832235813
Validation loss = 0.011804687790572643
Validation loss = 0.010404625907540321
Validation loss = 0.009749858640134335
Validation loss = 0.009952370077371597
Validation loss = 0.013693677261471748
Validation loss = 0.010660734958946705
Validation loss = 0.009592871181666851
Validation loss = 0.01028716191649437
Validation loss = 0.00959298387169838
Validation loss = 0.009376898407936096
Validation loss = 0.013623887673020363
Validation loss = 0.009977725334465504
Validation loss = 0.010905729606747627
Validation loss = 0.009300028905272484
Validation loss = 0.010390110313892365
Validation loss = 0.010195375420153141
Validation loss = 0.00997957680374384
Validation loss = 0.011156050488352776
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010542779229581356
Validation loss = 0.010835384950041771
Validation loss = 0.00957073736935854
Validation loss = 0.010289429686963558
Validation loss = 0.01044528465718031
Validation loss = 0.015684613958001137
Validation loss = 0.01037439052015543
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011337359435856342
Validation loss = 0.011621369048953056
Validation loss = 0.010759200900793076
Validation loss = 0.0094778286293149
Validation loss = 0.009911085478961468
Validation loss = 0.00947162602096796
Validation loss = 0.009541741572320461
Validation loss = 0.009646397083997726
Validation loss = 0.00897874403744936
Validation loss = 0.01101331040263176
Validation loss = 0.009331445209681988
Validation loss = 0.00950680673122406
Validation loss = 0.016721775755286217
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009704799391329288
Validation loss = 0.013601540587842464
Validation loss = 0.008566390722990036
Validation loss = 0.009541505016386509
Validation loss = 0.009326066821813583
Validation loss = 0.01015488337725401
Validation loss = 0.008888120763003826
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000825 |
| Iteration     | 53        |
| MaximumReturn | -0.000679 |
| MinimumReturn | -0.00129  |
| TotalSamples  | 91630     |
-----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009563568979501724
Validation loss = 0.009640366770327091
Validation loss = 0.00966500025242567
Validation loss = 0.0120398523285985
Validation loss = 0.008599449880421162
Validation loss = 0.008611364290118217
Validation loss = 0.009067420847713947
Validation loss = 0.011221129447221756
Validation loss = 0.00936947576701641
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011536968871951103
Validation loss = 0.010273274034261703
Validation loss = 0.013654709793627262
Validation loss = 0.010072298347949982
Validation loss = 0.01009974256157875
Validation loss = 0.015084462240338326
Validation loss = 0.015015424229204655
Validation loss = 0.01008102297782898
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009885674342513084
Validation loss = 0.010433996096253395
Validation loss = 0.010572250932455063
Validation loss = 0.01451443787664175
Validation loss = 0.01221964880824089
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014107952825725079
Validation loss = 0.009614654816687107
Validation loss = 0.011491881683468819
Validation loss = 0.009129472076892853
Validation loss = 0.009798266924917698
Validation loss = 0.00924103707075119
Validation loss = 0.008409990929067135
Validation loss = 0.012197422794997692
Validation loss = 0.010177208110690117
Validation loss = 0.009970370680093765
Validation loss = 0.015401558019220829
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009547563269734383
Validation loss = 0.011014888063073158
Validation loss = 0.01011461392045021
Validation loss = 0.010101775638759136
Validation loss = 0.010002975352108479
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000895 |
| Iteration     | 54        |
| MaximumReturn | -0.000641 |
| MinimumReturn | -0.00125  |
| TotalSamples  | 93296     |
-----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008871463127434254
Validation loss = 0.008648603223264217
Validation loss = 0.009832716546952724
Validation loss = 0.011768282391130924
Validation loss = 0.010422121733427048
Validation loss = 0.009282468818128109
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009754246100783348
Validation loss = 0.009264432825148106
Validation loss = 0.009521909058094025
Validation loss = 0.00993407890200615
Validation loss = 0.00950334221124649
Validation loss = 0.009252125397324562
Validation loss = 0.008755487389862537
Validation loss = 0.01033052708953619
Validation loss = 0.00983406975865364
Validation loss = 0.009239927865564823
Validation loss = 0.009851762093603611
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011904532089829445
Validation loss = 0.010861334390938282
Validation loss = 0.010484117083251476
Validation loss = 0.01141238771378994
Validation loss = 0.01089794747531414
Validation loss = 0.011825035326182842
Validation loss = 0.010245167650282383
Validation loss = 0.010756738483905792
Validation loss = 0.010996156372129917
Validation loss = 0.009182089939713478
Validation loss = 0.010321119800209999
Validation loss = 0.009749787859618664
Validation loss = 0.009751863777637482
Validation loss = 0.010969295166432858
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01063577551394701
Validation loss = 0.00969452504068613
Validation loss = 0.012630276381969452
Validation loss = 0.009864804334938526
Validation loss = 0.011637454852461815
Validation loss = 0.011054383590817451
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010148296132683754
Validation loss = 0.009193070232868195
Validation loss = 0.010542983189225197
Validation loss = 0.009429603815078735
Validation loss = 0.00904069934040308
Validation loss = 0.011113586835563183
Validation loss = 0.010106895118951797
Validation loss = 0.010884555988013744
Validation loss = 0.010668913833796978
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.001    |
| Iteration     | 55        |
| MaximumReturn | -0.000546 |
| MinimumReturn | -0.00216  |
| TotalSamples  | 94962     |
-----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009026366285979748
Validation loss = 0.009490936063230038
Validation loss = 0.008670778945088387
Validation loss = 0.009677601978182793
Validation loss = 0.009461835026741028
Validation loss = 0.008507481776177883
Validation loss = 0.010605576448142529
Validation loss = 0.010094132274389267
Validation loss = 0.013127447105944157
Validation loss = 0.008555151522159576
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009273601695895195
Validation loss = 0.009150470606982708
Validation loss = 0.01140726637095213
Validation loss = 0.009694102220237255
Validation loss = 0.010754785500466824
Validation loss = 0.011754455976188183
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010423828847706318
Validation loss = 0.010161109268665314
Validation loss = 0.009780787862837315
Validation loss = 0.010202361270785332
Validation loss = 0.010357104241847992
Validation loss = 0.010370682924985886
Validation loss = 0.010523936711251736
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010355968959629536
Validation loss = 0.009607970714569092
Validation loss = 0.010922340676188469
Validation loss = 0.00991730485111475
Validation loss = 0.012478310614824295
Validation loss = 0.012666221708059311
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010999978519976139
Validation loss = 0.011776478961110115
Validation loss = 0.011262722313404083
Validation loss = 0.010941484943032265
Validation loss = 0.013521009124815464
Validation loss = 0.01014171913266182
Validation loss = 0.012510065920650959
Validation loss = 0.009870419278740883
Validation loss = 0.010415229946374893
Validation loss = 0.009799951687455177
Validation loss = 0.010754350572824478
Validation loss = 0.013550224713981152
Validation loss = 0.010682063177227974
Validation loss = 0.011286035180091858
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000934 |
| Iteration     | 56        |
| MaximumReturn | -0.000707 |
| MinimumReturn | -0.00118  |
| TotalSamples  | 96628     |
-----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009519525803625584
Validation loss = 0.010562754236161709
Validation loss = 0.008729296736419201
Validation loss = 0.00904749520123005
Validation loss = 0.016367768868803978
Validation loss = 0.00861484743654728
Validation loss = 0.009808329865336418
Validation loss = 0.009470527060329914
Validation loss = 0.01036675926297903
Validation loss = 0.010170936584472656
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009789489209651947
Validation loss = 0.00990206841379404
Validation loss = 0.01390648353844881
Validation loss = 0.010089951567351818
Validation loss = 0.00942462682723999
Validation loss = 0.009735425934195518
Validation loss = 0.009219133295118809
Validation loss = 0.01003390271216631
Validation loss = 0.009435496293008327
Validation loss = 0.012097124010324478
Validation loss = 0.009941421449184418
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012244581244885921
Validation loss = 0.009992812760174274
Validation loss = 0.008791613392531872
Validation loss = 0.010514370165765285
Validation loss = 0.009822684340178967
Validation loss = 0.011867150664329529
Validation loss = 0.012041550129652023
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01244424656033516
Validation loss = 0.009299913421273232
Validation loss = 0.013988986611366272
Validation loss = 0.010721631348133087
Validation loss = 0.008673841133713722
Validation loss = 0.011152000166475773
Validation loss = 0.010714163072407246
Validation loss = 0.01231679692864418
Validation loss = 0.009568561799824238
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014303654432296753
Validation loss = 0.011148692108690739
Validation loss = 0.015413016080856323
Validation loss = 0.009771178476512432
Validation loss = 0.008866745047271252
Validation loss = 0.009680983610451221
Validation loss = 0.0103341368958354
Validation loss = 0.013736377470195293
Validation loss = 0.00930157769471407
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000864 |
| Iteration     | 57        |
| MaximumReturn | -0.000576 |
| MinimumReturn | -0.00132  |
| TotalSamples  | 98294     |
-----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008768165484070778
Validation loss = 0.01180258672684431
Validation loss = 0.008584387600421906
Validation loss = 0.01019645668566227
Validation loss = 0.008635565638542175
Validation loss = 0.012557064183056355
Validation loss = 0.008311446756124496
Validation loss = 0.01155998557806015
Validation loss = 0.008016194216907024
Validation loss = 0.008619110099971294
Validation loss = 0.010637795552611351
Validation loss = 0.008187985979020596
Validation loss = 0.008594517596065998
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010860403068363667
Validation loss = 0.010699672624468803
Validation loss = 0.010378364473581314
Validation loss = 0.014408663846552372
Validation loss = 0.009641867130994797
Validation loss = 0.009376395493745804
Validation loss = 0.010128811001777649
Validation loss = 0.01031029224395752
Validation loss = 0.01000046543776989
Validation loss = 0.011004677973687649
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010481614619493484
Validation loss = 0.009826229885220528
Validation loss = 0.009123953059315681
Validation loss = 0.013218403793871403
Validation loss = 0.011194347403943539
Validation loss = 0.010513775050640106
Validation loss = 0.0112763037905097
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00911679957062006
Validation loss = 0.008828869089484215
Validation loss = 0.008791293017566204
Validation loss = 0.008626903407275677
Validation loss = 0.009075669571757317
Validation loss = 0.009851565584540367
Validation loss = 0.010141550563275814
Validation loss = 0.011498299427330494
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009089624509215355
Validation loss = 0.009221207350492477
Validation loss = 0.010754011571407318
Validation loss = 0.00986550748348236
Validation loss = 0.008945023640990257
Validation loss = 0.013999725691974163
Validation loss = 0.009905839338898659
Validation loss = 0.010612593963742256
Validation loss = 0.011721952818334103
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000819 |
| Iteration     | 58        |
| MaximumReturn | -0.000563 |
| MinimumReturn | -0.00108  |
| TotalSamples  | 99960     |
-----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008361242711544037
Validation loss = 0.00898179691284895
Validation loss = 0.008707216009497643
Validation loss = 0.010371302254498005
Validation loss = 0.008309075608849525
Validation loss = 0.008699108846485615
Validation loss = 0.008378528989851475
Validation loss = 0.008607422932982445
Validation loss = 0.007908162660896778
Validation loss = 0.008450930006802082
Validation loss = 0.008633050136268139
Validation loss = 0.008705133572220802
Validation loss = 0.008987871930003166
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017398810014128685
Validation loss = 0.010031362995505333
Validation loss = 0.009727677330374718
Validation loss = 0.011361806653439999
Validation loss = 0.010853606276214123
Validation loss = 0.00972069427371025
Validation loss = 0.011491885408759117
Validation loss = 0.009192130528390408
Validation loss = 0.014079629443585873
Validation loss = 0.010532179847359657
Validation loss = 0.009697504341602325
Validation loss = 0.00899322610348463
Validation loss = 0.009227660484611988
Validation loss = 0.01041858084499836
Validation loss = 0.009210960008203983
Validation loss = 0.009677384980022907
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009950274601578712
Validation loss = 0.009925520978868008
Validation loss = 0.009971199557185173
Validation loss = 0.010695010423660278
Validation loss = 0.009911362081766129
Validation loss = 0.008993356488645077
Validation loss = 0.009674368426203728
Validation loss = 0.012232000939548016
Validation loss = 0.009768559597432613
Validation loss = 0.010377070866525173
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014061903581023216
Validation loss = 0.011024169623851776
Validation loss = 0.009352725930511951
Validation loss = 0.008864940144121647
Validation loss = 0.013352795504033566
Validation loss = 0.009273667819797993
Validation loss = 0.01051273848861456
Validation loss = 0.009179757907986641
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008952195756137371
Validation loss = 0.0101408576592803
Validation loss = 0.009700763039290905
Validation loss = 0.00900357961654663
Validation loss = 0.009273343719542027
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000816 |
| Iteration     | 59        |
| MaximumReturn | -0.000576 |
| MinimumReturn | -0.00108  |
| TotalSamples  | 101626    |
-----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009437089785933495
Validation loss = 0.013675923459231853
Validation loss = 0.008333990350365639
Validation loss = 0.008721116930246353
Validation loss = 0.010558661073446274
Validation loss = 0.011616243049502373
Validation loss = 0.010426828637719154
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00935729406774044
Validation loss = 0.01031914260238409
Validation loss = 0.01011764258146286
Validation loss = 0.011421630159020424
Validation loss = 0.009555396623909473
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009270301088690758
Validation loss = 0.010795972310006618
Validation loss = 0.009755871258676052
Validation loss = 0.009935788810253143
Validation loss = 0.009875921532511711
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009784968569874763
Validation loss = 0.010219034738838673
Validation loss = 0.012269976548850536
Validation loss = 0.010466087609529495
Validation loss = 0.010638036765158176
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011973452754318714
Validation loss = 0.014269275590777397
Validation loss = 0.009523737244307995
Validation loss = 0.009235169738531113
Validation loss = 0.00928560271859169
Validation loss = 0.012048913165926933
Validation loss = 0.009337658993899822
Validation loss = 0.008934860117733479
Validation loss = 0.011362717486917973
Validation loss = 0.008935445919632912
Validation loss = 0.008935562334954739
Validation loss = 0.016300173476338387
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000902 |
| Iteration     | 60        |
| MaximumReturn | -0.000668 |
| MinimumReturn | -0.00115  |
| TotalSamples  | 103292    |
-----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00929932575672865
Validation loss = 0.009601001627743244
Validation loss = 0.009150088764727116
Validation loss = 0.00936188455671072
Validation loss = 0.009921371936798096
Validation loss = 0.00865340605378151
Validation loss = 0.008303550072014332
Validation loss = 0.009562874212861061
Validation loss = 0.007990138605237007
Validation loss = 0.012631017714738846
Validation loss = 0.009474817663431168
Validation loss = 0.008812031708657742
Validation loss = 0.00864771381020546
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009392035193741322
Validation loss = 0.009779593907296658
Validation loss = 0.009385642595589161
Validation loss = 0.009312222711741924
Validation loss = 0.012737229466438293
Validation loss = 0.011632328853011131
Validation loss = 0.010432440787553787
Validation loss = 0.010759725235402584
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009806978516280651
Validation loss = 0.009448791854083538
Validation loss = 0.010017422959208488
Validation loss = 0.009156450629234314
Validation loss = 0.009462817572057247
Validation loss = 0.013400855474174023
Validation loss = 0.010706210508942604
Validation loss = 0.0087956003844738
Validation loss = 0.011517052538692951
Validation loss = 0.008869907818734646
Validation loss = 0.008725566789507866
Validation loss = 0.009286434389650822
Validation loss = 0.010576505213975906
Validation loss = 0.010382839478552341
Validation loss = 0.00940545555204153
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010129397734999657
Validation loss = 0.010248693637549877
Validation loss = 0.009085827507078648
Validation loss = 0.009366580285131931
Validation loss = 0.009592656046152115
Validation loss = 0.00940414797514677
Validation loss = 0.009981217794120312
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014143911190330982
Validation loss = 0.00898478552699089
Validation loss = 0.012723027728497982
Validation loss = 0.008833923377096653
Validation loss = 0.00917252991348505
Validation loss = 0.011199775151908398
Validation loss = 0.009473429061472416
Validation loss = 0.010175232775509357
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0009   |
| Iteration     | 61        |
| MaximumReturn | -0.000719 |
| MinimumReturn | -0.00108  |
| TotalSamples  | 104958    |
-----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008420387282967567
Validation loss = 0.009216412901878357
Validation loss = 0.008828005753457546
Validation loss = 0.00824645720422268
Validation loss = 0.010612642392516136
Validation loss = 0.00828993134200573
Validation loss = 0.008371256291866302
Validation loss = 0.012057214044034481
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009133860468864441
Validation loss = 0.010212191380560398
Validation loss = 0.009413626976311207
Validation loss = 0.012274787761271
Validation loss = 0.009701469913125038
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008916525170207024
Validation loss = 0.011546551249921322
Validation loss = 0.009160560555756092
Validation loss = 0.009210928343236446
Validation loss = 0.010828416794538498
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009948795661330223
Validation loss = 0.009654550813138485
Validation loss = 0.009304353035986423
Validation loss = 0.010504741221666336
Validation loss = 0.009611652232706547
Validation loss = 0.009276418015360832
Validation loss = 0.010293875820934772
Validation loss = 0.010733218863606453
Validation loss = 0.009244672022759914
Validation loss = 0.008742829784750938
Validation loss = 0.011197918094694614
Validation loss = 0.010080286301672459
Validation loss = 0.01075480505824089
Validation loss = 0.009593513794243336
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01064627431333065
Validation loss = 0.009987915866076946
Validation loss = 0.010142158716917038
Validation loss = 0.01324970368295908
Validation loss = 0.011135105043649673
Validation loss = 0.01027909480035305
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000935 |
| Iteration     | 62        |
| MaximumReturn | -0.00067  |
| MinimumReturn | -0.00145  |
| TotalSamples  | 106624    |
-----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012622515670955181
Validation loss = 0.007873605005443096
Validation loss = 0.008802191354334354
Validation loss = 0.008066474460065365
Validation loss = 0.009713596664369106
Validation loss = 0.008848953992128372
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00871953833848238
Validation loss = 0.009626628831028938
Validation loss = 0.010948793031275272
Validation loss = 0.008705275133252144
Validation loss = 0.009218286722898483
Validation loss = 0.009147181175649166
Validation loss = 0.010258248075842857
Validation loss = 0.008965371176600456
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012349267490208149
Validation loss = 0.011057822033762932
Validation loss = 0.009587932378053665
Validation loss = 0.009174751117825508
Validation loss = 0.009565440006554127
Validation loss = 0.011966010555624962
Validation loss = 0.008947997353971004
Validation loss = 0.009073943831026554
Validation loss = 0.010623697191476822
Validation loss = 0.013811015523970127
Validation loss = 0.008729318156838417
Validation loss = 0.009604999795556068
Validation loss = 0.009235494770109653
Validation loss = 0.00924290157854557
Validation loss = 0.009995652362704277
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01103181205689907
Validation loss = 0.009264344349503517
Validation loss = 0.00912376120686531
Validation loss = 0.009396958164870739
Validation loss = 0.013766584917902946
Validation loss = 0.011464884504675865
Validation loss = 0.012160658836364746
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009958209469914436
Validation loss = 0.009376730769872665
Validation loss = 0.009351329877972603
Validation loss = 0.010001660324633121
Validation loss = 0.009973917156457901
Validation loss = 0.009883012622594833
Validation loss = 0.009915976785123348
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000894 |
| Iteration     | 63        |
| MaximumReturn | -0.000678 |
| MinimumReturn | -0.00118  |
| TotalSamples  | 108290    |
-----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008375565521419048
Validation loss = 0.01030009239912033
Validation loss = 0.010276827961206436
Validation loss = 0.008659113198518753
Validation loss = 0.008804290555417538
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009425953961908817
Validation loss = 0.008941859006881714
Validation loss = 0.00982931349426508
Validation loss = 0.009567097760736942
Validation loss = 0.009123475290834904
Validation loss = 0.010112512856721878
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009799106977880001
Validation loss = 0.010180454701185226
Validation loss = 0.00954414252191782
Validation loss = 0.009564340114593506
Validation loss = 0.01107002329081297
Validation loss = 0.009582046419382095
Validation loss = 0.010275017470121384
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009341900236904621
Validation loss = 0.00848426390439272
Validation loss = 0.009331145323812962
Validation loss = 0.008849485777318478
Validation loss = 0.009072980843484402
Validation loss = 0.0085061090067029
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010060007683932781
Validation loss = 0.009411264210939407
Validation loss = 0.009260625578463078
Validation loss = 0.010257287882268429
Validation loss = 0.01208341121673584
Validation loss = 0.01075671799480915
Validation loss = 0.010281501337885857
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00106  |
| Iteration     | 64        |
| MaximumReturn | -0.000803 |
| MinimumReturn | -0.00144  |
| TotalSamples  | 109956    |
-----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009037356823682785
Validation loss = 0.00946469884365797
Validation loss = 0.008790479972958565
Validation loss = 0.009902909398078918
Validation loss = 0.008836067281663418
Validation loss = 0.010796453803777695
Validation loss = 0.008653990924358368
Validation loss = 0.010142921470105648
Validation loss = 0.008026499301195145
Validation loss = 0.00824169721454382
Validation loss = 0.01002194732427597
Validation loss = 0.008619633503258228
Validation loss = 0.008300544694066048
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010180175304412842
Validation loss = 0.010130515322089195
Validation loss = 0.010476676747202873
Validation loss = 0.0088191544637084
Validation loss = 0.009947233833372593
Validation loss = 0.009873258881270885
Validation loss = 0.009087382815778255
Validation loss = 0.010080555453896523
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009171041660010815
Validation loss = 0.009662452153861523
Validation loss = 0.009657483547925949
Validation loss = 0.009668921120464802
Validation loss = 0.00911013688892126
Validation loss = 0.009817483834922314
Validation loss = 0.010063905268907547
Validation loss = 0.009363332763314247
Validation loss = 0.011870147660374641
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009561006911098957
Validation loss = 0.008882187306880951
Validation loss = 0.009318605996668339
Validation loss = 0.009165693074464798
Validation loss = 0.00993614923208952
Validation loss = 0.00903888139873743
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009443423710763454
Validation loss = 0.009766760282218456
Validation loss = 0.010842064395546913
Validation loss = 0.010593103244900703
Validation loss = 0.013215355575084686
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000966 |
| Iteration     | 65        |
| MaximumReturn | -0.000708 |
| MinimumReturn | -0.00141  |
| TotalSamples  | 111622    |
-----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008049960248172283
Validation loss = 0.009548278525471687
Validation loss = 0.010726476088166237
Validation loss = 0.008847312070429325
Validation loss = 0.008473042398691177
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008597995154559612
Validation loss = 0.009133736602962017
Validation loss = 0.01060451939702034
Validation loss = 0.00851541105657816
Validation loss = 0.00974675826728344
Validation loss = 0.008885352872312069
Validation loss = 0.011919291689991951
Validation loss = 0.009037276729941368
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00986042432487011
Validation loss = 0.010002976283431053
Validation loss = 0.00976872444152832
Validation loss = 0.009858841076493263
Validation loss = 0.01130068302154541
Validation loss = 0.009005429223179817
Validation loss = 0.009777572937309742
Validation loss = 0.00987101811915636
Validation loss = 0.009256342425942421
Validation loss = 0.009362914599478245
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010122677311301231
Validation loss = 0.010371406562626362
Validation loss = 0.00909475889056921
Validation loss = 0.014171975664794445
Validation loss = 0.009465780109167099
Validation loss = 0.009035746566951275
Validation loss = 0.008599278517067432
Validation loss = 0.009901351295411587
Validation loss = 0.008895078673958778
Validation loss = 0.009110062383115292
Validation loss = 0.009879187680780888
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009583025239408016
Validation loss = 0.011529586277902126
Validation loss = 0.011763296090066433
Validation loss = 0.010747618041932583
Validation loss = 0.010846861638128757
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000844 |
| Iteration     | 66        |
| MaximumReturn | -0.000644 |
| MinimumReturn | -0.00144  |
| TotalSamples  | 113288    |
-----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00814823992550373
Validation loss = 0.010615174658596516
Validation loss = 0.0077042183838784695
Validation loss = 0.008979858830571175
Validation loss = 0.008343664929270744
Validation loss = 0.008682847954332829
Validation loss = 0.008904490619897842
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009271408431231976
Validation loss = 0.010774583555758
Validation loss = 0.00933280773460865
Validation loss = 0.010057077743113041
Validation loss = 0.014617680571973324
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009450200013816357
Validation loss = 0.010027626529335976
Validation loss = 0.012160973623394966
Validation loss = 0.00916744489222765
Validation loss = 0.00951507966965437
Validation loss = 0.009418816305696964
Validation loss = 0.00878114439547062
Validation loss = 0.011224128305912018
Validation loss = 0.01233305037021637
Validation loss = 0.009055412374436855
Validation loss = 0.010429292917251587
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0119748180732131
Validation loss = 0.010138755664229393
Validation loss = 0.008641241118311882
Validation loss = 0.008686868473887444
Validation loss = 0.012509587220847607
Validation loss = 0.009241710416972637
Validation loss = 0.009300433099269867
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010353238321840763
Validation loss = 0.010998198762536049
Validation loss = 0.00906999409198761
Validation loss = 0.009501983411610126
Validation loss = 0.010579569265246391
Validation loss = 0.010755201801657677
Validation loss = 0.00948669295758009
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00243 |
| Iteration     | 67       |
| MaximumReturn | -0.00163 |
| MinimumReturn | -0.00332 |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01029976923018694
Validation loss = 0.008219965733587742
Validation loss = 0.01121823862195015
Validation loss = 0.008959873579442501
Validation loss = 0.008795907720923424
Validation loss = 0.009402045048773289
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010595706291496754
Validation loss = 0.009222977794706821
Validation loss = 0.01094676461070776
Validation loss = 0.00959372241050005
Validation loss = 0.009826947003602982
Validation loss = 0.010158882476389408
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009168325923383236
Validation loss = 0.009218323975801468
Validation loss = 0.008144669234752655
Validation loss = 0.009386277757585049
Validation loss = 0.009480073116719723
Validation loss = 0.009872982278466225
Validation loss = 0.00962742231786251
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009564618580043316
Validation loss = 0.009010867215692997
Validation loss = 0.012023038230836391
Validation loss = 0.011682880111038685
Validation loss = 0.01017227303236723
Validation loss = 0.011629939079284668
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009824219159781933
Validation loss = 0.010588135570287704
Validation loss = 0.011532963253557682
Validation loss = 0.009151292033493519
Validation loss = 0.009458106011152267
Validation loss = 0.008779323659837246
Validation loss = 0.013090125285089016
Validation loss = 0.010147858411073685
Validation loss = 0.00952987466007471
Validation loss = 0.009563256986439228
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000987 |
| Iteration     | 68        |
| MaximumReturn | -0.000669 |
| MinimumReturn | -0.00183  |
| TotalSamples  | 116620    |
-----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008754601702094078
Validation loss = 0.00905644241720438
Validation loss = 0.008548958227038383
Validation loss = 0.008404149673879147
Validation loss = 0.00854448787868023
Validation loss = 0.012517374940216541
Validation loss = 0.008438549935817719
Validation loss = 0.00949630793184042
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009649542160332203
Validation loss = 0.011755309998989105
Validation loss = 0.009508519433438778
Validation loss = 0.009888389147818089
Validation loss = 0.008759858086705208
Validation loss = 0.00963447242975235
Validation loss = 0.013894564472138882
Validation loss = 0.009744050912559032
Validation loss = 0.009919891133904457
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009249480441212654
Validation loss = 0.010007699951529503
Validation loss = 0.009669660590589046
Validation loss = 0.009988783858716488
Validation loss = 0.01149432547390461
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010211904533207417
Validation loss = 0.009531387127935886
Validation loss = 0.011288519948720932
Validation loss = 0.009810764342546463
Validation loss = 0.00913984701037407
Validation loss = 0.010249493643641472
Validation loss = 0.010131483897566795
Validation loss = 0.009042276069521904
Validation loss = 0.010460041463375092
Validation loss = 0.013742546550929546
Validation loss = 0.008792908862233162
Validation loss = 0.008951766416430473
Validation loss = 0.014917757362127304
Validation loss = 0.008983164094388485
Validation loss = 0.008859600871801376
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010341618210077286
Validation loss = 0.010728198103606701
Validation loss = 0.008692746981978416
Validation loss = 0.008604064583778381
Validation loss = 0.008723743259906769
Validation loss = 0.012849515303969383
Validation loss = 0.009398487396538258
Validation loss = 0.009220882318913937
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00101  |
| Iteration     | 69        |
| MaximumReturn | -0.000731 |
| MinimumReturn | -0.00126  |
| TotalSamples  | 118286    |
-----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008683259598910809
Validation loss = 0.008516288362443447
Validation loss = 0.009543221443891525
Validation loss = 0.008638223633170128
Validation loss = 0.008271106518805027
Validation loss = 0.007698463276028633
Validation loss = 0.009011301212012768
Validation loss = 0.008663056418299675
Validation loss = 0.008593621663749218
Validation loss = 0.009806837886571884
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0116608627140522
Validation loss = 0.009185221046209335
Validation loss = 0.00910196639597416
Validation loss = 0.014141983352601528
Validation loss = 0.010903017595410347
Validation loss = 0.011486335657536983
Validation loss = 0.008645523339509964
Validation loss = 0.008801756426692009
Validation loss = 0.010879024863243103
Validation loss = 0.010978002101182938
Validation loss = 0.008954168297350407
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009983370080590248
Validation loss = 0.01001990307122469
Validation loss = 0.0095081040635705
Validation loss = 0.014823664911091328
Validation loss = 0.01002458855509758
Validation loss = 0.010015690699219704
Validation loss = 0.010221956297755241
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010156615637242794
Validation loss = 0.011150343343615532
Validation loss = 0.010067487135529518
Validation loss = 0.009379393421113491
Validation loss = 0.009867028333246708
Validation loss = 0.009880702942609787
Validation loss = 0.009805667214095592
Validation loss = 0.01110650785267353
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009132655337452888
Validation loss = 0.00966168474406004
Validation loss = 0.01139877364039421
Validation loss = 0.0098383454605937
Validation loss = 0.010012295097112656
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00111  |
| Iteration     | 70        |
| MaximumReturn | -0.000836 |
| MinimumReturn | -0.00144  |
| TotalSamples  | 119952    |
-----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008716619573533535
Validation loss = 0.00822835136204958
Validation loss = 0.009739558212459087
Validation loss = 0.011998395435512066
Validation loss = 0.008573859930038452
Validation loss = 0.008649056777358055
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009138881228864193
Validation loss = 0.011644002981483936
Validation loss = 0.0106961689889431
Validation loss = 0.00837442371994257
Validation loss = 0.009739356115460396
Validation loss = 0.009419340640306473
Validation loss = 0.010410278104245663
Validation loss = 0.01025302056223154
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009387953206896782
Validation loss = 0.009733933955430984
Validation loss = 0.009904067032039165
Validation loss = 0.010830359533429146
Validation loss = 0.010896132327616215
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01126634981483221
Validation loss = 0.009869521483778954
Validation loss = 0.009004230611026287
Validation loss = 0.009433175437152386
Validation loss = 0.011074685491621494
Validation loss = 0.009913873858749866
Validation loss = 0.008925816975533962
Validation loss = 0.009998399764299393
Validation loss = 0.009481356479227543
Validation loss = 0.009609092026948929
Validation loss = 0.011879698373377323
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01173351053148508
Validation loss = 0.010812978260219097
Validation loss = 0.009464177303016186
Validation loss = 0.012467419728636742
Validation loss = 0.010229731909930706
Validation loss = 0.009419490583240986
Validation loss = 0.009521570056676865
Validation loss = 0.00990846287459135
Validation loss = 0.01491460483521223
Validation loss = 0.009587266482412815
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00111  |
| Iteration     | 71        |
| MaximumReturn | -0.000752 |
| MinimumReturn | -0.00164  |
| TotalSamples  | 121618    |
-----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008481652475893497
Validation loss = 0.008992383256554604
Validation loss = 0.01063840277493
Validation loss = 0.008664044551551342
Validation loss = 0.013044890947639942
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011499732732772827
Validation loss = 0.009947926737368107
Validation loss = 0.00998774729669094
Validation loss = 0.00952220056205988
Validation loss = 0.010951122269034386
Validation loss = 0.009788763709366322
Validation loss = 0.010273681953549385
Validation loss = 0.00944624375551939
Validation loss = 0.00910888984799385
Validation loss = 0.01360869500786066
Validation loss = 0.009192886762320995
Validation loss = 0.009319002740085125
Validation loss = 0.010995534248650074
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008461195044219494
Validation loss = 0.008841162547469139
Validation loss = 0.008992282673716545
Validation loss = 0.009376751258969307
Validation loss = 0.01005702093243599
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008721415884792805
Validation loss = 0.011146848089993
Validation loss = 0.00926312804222107
Validation loss = 0.009338079020380974
Validation loss = 0.01023323554545641
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009122575633227825
Validation loss = 0.01229129545390606
Validation loss = 0.010098220780491829
Validation loss = 0.010168040171265602
Validation loss = 0.012260186485946178
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00188 |
| Iteration     | 72       |
| MaximumReturn | -0.00136 |
| MinimumReturn | -0.00228 |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00987904705107212
Validation loss = 0.007715161889791489
Validation loss = 0.008598778396844864
Validation loss = 0.008073470555245876
Validation loss = 0.008402352221310139
Validation loss = 0.008232342079281807
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011397060938179493
Validation loss = 0.012173878960311413
Validation loss = 0.009612000547349453
Validation loss = 0.009904920123517513
Validation loss = 0.00905842799693346
Validation loss = 0.009719863533973694
Validation loss = 0.00918448343873024
Validation loss = 0.009181917645037174
Validation loss = 0.009052651934325695
Validation loss = 0.009341903030872345
Validation loss = 0.00956869125366211
Validation loss = 0.008788129314780235
Validation loss = 0.011628461070358753
Validation loss = 0.009527119807898998
Validation loss = 0.01039606612175703
Validation loss = 0.009776407852768898
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009623982943594456
Validation loss = 0.009486768394708633
Validation loss = 0.011797244660556316
Validation loss = 0.009558645077049732
Validation loss = 0.012693362310528755
Validation loss = 0.009342728182673454
Validation loss = 0.00871965754777193
Validation loss = 0.011300135403871536
Validation loss = 0.009413368999958038
Validation loss = 0.008266434073448181
Validation loss = 0.00910502951592207
Validation loss = 0.010054478421807289
Validation loss = 0.00850862916558981
Validation loss = 0.0183399748057127
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009955879300832748
Validation loss = 0.00999581441283226
Validation loss = 0.00986782182008028
Validation loss = 0.009301397018134594
Validation loss = 0.009276727214455605
Validation loss = 0.008729432709515095
Validation loss = 0.009200339205563068
Validation loss = 0.014411802403628826
Validation loss = 0.009765513241291046
Validation loss = 0.013194707222282887
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011584363877773285
Validation loss = 0.012301761656999588
Validation loss = 0.011975230649113655
Validation loss = 0.008995642885565758
Validation loss = 0.016302185133099556
Validation loss = 0.010262573137879372
Validation loss = 0.009794965386390686
Validation loss = 0.010206910781562328
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000964 |
| Iteration     | 73        |
| MaximumReturn | -0.000658 |
| MinimumReturn | -0.00132  |
| TotalSamples  | 124950    |
-----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00850269477814436
Validation loss = 0.0077214776538312435
Validation loss = 0.010718504898250103
Validation loss = 0.009926434606313705
Validation loss = 0.009149559773504734
Validation loss = 0.008460021577775478
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00896732322871685
Validation loss = 0.011542978696525097
Validation loss = 0.009943024255335331
Validation loss = 0.0089673837646842
Validation loss = 0.013522951863706112
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014395374804735184
Validation loss = 0.012643826194107533
Validation loss = 0.011008298955857754
Validation loss = 0.011255783028900623
Validation loss = 0.011386827565729618
Validation loss = 0.009111234918236732
Validation loss = 0.009086879901587963
Validation loss = 0.011207478120923042
Validation loss = 0.00839274749159813
Validation loss = 0.009077274240553379
Validation loss = 0.009646592661738396
Validation loss = 0.00917397066950798
Validation loss = 0.008593834936618805
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009648595936596394
Validation loss = 0.010545114055275917
Validation loss = 0.008638187311589718
Validation loss = 0.009380473755300045
Validation loss = 0.009572506882250309
Validation loss = 0.009239327162504196
Validation loss = 0.009377541951835155
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010362720116972923
Validation loss = 0.01229995395988226
Validation loss = 0.009377469308674335
Validation loss = 0.009911864064633846
Validation loss = 0.011222281493246555
Validation loss = 0.01604972593486309
Validation loss = 0.009529413655400276
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00101  |
| Iteration     | 74        |
| MaximumReturn | -0.000681 |
| MinimumReturn | -0.00192  |
| TotalSamples  | 126616    |
-----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00891911331564188
Validation loss = 0.00870248768478632
Validation loss = 0.010443017818033695
Validation loss = 0.00849942397326231
Validation loss = 0.008527776226401329
Validation loss = 0.008816760964691639
Validation loss = 0.010280298069119453
Validation loss = 0.007887621410191059
Validation loss = 0.010604215785861015
Validation loss = 0.008742518723011017
Validation loss = 0.007230668794363737
Validation loss = 0.010740463621914387
Validation loss = 0.0075131990015506744
Validation loss = 0.007972859777510166
Validation loss = 0.009174029342830181
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011000932194292545
Validation loss = 0.010055079124867916
Validation loss = 0.009736009873449802
Validation loss = 0.010815618559718132
Validation loss = 0.011358492076396942
Validation loss = 0.010081762447953224
Validation loss = 0.012808469124138355
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008395821787416935
Validation loss = 0.014153414405882359
Validation loss = 0.00915790069848299
Validation loss = 0.01042796578258276
Validation loss = 0.010255108587443829
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008876747451722622
Validation loss = 0.009936073794960976
Validation loss = 0.013524975627660751
Validation loss = 0.009221233427524567
Validation loss = 0.008733668364584446
Validation loss = 0.008809136226773262
Validation loss = 0.00994969718158245
Validation loss = 0.009714988060295582
Validation loss = 0.010499552823603153
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00897259172052145
Validation loss = 0.01029116753488779
Validation loss = 0.009648321196436882
Validation loss = 0.011455899104475975
Validation loss = 0.010132568888366222
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00202 |
| Iteration     | 75       |
| MaximumReturn | -0.0015  |
| MinimumReturn | -0.0026  |
| TotalSamples  | 128282   |
----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012331546284258366
Validation loss = 0.008308366872370243
Validation loss = 0.007924024946987629
Validation loss = 0.007297740317881107
Validation loss = 0.007737074978649616
Validation loss = 0.009763548150658607
Validation loss = 0.007977318949997425
Validation loss = 0.008166404440999031
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00904141366481781
Validation loss = 0.009728201664984226
Validation loss = 0.009112883359193802
Validation loss = 0.011169523932039738
Validation loss = 0.010084803216159344
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008219036273658276
Validation loss = 0.008297794498503208
Validation loss = 0.008181932382285595
Validation loss = 0.009031889960169792
Validation loss = 0.02577315643429756
Validation loss = 0.009543905034661293
Validation loss = 0.008944480679929256
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00989835150539875
Validation loss = 0.008457385934889317
Validation loss = 0.008541235700249672
Validation loss = 0.008517502807080746
Validation loss = 0.008484133519232273
Validation loss = 0.00954157579690218
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00938988197594881
Validation loss = 0.010082113556563854
Validation loss = 0.010187416337430477
Validation loss = 0.00949489139020443
Validation loss = 0.008786303922533989
Validation loss = 0.010381018742918968
Validation loss = 0.009698332287371159
Validation loss = 0.009158993139863014
Validation loss = 0.009463656693696976
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000985 |
| Iteration     | 76        |
| MaximumReturn | -0.000594 |
| MinimumReturn | -0.00153  |
| TotalSamples  | 129948    |
-----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009469514712691307
Validation loss = 0.007785314694046974
Validation loss = 0.007688961457461119
Validation loss = 0.007774125784635544
Validation loss = 0.007537269964814186
Validation loss = 0.007591412402689457
Validation loss = 0.008544492535293102
Validation loss = 0.008518185466527939
Validation loss = 0.008457924239337444
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009047480300068855
Validation loss = 0.009047522209584713
Validation loss = 0.011255600489675999
Validation loss = 0.011697113513946533
Validation loss = 0.010040468536317348
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009030337445437908
Validation loss = 0.008343675173819065
Validation loss = 0.007990050129592419
Validation loss = 0.010309863835573196
Validation loss = 0.00958380289375782
Validation loss = 0.008826585486531258
Validation loss = 0.009205834940075874
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010008933022618294
Validation loss = 0.008327157236635685
Validation loss = 0.00856927689164877
Validation loss = 0.008326240815222263
Validation loss = 0.009997435845434666
Validation loss = 0.010473014786839485
Validation loss = 0.00904315896332264
Validation loss = 0.010029517114162445
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010399849154055119
Validation loss = 0.010874488390982151
Validation loss = 0.008275547064840794
Validation loss = 0.010954812169075012
Validation loss = 0.008539511822164059
Validation loss = 0.008990434929728508
Validation loss = 0.00924659799784422
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00179 |
| Iteration     | 77       |
| MaximumReturn | -0.00123 |
| MinimumReturn | -0.00236 |
| TotalSamples  | 131614   |
----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00874759815633297
Validation loss = 0.007998782210052013
Validation loss = 0.007628306746482849
Validation loss = 0.009077299386262894
Validation loss = 0.008357184007763863
Validation loss = 0.008296767249703407
Validation loss = 0.008605029433965683
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00915435515344143
Validation loss = 0.01017209142446518
Validation loss = 0.010052370838820934
Validation loss = 0.009613613598048687
Validation loss = 0.00989935640245676
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009610756300389767
Validation loss = 0.009161518886685371
Validation loss = 0.008851563557982445
Validation loss = 0.011868208646774292
Validation loss = 0.008169383741915226
Validation loss = 0.00835585780441761
Validation loss = 0.00931528676301241
Validation loss = 0.012290445156395435
Validation loss = 0.007867589592933655
Validation loss = 0.007944978773593903
Validation loss = 0.008088142611086369
Validation loss = 0.008969264104962349
Validation loss = 0.008035602979362011
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00928544718772173
Validation loss = 0.009327550418674946
Validation loss = 0.008808043785393238
Validation loss = 0.010725092142820358
Validation loss = 0.008922828361392021
Validation loss = 0.008679213933646679
Validation loss = 0.013955939561128616
Validation loss = 0.011529956012964249
Validation loss = 0.00866374745965004
Validation loss = 0.009471796452999115
Validation loss = 0.00908487755805254
Validation loss = 0.009259491227567196
Validation loss = 0.009267788380384445
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00960469525307417
Validation loss = 0.010864178650081158
Validation loss = 0.01031859964132309
Validation loss = 0.009573613293468952
Validation loss = 0.009689689613878727
Validation loss = 0.010518563911318779
Validation loss = 0.009945712052285671
Validation loss = 0.009366555139422417
Validation loss = 0.009896392934024334
Validation loss = 0.01166477520018816
Validation loss = 0.009318957105278969
Validation loss = 0.009936245158314705
Validation loss = 0.010213308036327362
Validation loss = 0.008863093331456184
Validation loss = 0.010292868129909039
Validation loss = 0.008567418903112411
Validation loss = 0.012048294767737389
Validation loss = 0.010226365178823471
Validation loss = 0.009766384959220886
Validation loss = 0.014705372042953968
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00683 |
| Iteration     | 78       |
| MaximumReturn | -0.00397 |
| MinimumReturn | -0.00942 |
| TotalSamples  | 133280   |
----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00746862031519413
Validation loss = 0.008236856199800968
Validation loss = 0.008525743149220943
Validation loss = 0.007575655821710825
Validation loss = 0.008399415761232376
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008855571039021015
Validation loss = 0.008961477316915989
Validation loss = 0.010896840132772923
Validation loss = 0.010209685191512108
Validation loss = 0.010706677101552486
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009642256423830986
Validation loss = 0.00857541337609291
Validation loss = 0.01091013289988041
Validation loss = 0.010058273561298847
Validation loss = 0.009254863485693932
Validation loss = 0.008784757927060127
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00957220233976841
Validation loss = 0.010811246000230312
Validation loss = 0.011207946576178074
Validation loss = 0.008588308468461037
Validation loss = 0.008801594376564026
Validation loss = 0.010936382226645947
Validation loss = 0.008891699835658073
Validation loss = 0.008864425122737885
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017530743032693863
Validation loss = 0.00996963307261467
Validation loss = 0.009473328478634357
Validation loss = 0.01161640789359808
Validation loss = 0.008414937183260918
Validation loss = 0.008501199074089527
Validation loss = 0.00894965697079897
Validation loss = 0.008364253677427769
Validation loss = 0.010627083480358124
Validation loss = 0.00980474054813385
Validation loss = 0.009284034371376038
Validation loss = 0.008695528842508793
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0012   |
| Iteration     | 79        |
| MaximumReturn | -0.000936 |
| MinimumReturn | -0.00158  |
| TotalSamples  | 134946    |
-----------------------------
