Logging to experiments/invertedPendulum/nov1/IA01_w350e3_seed2531
Printing configuration...
{'env_name': 'invertedPendulum',
 'random_seeds': [3214, 2431, 2531, 2231],
 'save_variables': False,
 'model_save_dir': '/tmp/invertedPendulum_models/',
 'restore_variables': False,
 'start_onpol_iter': 0,
 'onpol_iters': 80,
 'num_path_random': 25,
 'num_path_onpol': 25,
 'env_horizon': 100,
 'max_train_data': 200000,
 'max_val_data': 100000,
 'discard_ratio': 0.0,
 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20},
              'model': 'nn',
              'ensemble': True,
              'ensemble_model_count': 5,
              'enable_particle_ensemble': True,
              'particles': 5,
              'obs_var': 1.0,
              'intrinsic_reward_coeff': 1.0,
              'ita': 1.0,
              'mode': 'random',
              'val': True,
              'n_layers': 4,
              'hidden_size': 1000,
              'activation': 'relu',
              'batch_size': 1000,
              'learning_rate': 0.001,
              'reg_coeff': 0.0,
              'epochs': 200,
              'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}},
 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False},
 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]},
 'algo': 'trpo'}
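The configuration above is printed as a single Python dict. A minimal sketch of how such a config might be loaded and a few key hyperparameters pulled out, assuming it is stored as JSON alongside the log (the load_config helper and the file name are hypothetical; the key names follow the dict above):

import json

def load_config(path):
    # Hypothetical helper: config assumed to be stored as JSON next to the experiment log.
    with open(path) as f:
        return json.load(f)

config = load_config("experiments/invertedPendulum/nov1/IA01_w350e3_seed2531/params.json")

# Hyperparameters that show up later in this log.
n_random_paths = config["num_path_random"]                   # 25 random paths
n_onpol_paths  = config["num_path_onpol"]                    # 25 on-policy paths per iteration
horizon        = config["env_horizon"]                       # 100 steps per path
ensemble_size  = config["dynamics"]["ensemble_model_count"]  # 5 dynamics models
trpo_iters     = config["trpo"]["iterations"]                # 20 TRPO updates per outer iteration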
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating random rollouts.
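Each "Path i | total_timesteps t" line above corresponds to one rollout of env_horizon = 100 steps, for num_path_random = 25 paths (2,500 random timesteps in total). A minimal sketch of such a random-rollout loop, assuming a Gym-style environment; the affinization counter printed after each path is outside this sketch:

import numpy as np

def sample_random_paths(env, num_paths=25, horizon=100):
    """Collect exploration data with uniformly random actions (Gym-style env assumed)."""
    paths, total_timesteps = [], 0
    for i in range(num_paths):
        print(f"Path {i} | total_timesteps {total_timesteps}.")
        obs = env.reset()
        observations, actions, rewards = [], [], []
        for _ in range(horizon):
            act = env.action_space.sample()          # random exploration action
            next_obs, rew, done, _ = env.step(act)
            observations.append(obs)
            actions.append(act)
            rewards.append(rew)
            obs = next_obs
            total_timesteps += 1
            if done:
                break
        paths.append({"observations": np.array(observations),
                      "actions": np.array(actions),
                      "rewards": np.array(rewards)})
    return paths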
Creating normalization for training data.
Done creating normalization for training data.
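The normalization step fits statistics on the random-rollout data so that network inputs and targets are standardized. A minimal sketch of how such statistics might be computed, assuming the dynamics model is trained on state deltas (a common model-based RL choice, not confirmed by this log):

import numpy as np

def compute_normalization(observations, actions, next_observations, eps=1e-6):
    """Mean/std statistics for observations, actions, and state deltas."""
    deltas = next_observations - observations
    stats = {}
    for name, data in (("obs", observations), ("act", actions), ("delta", deltas)):
        stats[name] = (np.mean(data, axis=0), np.std(data, axis=0) + eps)
    return stats

def normalize(x, mean, std):
    return (x - mean) / std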
Particle ensemble enabled? True
An ensemble of 5 dynamics models (<class 'model.dynamics.NNDynamicsModel'>) initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
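Per the config, the ensemble consists of ensemble_model_count = 5 fully connected dynamics models, each with n_layers = 4 hidden layers of hidden_size = 1000 ReLU units. A sketch of what one member might look like, written in PyTorch purely for illustration (the actual model.dynamics.NNDynamicsModel class is not shown in this log, and the input/output dimensions below are assumptions for the inverted pendulum task):

import torch.nn as nn

def make_dynamics_net(obs_dim, act_dim, hidden_size=1000, n_layers=4):
    """One ensemble member: maps (state, action) to a predicted next-state delta."""
    layers, in_dim = [], obs_dim + act_dim
    for _ in range(n_layers):
        layers += [nn.Linear(in_dim, hidden_size), nn.ReLU()]
        in_dim = hidden_size
    layers.append(nn.Linear(in_dim, obs_dim))
    return nn.Sequential(*layers)

# obs_dim=4 / act_dim=1 are assumed dimensions for the inverted pendulum task.
ensemble = [make_dynamics_net(obs_dim=4, act_dim=1) for _ in range(5)]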
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7365395426750183
Validation loss = 0.3998357057571411
Validation loss = 0.3750753402709961
Validation loss = 0.3424554467201233
Validation loss = 0.320643812417984
Validation loss = 0.3120421767234802
Validation loss = 0.2808763384819031
Validation loss = 0.28454479575157166
Validation loss = 0.2852585017681122
Validation loss = 0.2739466428756714
Validation loss = 0.2530207931995392
Validation loss = 0.25578221678733826
Validation loss = 0.24478840827941895
Validation loss = 0.2485012710094452
Validation loss = 0.2574857771396637
Validation loss = 0.24258576333522797
Validation loss = 0.23277637362480164
Validation loss = 0.23860350251197815
Validation loss = 0.22652052342891693
Validation loss = 0.23119866847991943
Validation loss = 0.21216937899589539
Validation loss = 0.21392077207565308
Validation loss = 0.23355413973331451
Validation loss = 0.21126064658164978
Validation loss = 0.21195663511753082
Validation loss = 0.20917341113090515
Validation loss = 0.21949489414691925
Validation loss = 0.20920537412166595
Validation loss = 0.21915479004383087
Validation loss = 0.20087948441505432
Validation loss = 0.19429618120193481
Validation loss = 0.20332740247249603
Validation loss = 0.19641774892807007
Validation loss = 0.20731118321418762
Validation loss = 0.21103475987911224
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7815805077552795
Validation loss = 0.3851301074028015
Validation loss = 0.37760427594184875
Validation loss = 0.34720778465270996
Validation loss = 0.31859642267227173
Validation loss = 0.3126683533191681
Validation loss = 0.28982627391815186
Validation loss = 0.2864748239517212
Validation loss = 0.2713451683521271
Validation loss = 0.2584758698940277
Validation loss = 0.25353357195854187
Validation loss = 0.2483895868062973
Validation loss = 0.2511996328830719
Validation loss = 0.24064965546131134
Validation loss = 0.25983303785324097
Validation loss = 0.2479342520236969
Validation loss = 0.2566200792789459
Validation loss = 0.23727259039878845
Validation loss = 0.26610109210014343
Validation loss = 0.2348482757806778
Validation loss = 0.24689267575740814
Validation loss = 0.25248971581459045
Validation loss = 0.2338356375694275
Validation loss = 0.22807157039642334
Validation loss = 0.2144080549478531
Validation loss = 0.25950682163238525
Validation loss = 0.24400736391544342
Validation loss = 0.2174423187971115
Validation loss = 0.22021909058094025
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.737570583820343
Validation loss = 0.41170206665992737
Validation loss = 0.3691725432872772
Validation loss = 0.3423005938529968
Validation loss = 0.3431488275527954
Validation loss = 0.3001724183559418
Validation loss = 0.2904249429702759
Validation loss = 0.26660895347595215
Validation loss = 0.2739100754261017
Validation loss = 0.25590696930885315
Validation loss = 0.25870731472969055
Validation loss = 0.2812499403953552
Validation loss = 0.23367561399936676
Validation loss = 0.24893628060817719
Validation loss = 0.2383480668067932
Validation loss = 0.25680118799209595
Validation loss = 0.2359779179096222
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7327225208282471
Validation loss = 0.43847063183784485
Validation loss = 0.392852246761322
Validation loss = 0.34491580724716187
Validation loss = 0.3503768742084503
Validation loss = 0.3200695812702179
Validation loss = 0.2948671877384186
Validation loss = 0.28323739767074585
Validation loss = 0.2702867090702057
Validation loss = 0.25728681683540344
Validation loss = 0.2544802129268646
Validation loss = 0.24606400728225708
Validation loss = 0.2523103952407837
Validation loss = 0.24977247416973114
Validation loss = 0.24464938044548035
Validation loss = 0.2820562720298767
Validation loss = 0.24328100681304932
Validation loss = 0.2390853613615036
Validation loss = 0.2564457654953003
Validation loss = 0.24740199744701385
Validation loss = 0.21731658279895782
Validation loss = 0.21794342994689941
Validation loss = 0.21139632165431976
Validation loss = 0.20683802664279938
Validation loss = 0.20383605360984802
Validation loss = 0.21562662720680237
Validation loss = 0.20317339897155762
Validation loss = 0.21590368449687958
Validation loss = 0.20023219287395477
Validation loss = 0.2022341936826706
Validation loss = 0.19648100435733795
Validation loss = 0.2057064175605774
Validation loss = 0.20676910877227783
Validation loss = 0.20489756762981415
Validation loss = 0.19343101978302002
Validation loss = 0.19299820065498352
Validation loss = 0.20741106569766998
Validation loss = 0.21321308612823486
Validation loss = 0.20063701272010803
Validation loss = 0.21459132432937622
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7506386637687683
Validation loss = 0.39411798119544983
Validation loss = 0.36141690611839294
Validation loss = 0.36195120215415955
Validation loss = 0.32801300287246704
Validation loss = 0.31624460220336914
Validation loss = 0.3062730133533478
Validation loss = 0.286953866481781
Validation loss = 0.2799718379974365
Validation loss = 0.2680526375770569
Validation loss = 0.24660935997962952
Validation loss = 0.2627968490123749
Validation loss = 0.25553280115127563
Validation loss = 0.2549138367176056
Validation loss = 0.24408598244190216
Validation loss = 0.2609744668006897
Validation loss = 0.25371354818344116
Validation loss = 0.23413358628749847
Validation loss = 0.219064861536026
Validation loss = 0.23636047542095184
Validation loss = 0.22021228075027466
Validation loss = 0.23093071579933167
Validation loss = 0.2452671378850937
Done fitting dynamics.
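Each ensemble member above is trained until its held-out validation loss stops improving; the members stop after different numbers of epochs, well short of the configured epochs = 200, which suggests patience-based early stopping on the printed validation loss. A minimal sketch of that inner loop under that assumption; train_one_epoch and validation_loss are hypothetical callables standing in for the actual training and evaluation steps:

def fit_with_early_stopping(model, train_one_epoch, validation_loss,
                            train_data, val_data, max_epochs=200, patience=5):
    """Train one ensemble member until validation loss stops improving."""
    best = float("inf")
    epochs_since_best = 0
    for _ in range(max_epochs):
        train_one_epoch(model, train_data)           # hypothetical: one pass over minibatches
        val = validation_loss(model, val_data)       # hypothetical: loss on held-out transitions
        print(f"Validation loss = {val}")
        if val < best:
            best, epochs_since_best = val, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break
    return best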
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
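The 20 "Obtaining samples" lines correspond to trpo.iterations = 20 policy updates, each presumably performed on rollouts generated from the learned dynamics ensemble rather than the real environment. A structural sketch only; sample_imagined_paths and trpo_step are hypothetical stand-ins for the actual sampler and TRPO update:

def train_policy(policy, dynamics_ensemble, sample_imagined_paths, trpo_step,
                 n_iterations=20, batch_size=50000, horizon=100):
    """Outer policy-training loop over model-generated data."""
    for it in range(n_iterations):
        print(f"Obtaining samples for iteration {it}...")
        # Roll the current policy out inside the learned dynamics ensemble.
        paths = sample_imagined_paths(policy, dynamics_ensemble,
                                      batch_size=batch_size, horizon=horizon)
        # One trust-region update (config: step_size 0.01, gamma 0.99, GAE lambda 0.95).
        trpo_step(policy, paths, step_size=0.01, gamma=0.99, gae_lambda=0.95)
    print("Done training policy.")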
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.447   |
| Iteration     | 0        |
| MaximumReturn | -0.024   |
| MinimumReturn | -7.88    |
| TotalSamples  | 3332     |
----------------------------
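The table above summarizes per-path returns from the real-environment rollouts of this iteration. A minimal sketch of how such a summary might be produced; the logger is assumed to expose an rllab-style tabular interface, and the exact sample-counting convention behind TotalSamples is not visible in the log:

import numpy as np

def log_iteration_stats(itr, paths, total_samples, logger):
    """Summarize real-environment returns for one outer iteration."""
    returns = [np.sum(p["rewards"]) for p in paths]
    logger.record_tabular("AverageReturn", np.mean(returns))
    logger.record_tabular("Iteration", itr)
    logger.record_tabular("MaximumReturn", np.max(returns))
    logger.record_tabular("MinimumReturn", np.min(returns))
    logger.record_tabular("TotalSamples", total_samples)
    logger.dump_tabular()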
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2972879409790039
Validation loss = 0.18683397769927979
Validation loss = 0.17567585408687592
Validation loss = 0.18330956995487213
Validation loss = 0.17563673853874207
Validation loss = 0.16258113086223602
Validation loss = 0.1551659256219864
Validation loss = 0.15028116106987
Validation loss = 0.15204891562461853
Validation loss = 0.1632150560617447
Validation loss = 0.15106657147407532
Validation loss = 0.14853571355342865
Validation loss = 0.16054335236549377
Validation loss = 0.15453661978244781
Validation loss = 0.17117664217948914
Validation loss = 0.18237696588039398
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.24467656016349792
Validation loss = 0.17626774311065674
Validation loss = 0.17647749185562134
Validation loss = 0.18406885862350464
Validation loss = 0.2017601579427719
Validation loss = 0.18182158470153809
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.21998576819896698
Validation loss = 0.20418265461921692
Validation loss = 0.19716307520866394
Validation loss = 0.18034006655216217
Validation loss = 0.15335606038570404
Validation loss = 0.17239966988563538
Validation loss = 0.164761483669281
Validation loss = 0.14767715334892273
Validation loss = 0.16008685529232025
Validation loss = 0.1601778119802475
Validation loss = 0.1586315631866455
Validation loss = 0.16883273422718048
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2893281877040863
Validation loss = 0.20096121728420258
Validation loss = 0.18402689695358276
Validation loss = 0.16665327548980713
Validation loss = 0.16780340671539307
Validation loss = 0.15318654477596283
Validation loss = 0.14507471024990082
Validation loss = 0.14374883472919464
Validation loss = 0.13489879667758942
Validation loss = 0.1529296636581421
Validation loss = 0.14580658078193665
Validation loss = 0.1374192088842392
Validation loss = 0.14180628955364227
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.27943503856658936
Validation loss = 0.19412752985954285
Validation loss = 0.19004125893115997
Validation loss = 0.1684175431728363
Validation loss = 0.1623646318912506
Validation loss = 0.16854020953178406
Validation loss = 0.15801993012428284
Validation loss = 0.1742478311061859
Validation loss = 0.14474111795425415
Validation loss = 0.17338034510612488
Validation loss = 0.16738885641098022
Validation loss = 0.1725495308637619
Validation loss = 0.15906798839569092
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00531 |
| Iteration     | 1        |
| MaximumReturn | -0.00349 |
| MinimumReturn | -0.00817 |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11937233060598373
Validation loss = 0.10836830735206604
Validation loss = 0.10622335970401764
Validation loss = 0.111083023250103
Validation loss = 0.1277625560760498
Validation loss = 0.12739378213882446
Validation loss = 0.10672755539417267
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.13134032487869263
Validation loss = 0.10848550498485565
Validation loss = 0.107098788022995
Validation loss = 0.10745924711227417
Validation loss = 0.10429693013429642
Validation loss = 0.10779502242803574
Validation loss = 0.10706222057342529
Validation loss = 0.10863859951496124
Validation loss = 0.12811799347400665
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11732975393533707
Validation loss = 0.10337521880865097
Validation loss = 0.09798649698495865
Validation loss = 0.10358434915542603
Validation loss = 0.10401387512683868
Validation loss = 0.10161542147397995
Validation loss = 0.08999206125736237
Validation loss = 0.10698303580284119
Validation loss = 0.10202403366565704
Validation loss = 0.10592500120401382
Validation loss = 0.105926014482975
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1122501865029335
Validation loss = 0.09932662546634674
Validation loss = 0.09255146980285645
Validation loss = 0.09741849452257156
Validation loss = 0.09993041306734085
Validation loss = 0.10347947478294373
Validation loss = 0.10179182887077332
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.13793785870075226
Validation loss = 0.09886879473924637
Validation loss = 0.1026107668876648
Validation loss = 0.10680282860994339
Validation loss = 0.10340014100074768
Validation loss = 0.10721860080957413
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.025974025974025976
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.038461538461538464
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.11392405063291139
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.125
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12345679012345678
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.14634146341463414
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1566265060240964
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16666666666666666
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16470588235294117
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1744186046511628
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.19540229885057472
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.22727272727272727
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2247191011235955
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2222222222222222
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23076923076923078
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.25
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.26881720430107525
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26595744680851063
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.29473684210526313
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.3333333333333333
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.36082474226804123
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.3877551020408163
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.40404040404040403
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.42
Done generating on-policy rollouts.
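Starting in this iteration the affinization counter becomes non-zero, and the printed "average number of affinization" values are consistent with a running average of the per-path counts over all paths collected since the start of the run (for example, 2 affinizations over the 77 paths seen so far gives 0.025974...). A minimal sketch of that bookkeeping, under that assumption:

class AffinizationTracker:
    """Running average of per-path affinization counts over the whole run
    (assumed bookkeeping; consistent with the printed numbers, e.g. 2/77 = 0.02597...)."""
    def __init__(self):
        self.total_affinizations = 0
        self.total_paths = 0

    def update(self, path_count):
        self.total_affinizations += path_count
        self.total_paths += 1
        print(f"number of affinization with epsilon = 3 is {path_count}")
        print(f"average number of affinization = {self.total_affinizations / self.total_paths}")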
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.92    |
| Iteration     | 2        |
| MaximumReturn | -0.0369  |
| MinimumReturn | -20.4    |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08422625064849854
Validation loss = 0.06283936649560928
Validation loss = 0.06622131913900375
Validation loss = 0.06250958889722824
Validation loss = 0.06185314431786537
Validation loss = 0.06318516284227371
Validation loss = 0.0643116757273674
Validation loss = 0.05945688858628273
Validation loss = 0.06332595646381378
Validation loss = 0.06694477796554565
Validation loss = 0.062265198677778244
Validation loss = 0.06296388804912567
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07876495271921158
Validation loss = 0.0646682009100914
Validation loss = 0.06773833185434341
Validation loss = 0.06405606120824814
Validation loss = 0.06847245991230011
Validation loss = 0.0606970377266407
Validation loss = 0.064092256128788
Validation loss = 0.059577327221632004
Validation loss = 0.06234825775027275
Validation loss = 0.06615861505270004
Validation loss = 0.061765238642692566
Validation loss = 0.06882189959287643
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0796624943614006
Validation loss = 0.0649624839425087
Validation loss = 0.0651923418045044
Validation loss = 0.062001343816518784
Validation loss = 0.06289101392030716
Validation loss = 0.06075892969965935
Validation loss = 0.06138097122311592
Validation loss = 0.06993456929922104
Validation loss = 0.05524822697043419
Validation loss = 0.057368356734514236
Validation loss = 0.059584394097328186
Validation loss = 0.058896247297525406
Validation loss = 0.058566246181726456
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08457266539335251
Validation loss = 0.058015406131744385
Validation loss = 0.0599956214427948
Validation loss = 0.06094186007976532
Validation loss = 0.058199286460876465
Validation loss = 0.05801277235150337
Validation loss = 0.06641610711812973
Validation loss = 0.06299307197332382
Validation loss = 0.059703513979911804
Validation loss = 0.05855773016810417
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0821773037314415
Validation loss = 0.06508263200521469
Validation loss = 0.06849224120378494
Validation loss = 0.06292308121919632
Validation loss = 0.06551840901374817
Validation loss = 0.06331069767475128
Validation loss = 0.06485439091920853
Validation loss = 0.05762837454676628
Validation loss = 0.066758893430233
Validation loss = 0.06599310785531998
Validation loss = 0.06580591201782227
Validation loss = 0.06359397619962692
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4158415841584158
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4117647058823529
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4077669902912621
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.40384615384615385
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.39622641509433965
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3925233644859813
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3888888888888889
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3853211009174312
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.38181818181818183
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3783783783783784
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.375
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37168141592920356
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3684210526315789
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3652173913043478
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3620689655172414
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.358974358974359
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3559322033898305
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.35294117647058826
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.35
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.34710743801652894
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3442622950819672
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.34146341463414637
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3387096774193548
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.336
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0364  |
| Iteration     | 3        |
| MaximumReturn | -0.0232  |
| MinimumReturn | -0.0606  |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05823587253689766
Validation loss = 0.0408003106713295
Validation loss = 0.050242796540260315
Validation loss = 0.04661325737833977
Validation loss = 0.0434325747191906
Validation loss = 0.04792123660445213
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06474737077951431
Validation loss = 0.0486571379005909
Validation loss = 0.04659273102879524
Validation loss = 0.04741224646568298
Validation loss = 0.04943181946873665
Validation loss = 0.0477384515106678
Validation loss = 0.048118509352207184
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05896279960870743
Validation loss = 0.048404961824417114
Validation loss = 0.047298163175582886
Validation loss = 0.04234633594751358
Validation loss = 0.0429907888174057
Validation loss = 0.04412372410297394
Validation loss = 0.05015426501631737
Validation loss = 0.04705863073468208
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05663255602121353
Validation loss = 0.047709912061691284
Validation loss = 0.04248254746198654
Validation loss = 0.05095899850130081
Validation loss = 0.04926949366927147
Validation loss = 0.04346867650747299
Validation loss = 0.05017440393567085
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05968796834349632
Validation loss = 0.04714097082614899
Validation loss = 0.04468081146478653
Validation loss = 0.0469120554625988
Validation loss = 0.04900579899549484
Validation loss = 0.048048608005046844
Validation loss = 0.04720707982778549
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3333333333333333
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.33070866141732286
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.328125
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32558139534883723
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3230769230769231
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32061068702290074
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3181818181818182
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3157894736842105
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.31343283582089554
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3111111111111111
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3088235294117647
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30656934306569344
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30434782608695654
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.302158273381295
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2978723404255319
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29577464788732394
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2937062937062937
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2916666666666667
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2896551724137931
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2876712328767123
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2857142857142857
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.28378378378378377
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.28187919463087246
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.28
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00302 |
| Iteration     | 4        |
| MaximumReturn | -0.00198 |
| MinimumReturn | -0.00447 |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04534528777003288
Validation loss = 0.04185227304697037
Validation loss = 0.04835646227002144
Validation loss = 0.0510111078619957
Validation loss = 0.0446169376373291
Validation loss = 0.04459026828408241
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05275524780154228
Validation loss = 0.04413431137800217
Validation loss = 0.04604808986186981
Validation loss = 0.0449611060321331
Validation loss = 0.040614888072013855
Validation loss = 0.04135403782129288
Validation loss = 0.048013459891080856
Validation loss = 0.045862533152103424
Validation loss = 0.045322906225919724
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05885520577430725
Validation loss = 0.04444589465856552
Validation loss = 0.04666362330317497
Validation loss = 0.040035344660282135
Validation loss = 0.0423310287296772
Validation loss = 0.04931459203362465
Validation loss = 0.040956366807222366
Validation loss = 0.04619218036532402
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05290213227272034
Validation loss = 0.05994716286659241
Validation loss = 0.04203365743160248
Validation loss = 0.04159878194332123
Validation loss = 0.03826560080051422
Validation loss = 0.04014415293931961
Validation loss = 0.03933519497513771
Validation loss = 0.03981851413846016
Validation loss = 0.04155415669083595
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05233592912554741
Validation loss = 0.04391006752848625
Validation loss = 0.04158485308289528
Validation loss = 0.0386577844619751
Validation loss = 0.04752465337514877
Validation loss = 0.04864167422056198
Validation loss = 0.04288051277399063
Validation loss = 0.04216640442609787
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2781456953642384
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27631578947368424
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27450980392156865
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2727272727272727
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2709677419354839
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2692307692307692
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.267515923566879
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26582278481012656
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2641509433962264
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2625
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2608695652173913
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25925925925925924
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25766871165644173
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25609756097560976
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2545454545454545
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25301204819277107
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25149700598802394
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2485207100591716
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24705882352941178
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24561403508771928
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2441860465116279
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24277456647398843
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2413793103448276
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0223  |
| Iteration     | 5        |
| MaximumReturn | -0.0138  |
| MinimumReturn | -0.0334  |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.047662533819675446
Validation loss = 0.0389912910759449
Validation loss = 0.03624696657061577
Validation loss = 0.038305290043354034
Validation loss = 0.05181628465652466
Validation loss = 0.043883856385946274
Validation loss = 0.03937835991382599
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.048070888966321945
Validation loss = 0.04170680418610573
Validation loss = 0.044501882046461105
Validation loss = 0.04149017110466957
Validation loss = 0.044465571641922
Validation loss = 0.04172152280807495
Validation loss = 0.04468651860952377
Validation loss = 0.03912520408630371
Validation loss = 0.04661023989319801
Validation loss = 0.04182933643460274
Validation loss = 0.04047345742583275
Validation loss = 0.0391605868935585
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04993952065706253
Validation loss = 0.04327596351504326
Validation loss = 0.04055856168270111
Validation loss = 0.04081674665212631
Validation loss = 0.041858773678541183
Validation loss = 0.04318344220519066
Validation loss = 0.03716673702001572
Validation loss = 0.041030414402484894
Validation loss = 0.05005861073732376
Validation loss = 0.04946183040738106
Validation loss = 0.04059499129652977
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04215668514370918
Validation loss = 0.048814523965120316
Validation loss = 0.04104391857981682
Validation loss = 0.04086928069591522
Validation loss = 0.04033183306455612
Validation loss = 0.04003622755408287
Validation loss = 0.040740638971328735
Validation loss = 0.03829182684421539
Validation loss = 0.041464366018772125
Validation loss = 0.04143912345170975
Validation loss = 0.04210542514920235
Validation loss = 0.03614206984639168
Validation loss = 0.03744841739535332
Validation loss = 0.04682682454586029
Validation loss = 0.04048796370625496
Validation loss = 0.038920484483242035
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.045128870755434036
Validation loss = 0.03700824826955795
Validation loss = 0.04133882001042366
Validation loss = 0.03929833322763443
Validation loss = 0.042875729501247406
Validation loss = 0.053579699248075485
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23863636363636365
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23728813559322035
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23595505617977527
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2346368715083799
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23333333333333334
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23756906077348067
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23626373626373626
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24043715846994534
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2391304347826087
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24324324324324326
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24731182795698925
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24598930481283424
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24468085106382978
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24338624338624337
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24736842105263157
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24607329842931938
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24479166666666666
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24352331606217617
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2422680412371134
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24102564102564103
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23979591836734693
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23857868020304568
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23737373737373738
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23618090452261306
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.235
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -20.8    |
| Iteration     | 6        |
| MaximumReturn | -1.27    |
| MinimumReturn | -72.1    |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.034132931381464005
Validation loss = 0.026833707466721535
Validation loss = 0.030369363725185394
Validation loss = 0.02079053595662117
Validation loss = 0.028059465810656548
Validation loss = 0.023235736414790154
Validation loss = 0.02002648450434208
Validation loss = 0.026612089946866035
Validation loss = 0.02216295897960663
Validation loss = 0.021237051114439964
Validation loss = 0.018141919746994972
Validation loss = 0.020013587549328804
Validation loss = 0.02357691340148449
Validation loss = 0.02175813727080822
Validation loss = 0.025852123275399208
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04173029586672783
Validation loss = 0.03282127529382706
Validation loss = 0.02402481436729431
Validation loss = 0.02401643991470337
Validation loss = 0.025784166529774666
Validation loss = 0.024233706295490265
Validation loss = 0.025148672983050346
Validation loss = 0.021306686103343964
Validation loss = 0.02198425866663456
Validation loss = 0.01995457150042057
Validation loss = 0.02562953345477581
Validation loss = 0.024678850546479225
Validation loss = 0.021584443747997284
Validation loss = 0.02145390398800373
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04053520783782005
Validation loss = 0.03439019247889519
Validation loss = 0.025635259225964546
Validation loss = 0.023027272894978523
Validation loss = 0.02604677341878414
Validation loss = 0.026662148535251617
Validation loss = 0.021331995725631714
Validation loss = 0.02128559909760952
Validation loss = 0.025869613513350487
Validation loss = 0.019470660015940666
Validation loss = 0.0203479565680027
Validation loss = 0.02174723707139492
Validation loss = 0.018582483753561974
Validation loss = 0.018705474212765694
Validation loss = 0.02072870172560215
Validation loss = 0.022444382309913635
Validation loss = 0.02015838585793972
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04082028567790985
Validation loss = 0.0240998063236475
Validation loss = 0.02336733601987362
Validation loss = 0.02545531839132309
Validation loss = 0.021285759285092354
Validation loss = 0.022027073428034782
Validation loss = 0.01916280761361122
Validation loss = 0.0219001192599535
Validation loss = 0.019941555336117744
Validation loss = 0.023286031559109688
Validation loss = 0.019802432507276535
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04074101895093918
Validation loss = 0.0279640331864357
Validation loss = 0.02425537444651127
Validation loss = 0.02887437678873539
Validation loss = 0.025361159816384315
Validation loss = 0.027387147769331932
Validation loss = 0.027373095974326134
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23383084577114427
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23267326732673269
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2315270935960591
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23039215686274508
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22926829268292684
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22815533980582525
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22705314009661837
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22596153846153846
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22488038277511962
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22380952380952382
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22274881516587677
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22169811320754718
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22065727699530516
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21962616822429906
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2186046511627907
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2175925925925926
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21658986175115208
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21559633027522937
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2146118721461187
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21363636363636362
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21266968325791855
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21171171171171171
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21076233183856502
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20982142857142858
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2088888888888889
Done generating on-policy rollouts.
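
The "average number of affinization" printed after each path behaves like a cumulative running average: the per-path count stays at 0, yet the average shrinks smoothly, which is consistent with a fixed numerator divided by a denominator that grows with every new path. The log does not define the counter, so the tracker below is a hypothetical reading; seeding it with 94 prior events over 400 prior opportunities (two opportunities per path) reproduces the first three averages of iteration 7, up to floating-point printing.

# Hypothetical running-average tracker matching the shape of the log output.
class AffinizationCounter:
    def __init__(self):
        self.events = 0         # cumulative affinization events
        self.opportunities = 0  # cumulative chances for an event to occur

    def update(self, path_events, path_opportunities):
        self.events += path_events
        self.opportunities += path_opportunities
        print(f"number of affinization with epsilon = 3 is {path_events}")
        print(f"average number of affinization = {self.events / self.opportunities}")

counter = AffinizationCounter()
counter.events, counter.opportunities = 94, 400  # carried over from earlier paths
for path in range(3):
    counter.update(path_events=0, path_opportunities=2)
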
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0434  |
| Iteration     | 7        |
| MaximumReturn | -0.0241  |
| MinimumReturn | -0.0673  |
| TotalSamples  | 14994    |
----------------------------
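
Every outer iteration closes with the dashed table above, summarizing the returns of the on-policy paths. The helper below reproduces that table format from a list of path returns; how TotalSamples is accumulated is not shown in the log, so it is passed in as a plain counter, and the function name print_summary is illustrative.

# Sketch of the per-iteration summary table in the log's dashed format.
import numpy as np

def print_summary(iteration, path_returns, total_samples):
    rows = [
        ("AverageReturn", f"{np.mean(path_returns):.3g}"),
        ("Iteration", str(iteration)),
        ("MaximumReturn", f"{np.max(path_returns):.3g}"),
        ("MinimumReturn", f"{np.min(path_returns):.3g}"),
        ("TotalSamples", str(total_samples)),
    ]
    key_w = max(len(k) for k, _ in rows)
    val_w = max(len(v) for _, v in rows)
    line = "-" * (key_w + val_w + 7)
    print(line)
    for k, v in rows:
        print(f"| {k:<{key_w}} | {v:<{val_w}} |")
    print(line)

print_summary(7, [-0.0241, -0.05, -0.0673], 14994)
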
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02196316421031952
Validation loss = 0.01900414004921913
Validation loss = 0.01935708522796631
Validation loss = 0.016074026003479958
Validation loss = 0.01875828392803669
Validation loss = 0.02122780866920948
Validation loss = 0.017073236405849457
Validation loss = 0.017074938863515854
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.026694390922784805
Validation loss = 0.018314475193619728
Validation loss = 0.017217367887496948
Validation loss = 0.019897231832146645
Validation loss = 0.016690092161297798
Validation loss = 0.024301882833242416
Validation loss = 0.017886560410261154
Validation loss = 0.01619132235646248
Validation loss = 0.01815813221037388
Validation loss = 0.021639075130224228
Validation loss = 0.016450079157948494
Validation loss = 0.018084479495882988
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02307417429983616
Validation loss = 0.017833378165960312
Validation loss = 0.02678634598851204
Validation loss = 0.017224209383130074
Validation loss = 0.0164644718170166
Validation loss = 0.020293349400162697
Validation loss = 0.019152255728840828
Validation loss = 0.016981031745672226
Validation loss = 0.018236393108963966
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022099440917372704
Validation loss = 0.017896680161356926
Validation loss = 0.016842123121023178
Validation loss = 0.017872992902994156
Validation loss = 0.015875834971666336
Validation loss = 0.016086777672171593
Validation loss = 0.017009053379297256
Validation loss = 0.022020462900400162
Validation loss = 0.019791381433606148
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024052560329437256
Validation loss = 0.021725764498114586
Validation loss = 0.022702042013406754
Validation loss = 0.02403140999376774
Validation loss = 0.020498396828770638
Validation loss = 0.018067512661218643
Validation loss = 0.02329912781715393
Validation loss = 0.021096497774124146
Validation loss = 0.02394464984536171
Validation loss = 0.022201552987098694
Done fitting dynamics.
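
Before each fit, the newly generated transitions have to be merged into the training and validation sets; the printed configuration caps them at max_train_data = 200000 and max_val_data = 100000 with discard_ratio = 0.0. The buffer below is a hedged sketch of that bookkeeping: the random train/validation split fraction and the keep-most-recent truncation rule are assumptions, not something the log shows.

# Hedged sketch of rollout aggregation under the printed data caps.
import numpy as np

class TransitionBuffer:
    def __init__(self, max_train=200_000, max_val=100_000, val_fraction=0.1, seed=0):
        self.max_train, self.max_val = max_train, max_val
        self.val_fraction = val_fraction
        self.rng = np.random.default_rng(seed)
        self.train, self.val = [], []

    def add_paths(self, transitions):
        """transitions: list of (obs, action, next_obs) tuples from new rollouts."""
        for tr in transitions:
            target = self.val if self.rng.random() < self.val_fraction else self.train
            target.append(tr)
        # Keep only the most recent transitions once a cap is exceeded.
        self.train = self.train[-self.max_train:]
        self.val = self.val[-self.max_val:]

buf = TransitionBuffer()
buf.add_paths([(np.zeros(4), np.zeros(1), np.zeros(4)) for _ in range(2500)])
print(len(buf.train), len(buf.val))
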
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2079646017699115
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20704845814977973
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20614035087719298
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2052401746724891
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20434782608695654
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20346320346320346
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2025862068965517
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2017167381974249
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20085470085470086
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19915254237288135
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19831223628691982
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19747899159663865
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19665271966527198
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19583333333333333
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1950207468879668
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19421487603305784
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1934156378600823
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19262295081967212
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19183673469387755
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1910569105691057
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1902834008097166
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18951612903225806
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18875502008032127
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.188
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
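
The "Updating normalization." step after each batch of rollouts is commonly a running mean/standard-deviation update used to whiten the dynamics-model inputs. Whether this run normalizes observations, actions, or state deltas is not visible in the log; the Welford-style normalizer below is a generic sketch.

# Generic running-normalization sketch for the "Updating normalization." step.
import numpy as np

class RunningNormalizer:
    def __init__(self, dim):
        self.count = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)  # sum of squared deviations (Welford)

    def update(self, batch):
        for x in batch:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    def normalize(self, x, eps=1e-8):
        std = np.sqrt(self.m2 / max(self.count - 1, 1))
        return (x - self.mean) / (std + eps)

norm = RunningNormalizer(dim=4)
norm.update(np.random.default_rng(0).normal(size=(2500, 4)))
print(norm.normalize(np.ones(4)))
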
----------------------------
| AverageReturn | -0.0403  |
| Iteration     | 8        |
| MaximumReturn | -0.0244  |
| MinimumReturn | -0.0963  |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01771104708313942
Validation loss = 0.020168621093034744
Validation loss = 0.02372557669878006
Validation loss = 0.01662200316786766
Validation loss = 0.016703374683856964
Validation loss = 0.0154319629073143
Validation loss = 0.01783030293881893
Validation loss = 0.015094280242919922
Validation loss = 0.016636749729514122
Validation loss = 0.017840707674622536
Validation loss = 0.015071271918714046
Validation loss = 0.01353027205914259
Validation loss = 0.01465891394764185
Validation loss = 0.01985897123813629
Validation loss = 0.013678550720214844
Validation loss = 0.013785205781459808
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02636227011680603
Validation loss = 0.017022090032696724
Validation loss = 0.015683699399232864
Validation loss = 0.018152298405766487
Validation loss = 0.015143967233598232
Validation loss = 0.012708617374300957
Validation loss = 0.013493917882442474
Validation loss = 0.014229477383196354
Validation loss = 0.015852106735110283
Validation loss = 0.017012570053339005
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02169501595199108
Validation loss = 0.015002148225903511
Validation loss = 0.022269567474722862
Validation loss = 0.020230557769536972
Validation loss = 0.019671132788062096
Validation loss = 0.01638915203511715
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01886066049337387
Validation loss = 0.017109327018260956
Validation loss = 0.0165359228849411
Validation loss = 0.015219305641949177
Validation loss = 0.01554582454264164
Validation loss = 0.0152287557721138
Validation loss = 0.017392616719007492
Validation loss = 0.014467695727944374
Validation loss = 0.013986521400511265
Validation loss = 0.015261851251125336
Validation loss = 0.014320572838187218
Validation loss = 0.013578513637185097
Validation loss = 0.01483822613954544
Validation loss = 0.014591999351978302
Validation loss = 0.013223487883806229
Validation loss = 0.020963221788406372
Validation loss = 0.016858235001564026
Validation loss = 0.014467265456914902
Validation loss = 0.013960937038064003
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02242833375930786
Validation loss = 0.018488720059394836
Validation loss = 0.016099002212285995
Validation loss = 0.020407691597938538
Validation loss = 0.019440416246652603
Validation loss = 0.01892443373799324
Validation loss = 0.0174398310482502
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
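
The samples obtained in each TRPO iteration above come from imagined rollouts in the learned dynamics, and the configuration enables a particle ensemble (enable_particle_ensemble: True, particles: 5, ensemble_model_count: 5). A common arrangement is to propagate one particle per ensemble member; the sketch below illustrates that idea with stand-in linear models, and the fixed particle-to-member assignment is an assumption rather than the run's actual scheme.

# Hedged sketch of particle propagation through an ensemble of learned models,
# one particle per ensemble member.
import numpy as np

def propagate_particles(models, start_obs, policy, horizon=100):
    """Roll out one imagined trajectory per ensemble member."""
    particles = [np.array(start_obs, dtype=float) for _ in models]
    trajectories = [[] for _ in models]
    for _ in range(horizon):
        for i, (A, B) in enumerate(models):
            action = policy(particles[i])
            next_obs = A @ particles[i] + B @ action
            trajectories[i].append((particles[i].copy(), action, next_obs.copy()))
            particles[i] = next_obs
    return trajectories

rng = np.random.default_rng(0)
obs_dim, act_dim = 4, 1
models = [(np.eye(obs_dim) + 0.01 * rng.normal(size=(obs_dim, obs_dim)),
           0.01 * rng.normal(size=(obs_dim, act_dim))) for _ in range(5)]
trajs = propagate_particles(models, np.zeros(obs_dim), lambda o: np.zeros(act_dim))
print(len(trajs), len(trajs[0]))
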
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18725099601593626
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1865079365079365
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1857707509881423
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18503937007874016
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1843137254901961
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18359375
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1828793774319066
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1821705426356589
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18146718146718147
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18076923076923077
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18007662835249041
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17938931297709923
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17870722433460076
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17803030303030304
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17735849056603772
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17669172932330826
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1760299625468165
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17537313432835822
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17472118959107807
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17407407407407408
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17343173431734318
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17279411764705882
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17216117216117216
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17153284671532848
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1709090909090909
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0106  |
| Iteration     | 9        |
| MaximumReturn | -0.00616 |
| MinimumReturn | -0.0157  |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015575687400996685
Validation loss = 0.016705453395843506
Validation loss = 0.013701372779905796
Validation loss = 0.01412990316748619
Validation loss = 0.018715113401412964
Validation loss = 0.01727726310491562
Validation loss = 0.013869117945432663
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01564597897231579
Validation loss = 0.01345224678516388
Validation loss = 0.015311898663640022
Validation loss = 0.017008299008011818
Validation loss = 0.015227683819830418
Validation loss = 0.015883849933743477
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019427888095378876
Validation loss = 0.017881032079458237
Validation loss = 0.01647450029850006
Validation loss = 0.014379352331161499
Validation loss = 0.013182458467781544
Validation loss = 0.01349711511284113
Validation loss = 0.016957169398665428
Validation loss = 0.01439882442355156
Validation loss = 0.01397707685828209
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016190046444535255
Validation loss = 0.015002376399934292
Validation loss = 0.012667374685406685
Validation loss = 0.0123436925932765
Validation loss = 0.01675022393465042
Validation loss = 0.018893999978899956
Validation loss = 0.01926938258111477
Validation loss = 0.01445100735872984
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017605207860469818
Validation loss = 0.015710478648543358
Validation loss = 0.015763483941555023
Validation loss = 0.01650727540254593
Validation loss = 0.015058153308928013
Validation loss = 0.018629688769578934
Validation loss = 0.01895451545715332
Validation loss = 0.023091109469532967
Validation loss = 0.01670036092400551
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17028985507246377
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16967509025270758
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16906474820143885
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16845878136200718
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16785714285714284
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16725978647686832
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16666666666666666
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16607773851590105
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16549295774647887
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1649122807017544
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16433566433566432
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16376306620209058
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16319444444444445
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16262975778546712
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16206896551724137
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16151202749140894
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16095890410958905
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16040955631399317
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1598639455782313
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15932203389830507
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15878378378378377
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15824915824915825
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15771812080536912
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15719063545150502
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15666666666666668
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.144   |
| Iteration     | 10       |
| MaximumReturn | -0.0452  |
| MinimumReturn | -0.334   |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017911473289132118
Validation loss = 0.011994386091828346
Validation loss = 0.015871277078986168
Validation loss = 0.01322471909224987
Validation loss = 0.01579611748456955
Validation loss = 0.014636708423495293
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017091933637857437
Validation loss = 0.013484510593116283
Validation loss = 0.014418080449104309
Validation loss = 0.015424126759171486
Validation loss = 0.014074837788939476
Validation loss = 0.013841979205608368
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016047148033976555
Validation loss = 0.017606964334845543
Validation loss = 0.014034452848136425
Validation loss = 0.012804238125681877
Validation loss = 0.014459826052188873
Validation loss = 0.01784443110227585
Validation loss = 0.013714974746108055
Validation loss = 0.015006279572844505
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01606285199522972
Validation loss = 0.014647034928202629
Validation loss = 0.013734506443142891
Validation loss = 0.011541886255145073
Validation loss = 0.015770455822348595
Validation loss = 0.01364106684923172
Validation loss = 0.014382061548531055
Validation loss = 0.01411543507128954
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017528053373098373
Validation loss = 0.01420271210372448
Validation loss = 0.016682129353284836
Validation loss = 0.01318978238850832
Validation loss = 0.019050464034080505
Validation loss = 0.017636582255363464
Validation loss = 0.019630730152130127
Validation loss = 0.013841968961060047
Done fitting dynamics.
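
Although the configuration allows up to 200 epochs per model, each "Fitting model k" block logs only a handful of validation losses, and the count varies from model to model. That pattern suggests either periodic logging or an early-stopping rule; the patience-based criterion below is one common possibility, not the run's actual stopping condition.

# Generic patience-based early stopping on validation loss.
def train_with_early_stopping(step_fn, eval_fn, max_epochs=200, patience=5):
    """step_fn() does one training epoch; eval_fn() returns validation loss."""
    best, since_best = float("inf"), 0
    for _ in range(max_epochs):
        step_fn()
        val_loss = eval_fn()
        print(f"Validation loss = {val_loss}")
        if val_loss < best:
            best, since_best = val_loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best

# Toy usage: a "loss" that improves a few times and then plateaus.
losses = iter([0.04, 0.03, 0.025, 0.026, 0.027, 0.028,
               0.024, 0.03, 0.03, 0.03, 0.03, 0.031])
train_with_early_stopping(lambda: None, lambda: next(losses))
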
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15614617940199335
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15562913907284767
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1551155115511551
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15460526315789475
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1540983606557377
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15359477124183007
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15309446254071662
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1525974025974026
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15210355987055016
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15161290322580645
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15112540192926044
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15064102564102563
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1501597444089457
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14968152866242038
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1492063492063492
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14873417721518986
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14826498422712933
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14779874213836477
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14733542319749215
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.146875
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14641744548286603
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14596273291925466
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14551083591331268
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14506172839506173
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14461538461538462
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.93    |
| Iteration     | 11       |
| MaximumReturn | -0.0331  |
| MinimumReturn | -46.8    |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014350341632962227
Validation loss = 0.012727608904242516
Validation loss = 0.016529204323887825
Validation loss = 0.015627343207597733
Validation loss = 0.01488201878964901
Validation loss = 0.017722446471452713
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024530936032533646
Validation loss = 0.018528535962104797
Validation loss = 0.015351231209933758
Validation loss = 0.01839565858244896
Validation loss = 0.015273240394890308
Validation loss = 0.018292631953954697
Validation loss = 0.01389309298247099
Validation loss = 0.016381671652197838
Validation loss = 0.01248360425233841
Validation loss = 0.012205539271235466
Validation loss = 0.013993124477565289
Validation loss = 0.015028069727122784
Validation loss = 0.016081511974334717
Validation loss = 0.015533531084656715
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014334809966385365
Validation loss = 0.019061822444200516
Validation loss = 0.012914794497191906
Validation loss = 0.014744157902896404
Validation loss = 0.01157799270004034
Validation loss = 0.012384271249175072
Validation loss = 0.012941675260663033
Validation loss = 0.011988515965640545
Validation loss = 0.014429156668484211
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01293810736387968
Validation loss = 0.015513753518462181
Validation loss = 0.013280510902404785
Validation loss = 0.01884995400905609
Validation loss = 0.013652774505317211
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016011396422982216
Validation loss = 0.017050916329026222
Validation loss = 0.01410887110978365
Validation loss = 0.01671675220131874
Validation loss = 0.019186455756425858
Validation loss = 0.014872836880385876
Validation loss = 0.013500111177563667
Validation loss = 0.020792799070477486
Validation loss = 0.015615910291671753
Validation loss = 0.014394333586096764
Validation loss = 0.012137196026742458
Validation loss = 0.015009582042694092
Validation loss = 0.015270106494426727
Validation loss = 0.012324544601142406
Validation loss = 0.013395573012530804
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
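
The printed configuration also lists a pre-training mode of 'intrinsic_reward' with intrinsic_reward_coeff = 1.0. A standard construction for such a bonus is to reward states and actions where the ensemble members disagree; the variance-based version below is a hedged sketch, since the exact definition used by this run is not visible in the log.

# Hedged sketch of an ensemble-disagreement intrinsic reward.
import numpy as np

def intrinsic_reward(models, obs, action, coeff=1.0):
    """Disagreement bonus: mean variance of next-state predictions across members."""
    preds = np.stack([A @ obs + B @ action for A, B in models])  # (n_models, obs_dim)
    return coeff * preds.var(axis=0).mean()

rng = np.random.default_rng(0)
models = [(rng.normal(size=(4, 4)), rng.normal(size=(4, 1))) for _ in range(5)]
print(intrinsic_reward(models, np.zeros(4), np.ones(1)))
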
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1441717791411043
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1437308868501529
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14329268292682926
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14285714285714285
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14242424242424243
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1419939577039275
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14156626506024098
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14114114114114115
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1407185628742515
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14029850746268657
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13988095238095238
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1394658753709199
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1390532544378698
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13864306784660768
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13823529411764707
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1378299120234604
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13742690058479531
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13702623906705538
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13662790697674418
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13623188405797101
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13583815028901733
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13544668587896252
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13505747126436782
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1346704871060172
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13428571428571429
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00455 |
| Iteration     | 12       |
| MaximumReturn | -0.00344 |
| MinimumReturn | -0.00628 |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014256259426474571
Validation loss = 0.01200456265360117
Validation loss = 0.015658466145396233
Validation loss = 0.01795359142124653
Validation loss = 0.012348029762506485
Validation loss = 0.01458609476685524
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013026981614530087
Validation loss = 0.015294145792722702
Validation loss = 0.012379687279462814
Validation loss = 0.01235659047961235
Validation loss = 0.011142422445118427
Validation loss = 0.012111627496778965
Validation loss = 0.011847554706037045
Validation loss = 0.011648744344711304
Validation loss = 0.017433661967515945
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013443243689835072
Validation loss = 0.0166663508862257
Validation loss = 0.016758080571889877
Validation loss = 0.013847196474671364
Validation loss = 0.013108192943036556
Validation loss = 0.013632106594741344
Validation loss = 0.014372834004461765
Validation loss = 0.013815388083457947
Validation loss = 0.012504941783845425
Validation loss = 0.018936220556497574
Validation loss = 0.01290130615234375
Validation loss = 0.011928671039640903
Validation loss = 0.01195453479886055
Validation loss = 0.010999251157045364
Validation loss = 0.011148280464112759
Validation loss = 0.013080310076475143
Validation loss = 0.011968803592026234
Validation loss = 0.01405892614275217
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015279480256140232
Validation loss = 0.013127539306879044
Validation loss = 0.01173368003219366
Validation loss = 0.014415680430829525
Validation loss = 0.02525615133345127
Validation loss = 0.011333591304719448
Validation loss = 0.014924287796020508
Validation loss = 0.013340502046048641
Validation loss = 0.012380668893456459
Validation loss = 0.013096311129629612
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012715703807771206
Validation loss = 0.019582731649279594
Validation loss = 0.012831863947212696
Validation loss = 0.017386192455887794
Validation loss = 0.013389918021857738
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1339031339031339
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13352272727272727
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13314447592067988
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1327683615819209
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1323943661971831
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13202247191011235
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13165266106442577
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13128491620111732
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1309192200557103
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13055555555555556
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13019390581717452
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1298342541436464
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12947658402203857
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12912087912087913
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12876712328767123
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1284153005464481
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12806539509536785
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12771739130434784
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12737127371273713
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12702702702702703
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12668463611859837
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12634408602150538
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1260053619302949
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12566844919786097
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12533333333333332
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0445  |
| Iteration     | 13       |
| MaximumReturn | -0.0204  |
| MinimumReturn | -0.0847  |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01380822341889143
Validation loss = 0.012572585605084896
Validation loss = 0.013098888099193573
Validation loss = 0.01355951651930809
Validation loss = 0.015351441688835621
Validation loss = 0.014406212605535984
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014920085668563843
Validation loss = 0.01316848024725914
Validation loss = 0.012344804592430592
Validation loss = 0.011843502521514893
Validation loss = 0.013916038908064365
Validation loss = 0.014721523970365524
Validation loss = 0.010964907705783844
Validation loss = 0.011041309684515
Validation loss = 0.012515727430582047
Validation loss = 0.011879715137183666
Validation loss = 0.011493873782455921
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012401956133544445
Validation loss = 0.012223460711538792
Validation loss = 0.01259305328130722
Validation loss = 0.01099818479269743
Validation loss = 0.013648315332829952
Validation loss = 0.016491392627358437
Validation loss = 0.01675100438296795
Validation loss = 0.012776612304151058
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011171332560479641
Validation loss = 0.011754967272281647
Validation loss = 0.012148417532444
Validation loss = 0.011870182119309902
Validation loss = 0.01090095192193985
Validation loss = 0.01156581286340952
Validation loss = 0.012838243506848812
Validation loss = 0.010207277722656727
Validation loss = 0.01073820237070322
Validation loss = 0.010719802230596542
Validation loss = 0.014433678239583969
Validation loss = 0.011023618280887604
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016159631311893463
Validation loss = 0.012578449212014675
Validation loss = 0.01522022858262062
Validation loss = 0.015403684228658676
Validation loss = 0.01437282282859087
Validation loss = 0.012952920980751514
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
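
Each policy-training phase begins with the line "Re-initialize init_std.", and the policy configuration sets init_logstd = 0.0 while reinitialize_every_itr is False. One plausible reading is that only the Gaussian policy's log standard deviation is reset to its initial value at the start of each outer iteration, while the mean network is kept; the sketch below illustrates that reading with stand-in parameters, and whether the real code does exactly this is not shown in the log.

# Minimal sketch of resetting only the exploration noise each iteration.
import numpy as np

class GaussianPolicyParams:
    def __init__(self, obs_dim, act_dim, init_logstd=0.0, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(scale=0.1, size=(obs_dim, act_dim))  # mean-net stand-in
        self.log_std = np.full(act_dim, init_logstd)

    def reinit_std(self, init_logstd=0.0):
        print("Re-initialize init_std.")
        self.log_std[:] = init_logstd  # keep the mean parameters, reset only the noise

policy = GaussianPolicyParams(obs_dim=4, act_dim=1)
policy.log_std += -1.3  # pretend the last TRPO phase shrank the noise
policy.reinit_std()
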
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.125
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1246684350132626
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12433862433862433
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12401055408970976
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12368421052631579
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12335958005249344
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12303664921465969
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1227154046997389
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12239583333333333
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12207792207792208
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12176165803108809
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12144702842377261
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1211340206185567
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12082262210796915
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12051282051282051
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12020460358056266
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11989795918367346
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11959287531806616
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11928934010152284
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1189873417721519
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11868686868686869
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11838790931989925
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11809045226130653
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11779448621553884
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1175
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00205 |
| Iteration     | 14       |
| MaximumReturn | -0.00144 |
| MinimumReturn | -0.00289 |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01239783689379692
Validation loss = 0.014653830789029598
Validation loss = 0.01155081856995821
Validation loss = 0.011151651851832867
Validation loss = 0.013715000823140144
Validation loss = 0.012473832815885544
Validation loss = 0.01219887938350439
Validation loss = 0.011543610133230686
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012187467887997627
Validation loss = 0.01336737908422947
Validation loss = 0.01675655134022236
Validation loss = 0.01317519973963499
Validation loss = 0.01245077420026064
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011649285443127155
Validation loss = 0.01491389237344265
Validation loss = 0.011356230825185776
Validation loss = 0.010396568104624748
Validation loss = 0.012059556320309639
Validation loss = 0.011424552649259567
Validation loss = 0.010719732381403446
Validation loss = 0.01197067741304636
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012669377960264683
Validation loss = 0.013233126141130924
Validation loss = 0.013348250649869442
Validation loss = 0.011410108767449856
Validation loss = 0.01622086577117443
Validation loss = 0.01140139065682888
Validation loss = 0.01129909884184599
Validation loss = 0.010628142394125462
Validation loss = 0.012012715451419353
Validation loss = 0.011988675221800804
Validation loss = 0.015305736102163792
Validation loss = 0.011136605404317379
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013650625944137573
Validation loss = 0.012300319038331509
Validation loss = 0.01254955306649208
Validation loss = 0.01438815239816904
Validation loss = 0.012373452074825764
Validation loss = 0.011658878065645695
Validation loss = 0.011569513007998466
Validation loss = 0.013617298565804958
Validation loss = 0.011679421178996563
Validation loss = 0.015257509425282478
Validation loss = 0.011896545998752117
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1172069825436409
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11691542288557213
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11662531017369727
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11633663366336634
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11604938271604938
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11576354679802955
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11547911547911548
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11519607843137254
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11491442542787286
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11463414634146342
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11435523114355231
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11407766990291263
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11380145278450363
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11352657004830918
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11325301204819277
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11298076923076923
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11270983213429256
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11244019138755981
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11217183770883055
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11190476190476191
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11163895486935867
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11137440758293839
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1111111111111111
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11084905660377359
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11058823529411765
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00127  |
| Iteration     | 15        |
| MaximumReturn | -0.000847 |
| MinimumReturn | -0.00205  |
| TotalSamples  | 28322     |
-----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013983973301947117
Validation loss = 0.011607566848397255
Validation loss = 0.012649922631680965
Validation loss = 0.011532120406627655
Validation loss = 0.012063142843544483
Validation loss = 0.01435279380530119
Validation loss = 0.014023909345269203
Validation loss = 0.013384291902184486
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01119888573884964
Validation loss = 0.010672838427126408
Validation loss = 0.022139549255371094
Validation loss = 0.015414762310683727
Validation loss = 0.01187650952488184
Validation loss = 0.012511000037193298
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013420836068689823
Validation loss = 0.015004082582890987
Validation loss = 0.013482858426868916
Validation loss = 0.010190032422542572
Validation loss = 0.011060433462262154
Validation loss = 0.010017571970820427
Validation loss = 0.015105902217328548
Validation loss = 0.010727335698902607
Validation loss = 0.01194825954735279
Validation loss = 0.012873070314526558
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018550127744674683
Validation loss = 0.010065587237477303
Validation loss = 0.010527611710131168
Validation loss = 0.012280779890716076
Validation loss = 0.012201684527099133
Validation loss = 0.012455041520297527
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013881916180253029
Validation loss = 0.016130348667502403
Validation loss = 0.013035980984568596
Validation loss = 0.010637158527970314
Validation loss = 0.011367778293788433
Validation loss = 0.013224816881120205
Validation loss = 0.01304969284683466
Validation loss = 0.015720250084996223
Done fitting dynamics.
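Each dynamics update fits the 5 ensemble members independently and prints a validation loss after every evaluation pass; the number of printed losses differs from model to model, which suggests training for a member stops once its validation loss stops improving, though that is inferred from the log rather than stated in it. A minimal sketch of that pattern, assuming hypothetical fit_epoch and validation_loss methods and a simple patience rule:

def fit_ensemble(models, train_data, val_data, max_epochs=200, patience=5):
    """Fit each ensemble member separately, logging validation loss (illustrative sketch)."""
    for i, model in enumerate(models):
        print("Fitting model %d (0-based) in the ensemble of %d models" % (i, len(models)))
        best, stale = float("inf"), 0
        for _ in range(max_epochs):
            model.fit_epoch(train_data)               # hypothetical: one pass over minibatches
            loss = model.validation_loss(val_data)    # hypothetical: loss on held-out transitions
            print("Validation loss = %s" % loss)
            if loss < best:
                best, stale = loss, 0
            else:
                stale += 1
                if stale >= patience:
                    break                             # stop this member once validation stalls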
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
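The policy update runs the configured 20 TRPO iterations against the learned dynamics rather than the real environment: the Gaussian policy's log standard deviation is reset to init_logstd (0.0 in the configuration), then each iteration collects a batch of imagined samples and applies a KL-constrained policy step. A sketch of that loop, where sample_model_rollouts, trpo_step and policy.set_logstd are hypothetical helpers:

def train_policy_trpo(policy, sample_model_rollouts, trpo_step,
                      n_iterations=20, batch_size=50000, horizon=100,
                      init_logstd=0.0, step_size=0.01):
    """Run the 20 TRPO iterations against the learned dynamics (illustrative sketch)."""
    print("Training policy using TRPO.")
    print("Re-initialize init_std.")
    policy.set_logstd(init_logstd)        # hypothetical setter for the Gaussian log-std
    for it in range(n_iterations):
        print("Obtaining samples for iteration %d..." % it)
        # imagined rollouts through the dynamics ensemble, not the real environment
        samples = sample_model_rollouts(policy, batch_size=batch_size, horizon=horizon)
        trpo_step(policy, samples, max_kl=step_size)   # KL-constrained policy update
    print("Done training policy.")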
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11032863849765258
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11007025761124122
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10981308411214953
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10955710955710955
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10930232558139535
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10904872389791183
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1087962962962963
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10854503464203233
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.11059907834101383
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1103448275862069
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11009174311926606
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10983981693363844
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1095890410958904
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10933940774487472
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10909090909090909
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10884353741496598
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1085972850678733
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10835214446952596
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10810810810810811
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10786516853932585
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10762331838565023
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10738255033557047
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10714285714285714
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10690423162583519
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10666666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
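"Updating normalization" most plausibly refreshes the running statistics used to standardise the dynamics model's inputs and targets with the newly collected transitions; the log does not confirm the details, so the RunningNormalizer below is only a sketch of one common way to maintain such statistics:

import numpy as np

class RunningNormalizer:
    """Running mean/variance used to standardise model inputs (illustrative sketch)."""

    def __init__(self, dim, eps=1e-8):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.count = 0
        self.eps = eps

    def update(self, batch):
        """Fold a new batch of rows into the running statistics (streaming mean/variance)."""
        batch = np.asarray(batch, dtype=np.float64)
        n_new = len(batch)
        new_count = self.count + n_new
        delta = batch.mean(axis=0) - self.mean
        self.var = (self.count * self.var + n_new * batch.var(axis=0)
                    + delta ** 2 * self.count * n_new / new_count) / new_count
        self.mean = self.mean + delta * n_new / new_count
        self.count = new_count

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + self.eps)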
----------------------------
| AverageReturn | -3.97    |
| Iteration     | 16       |
| MaximumReturn | -0.052   |
| MinimumReturn | -35.2    |
| TotalSamples  | 29988    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02178053930401802
Validation loss = 0.016330981627106667
Validation loss = 0.0153672955930233
Validation loss = 0.01583823375403881
Validation loss = 0.016572894528508186
Validation loss = 0.014852900989353657
Validation loss = 0.015880875289440155
Validation loss = 0.015443475916981697
Validation loss = 0.0175043772906065
Validation loss = 0.014982511289417744
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022973932325839996
Validation loss = 0.015815239399671555
Validation loss = 0.015498717315495014
Validation loss = 0.016089584678411484
Validation loss = 0.01467036735266447
Validation loss = 0.01818697527050972
Validation loss = 0.01557307317852974
Validation loss = 0.01466501597315073
Validation loss = 0.01594630442559719
Validation loss = 0.022215818986296654
Validation loss = 0.015191291458904743
Validation loss = 0.014558129943907261
Validation loss = 0.014260444790124893
Validation loss = 0.015433629043400288
Validation loss = 0.014896826818585396
Validation loss = 0.01577303744852543
Validation loss = 0.015868017449975014
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023957252502441406
Validation loss = 0.01634378172457218
Validation loss = 0.015940861776471138
Validation loss = 0.015286805108189583
Validation loss = 0.01601521484553814
Validation loss = 0.016413623467087746
Validation loss = 0.015712929889559746
Validation loss = 0.01648056134581566
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018674863502383232
Validation loss = 0.015387645922601223
Validation loss = 0.01735142432153225
Validation loss = 0.015230182558298111
Validation loss = 0.015877973288297653
Validation loss = 0.015678202733397484
Validation loss = 0.02220119908452034
Validation loss = 0.014804997481405735
Validation loss = 0.015819348394870758
Validation loss = 0.01410876028239727
Validation loss = 0.01582786999642849
Validation loss = 0.017549311742186546
Validation loss = 0.01427503116428852
Validation loss = 0.01715182140469551
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01792856864631176
Validation loss = 0.01826462522149086
Validation loss = 0.015014057978987694
Validation loss = 0.017640916630625725
Validation loss = 0.014843572862446308
Validation loss = 0.014994188211858273
Validation loss = 0.01513529010117054
Validation loss = 0.014924445189535618
Validation loss = 0.01826207898557186
Done fitting dynamics.
Updating randomness.
Done updating randomness.
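What "Updating randomness" does is not visible from the log. Given the particle-ensemble settings in the configuration (5 particles propagated through a 5-model ensemble), one plausible reading is that it re-draws the random assignment of particles to ensemble members used for imagined rollouts; the sketch below implements only that assumed reading:

import numpy as np

def update_randomness(n_particles=5, n_models=5, seed=None):
    """Re-draw the particle-to-model assignment (assumed interpretation, not confirmed by the log)."""
    rng = np.random.default_rng(seed)
    # each particle would be propagated by one randomly chosen ensemble member
    assignment = rng.integers(low=0, high=n_models, size=n_particles)
    return assignment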
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10643015521064302
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10619469026548672
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10596026490066225
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10572687224669604
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1054945054945055
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10526315789473684
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1050328227571116
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10480349344978165
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10457516339869281
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10434782608695652
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10412147505422993
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1038961038961039
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10367170626349892
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10344827586206896
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1032258064516129
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10300429184549356
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10278372591006424
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10256410256410256
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1023454157782516
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10212765957446808
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10191082802547771
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1038135593220339
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10359408033826638
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10337552742616034
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1031578947368421
Done generating on-policy rollouts.
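Each "Generating on-policy rollouts" block executes the current policy in the real environment for num_path_onpol = 25 paths of env_horizon = 100 steps, printing the cumulative timestep count before each path and updating the affinization counter afterwards. A sketch of that sampling loop, where rollout_fn is a hypothetical single-episode runner and counter is the AffinizationCounter sketched earlier:

def generate_onpol_rollouts(env, policy, rollout_fn, counter, num_paths=25, horizon=100):
    """Collect real-environment paths with the current policy (illustrative sketch)."""
    print("Generating on-policy rollouts.")
    paths, total_timesteps = [], 0
    for i in range(num_paths):
        print("Path %d | total_timesteps %d." % (i, total_timesteps))
        path = rollout_fn(env, policy, max_steps=horizon)        # hypothetical episode runner
        counter.record_path(path.get("n_affinizations", 0))      # see the counter sketch above
        total_timesteps += len(path["rewards"])
        paths.append(path)
    print("Done generating on-policy rollouts.")
    return paths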
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.03    |
| Iteration     | 17       |
| MaximumReturn | -0.0334  |
| MinimumReturn | -18.4    |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01637551560997963
Validation loss = 0.014283371157944202
Validation loss = 0.0157774705439806
Validation loss = 0.019450632855296135
Validation loss = 0.014179076068103313
Validation loss = 0.014317403547465801
Validation loss = 0.017903858795762062
Validation loss = 0.014203842729330063
Validation loss = 0.014404446817934513
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017142627388238907
Validation loss = 0.014758777804672718
Validation loss = 0.017047038301825523
Validation loss = 0.01686616986989975
Validation loss = 0.014990822412073612
Validation loss = 0.014902159571647644
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01634538732469082
Validation loss = 0.015108851715922356
Validation loss = 0.015113643370568752
Validation loss = 0.016249513253569603
Validation loss = 0.01411479152739048
Validation loss = 0.013984238728880882
Validation loss = 0.0191293116658926
Validation loss = 0.01525037456303835
Validation loss = 0.014467827044427395
Validation loss = 0.016345499083399773
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017623014748096466
Validation loss = 0.015406934544444084
Validation loss = 0.01598971337080002
Validation loss = 0.015101511962711811
Validation loss = 0.017515651881694794
Validation loss = 0.013948249630630016
Validation loss = 0.014929711818695068
Validation loss = 0.014856376685202122
Validation loss = 0.014664609916508198
Validation loss = 0.015679286792874336
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026939067989587784
Validation loss = 0.015312905423343182
Validation loss = 0.015231507830321789
Validation loss = 0.01791040040552616
Validation loss = 0.01439242996275425
Validation loss = 0.021106386557221413
Validation loss = 0.01534454058855772
Validation loss = 0.01614338532090187
Validation loss = 0.015625787898898125
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10294117647058823
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10272536687631027
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10251046025104603
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1022964509394572
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10208333333333333
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10187110187110188
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1016597510373444
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10144927536231885
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1012396694214876
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10103092783505155
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10082304526748971
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10061601642710473
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10040983606557377
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10020449897750511
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09979633401221996
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09959349593495935
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09939148073022312
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09919028340080972
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09898989898989899
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09879032258064516
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09859154929577464
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09839357429718876
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09819639278557114
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.098
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00717 |
| Iteration     | 18       |
| MaximumReturn | -0.00534 |
| MinimumReturn | -0.00981 |
| TotalSamples  | 33320    |
----------------------------
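Taken together, every "itr #N" block repeats the same outer loop: fit the dynamics ensemble on the data gathered so far, refresh the rollout randomness, run 20 TRPO iterations against the learned models, collect 25 new real-environment paths, update the input normalization, and print the return table. A compact sketch of that loop, with every step passed in as a callable so it stays self-contained; all of the names are placeholders tied to the earlier sketches:

def run_outer_loop(fit_dynamics, update_randomness, train_policy, collect_rollouts,
                   update_normalization, log_table, start_iter=0, n_iters=80):
    """Outer model-based RL loop mirroring each 'itr #N' block of this log (sketch)."""
    for itr in range(start_iter, n_iters):
        print("itr #%d | " % itr)
        print("Fitting dynamics.")
        fit_dynamics()
        print("Done fitting dynamics.")
        print("Updating randomness.")
        update_randomness()
        print("Done updating randomness.")
        train_policy()                    # the 20 imagined TRPO iterations
        paths = collect_rollouts()        # 25 real-environment paths of horizon 100
        print("Updating normalization.")
        update_normalization(paths)
        print("Done updating normalization.")
        log_table(paths, itr)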
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020726243034005165
Validation loss = 0.0222896970808506
Validation loss = 0.020986240357160568
Validation loss = 0.02141311764717102
Validation loss = 0.02103554829955101
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021672982722520828
Validation loss = 0.02189759537577629
Validation loss = 0.020473044365644455
Validation loss = 0.02494039013981819
Validation loss = 0.0224410779774189
Validation loss = 0.02079574391245842
Validation loss = 0.021852491423487663
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021805286407470703
Validation loss = 0.020712533965706825
Validation loss = 0.020017284899950027
Validation loss = 0.019388718530535698
Validation loss = 0.02079653926193714
Validation loss = 0.020763887092471123
Validation loss = 0.01983649842441082
Validation loss = 0.02087000012397766
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02196390926837921
Validation loss = 0.021031172946095467
Validation loss = 0.029494978487491608
Validation loss = 0.021902363747358322
Validation loss = 0.019931120797991753
Validation loss = 0.021706242114305496
Validation loss = 0.020211108028888702
Validation loss = 0.019915541633963585
Validation loss = 0.020306991413235664
Validation loss = 0.0248074010014534
Validation loss = 0.021536916494369507
Validation loss = 0.023646283894777298
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022905217483639717
Validation loss = 0.023005980998277664
Validation loss = 0.025244057178497314
Validation loss = 0.025670675560832024
Validation loss = 0.021078329533338547
Validation loss = 0.022363267838954926
Validation loss = 0.021386681124567986
Validation loss = 0.020296650007367134
Validation loss = 0.02107328549027443
Validation loss = 0.02074394002556801
Validation loss = 0.022104229778051376
Validation loss = 0.021844200789928436
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09780439121756487
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.099601593625498
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09940357852882704
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0992063492063492
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09900990099009901
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09881422924901186
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09861932938856016
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0984251968503937
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09823182711198428
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.10567514677103718
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10546875
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10526315789473684
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10505836575875487
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10485436893203884
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1065891472868217
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10638297872340426
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10617760617760617
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10597302504816955
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10576923076923077
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10556621880998081
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1053639846743295
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10516252390057361
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1049618320610687
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10476190476190476
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.77    |
| Iteration     | 19       |
| MaximumReturn | -0.0386  |
| MinimumReturn | -85.1    |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020216001197695732
Validation loss = 0.020874273031949997
Validation loss = 0.020165320485830307
Validation loss = 0.02145228162407875
Validation loss = 0.020124927163124084
Validation loss = 0.02096986211836338
Validation loss = 0.01980462856590748
Validation loss = 0.01992459036409855
Validation loss = 0.019560672342777252
Validation loss = 0.019937295466661453
Validation loss = 0.019881924614310265
Validation loss = 0.021258316934108734
Validation loss = 0.0227262731641531
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022955574095249176
Validation loss = 0.020016813650727272
Validation loss = 0.019483504816889763
Validation loss = 0.021620191633701324
Validation loss = 0.01908317394554615
Validation loss = 0.019912688061594963
Validation loss = 0.01971506141126156
Validation loss = 0.02084793709218502
Validation loss = 0.022310370579361916
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020778164267539978
Validation loss = 0.01967344619333744
Validation loss = 0.018834836781024933
Validation loss = 0.018948296085000038
Validation loss = 0.019340993836522102
Validation loss = 0.019501756876707077
Validation loss = 0.02228807471692562
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020702043548226357
Validation loss = 0.02444002591073513
Validation loss = 0.01997072622179985
Validation loss = 0.019414983689785004
Validation loss = 0.02053997851908207
Validation loss = 0.019191281870007515
Validation loss = 0.020631734281778336
Validation loss = 0.019502585753798485
Validation loss = 0.019663648679852486
Validation loss = 0.020162058994174004
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022502494975924492
Validation loss = 0.02302098274230957
Validation loss = 0.02340901643037796
Validation loss = 0.02117047645151615
Validation loss = 0.02024863474071026
Validation loss = 0.019096774980425835
Validation loss = 0.019472576677799225
Validation loss = 0.02049252763390541
Validation loss = 0.01960001140832901
Validation loss = 0.019733577966690063
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10456273764258556
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10436432637571158
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10416666666666667
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10396975425330812
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10377358490566038
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10357815442561205
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10338345864661654
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10318949343339587
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10299625468164794
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.102803738317757
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10261194029850747
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10242085661080075
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10223048327137546
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10204081632653061
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10185185185185185
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10166358595194085
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1014760147601476
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10128913443830571
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10110294117647059
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10091743119266056
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10073260073260074
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10054844606946983
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10036496350364964
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10018214936247723
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00682 |
| Iteration     | 20       |
| MaximumReturn | -0.00529 |
| MinimumReturn | -0.0098  |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0197018813341856
Validation loss = 0.02099469304084778
Validation loss = 0.020959291607141495
Validation loss = 0.019921323284506798
Validation loss = 0.021835539489984512
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021287592127919197
Validation loss = 0.023104799911379814
Validation loss = 0.01975051686167717
Validation loss = 0.019916776567697525
Validation loss = 0.023775536566972733
Validation loss = 0.021497899666428566
Validation loss = 0.020270084962248802
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022161997854709625
Validation loss = 0.022444387897849083
Validation loss = 0.024378323927521706
Validation loss = 0.019913114607334137
Validation loss = 0.01974097639322281
Validation loss = 0.02024947479367256
Validation loss = 0.020306192338466644
Validation loss = 0.019776733592152596
Validation loss = 0.02157474309206009
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02090139128267765
Validation loss = 0.022016290575265884
Validation loss = 0.021190805360674858
Validation loss = 0.02025292068719864
Validation loss = 0.020365312695503235
Validation loss = 0.022679489105939865
Validation loss = 0.026646194979548454
Validation loss = 0.023098809644579887
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023233380168676376
Validation loss = 0.02144761197268963
Validation loss = 0.021500416100025177
Validation loss = 0.02207036130130291
Validation loss = 0.023860149085521698
Validation loss = 0.019784672185778618
Validation loss = 0.020640525966882706
Validation loss = 0.0198432095348835
Validation loss = 0.022451234981417656
Validation loss = 0.021762609481811523
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0998185117967332
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09963768115942029
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09945750452079566
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09927797833935018
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0990990990990991
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09892086330935251
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09874326750448834
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0985663082437276
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09838998211091235
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09821428571428571
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09803921568627451
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09786476868327403
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09769094138543517
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0975177304964539
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09734513274336283
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09717314487632508
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09700176366843033
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09683098591549295
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09666080843585237
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09649122807017543
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09632224168126094
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09615384615384616
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09598603839441536
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09581881533101046
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09565217391304348
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0149  |
| Iteration     | 21       |
| MaximumReturn | -0.00931 |
| MinimumReturn | -0.0267  |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01963871158659458
Validation loss = 0.019552385434508324
Validation loss = 0.020536527037620544
Validation loss = 0.020452283322811127
Validation loss = 0.022137602791190147
Validation loss = 0.020550640299916267
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020298538729548454
Validation loss = 0.02154572308063507
Validation loss = 0.0196214709430933
Validation loss = 0.02123749628663063
Validation loss = 0.020165598019957542
Validation loss = 0.021095240488648415
Validation loss = 0.019479822367429733
Validation loss = 0.02051866427063942
Validation loss = 0.020249560475349426
Validation loss = 0.02065313048660755
Validation loss = 0.021298689767718315
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019688433036208153
Validation loss = 0.018847184255719185
Validation loss = 0.019537538290023804
Validation loss = 0.01936953514814377
Validation loss = 0.02026352472603321
Validation loss = 0.019515978172421455
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02079658955335617
Validation loss = 0.02103400230407715
Validation loss = 0.019851474091410637
Validation loss = 0.020174022763967514
Validation loss = 0.020082443952560425
Validation loss = 0.020224934443831444
Validation loss = 0.019870780408382416
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021759238094091415
Validation loss = 0.02055850438773632
Validation loss = 0.01994081400334835
Validation loss = 0.02343893237411976
Validation loss = 0.019529037177562714
Validation loss = 0.02426094189286232
Validation loss = 0.019253041595220566
Validation loss = 0.02179786004126072
Validation loss = 0.021671365946531296
Validation loss = 0.02060786448419094
Validation loss = 0.019735340029001236
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0954861111111111
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09532062391681109
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09515570934256055
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09499136442141623
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09482758620689655
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09466437177280551
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09450171821305842
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09433962264150944
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09417808219178082
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09401709401709402
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09385665529010238
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09369676320272573
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0935374149659864
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0933786078098472
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09322033898305085
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09306260575296109
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0929054054054054
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09274873524451939
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09259259259259259
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09243697478991597
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09228187919463088
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09212730318257957
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09197324414715718
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09181969949916527
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09166666666666666
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0012   |
| Iteration     | 22        |
| MaximumReturn | -0.000838 |
| MinimumReturn | -0.00178  |
| TotalSamples  | 39984     |
-----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021047033369541168
Validation loss = 0.020250018686056137
Validation loss = 0.01984059438109398
Validation loss = 0.01947278529405594
Validation loss = 0.020526090636849403
Validation loss = 0.020400172099471092
Validation loss = 0.01998395472764969
Validation loss = 0.01925349049270153
Validation loss = 0.018863048404455185
Validation loss = 0.01989043690264225
Validation loss = 0.02549969218671322
Validation loss = 0.0191876832395792
Validation loss = 0.02025308832526207
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024003766477108
Validation loss = 0.02012796700000763
Validation loss = 0.019624371081590652
Validation loss = 0.02041257545351982
Validation loss = 0.02251441217958927
Validation loss = 0.020878225564956665
Validation loss = 0.01956772431731224
Validation loss = 0.01998073235154152
Validation loss = 0.02041781134903431
Validation loss = 0.019780073314905167
Validation loss = 0.019634874537587166
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019189009442925453
Validation loss = 0.019366180524230003
Validation loss = 0.01915930211544037
Validation loss = 0.020964430645108223
Validation loss = 0.019634965807199478
Validation loss = 0.0197560153901577
Validation loss = 0.021266506984829903
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02034592069685459
Validation loss = 0.018973136320710182
Validation loss = 0.020880261436104774
Validation loss = 0.02314487099647522
Validation loss = 0.01958099752664566
Validation loss = 0.01982581987977028
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01972367987036705
Validation loss = 0.01879708841443062
Validation loss = 0.0229242704808712
Validation loss = 0.020211875438690186
Validation loss = 0.019989505410194397
Validation loss = 0.019614864140748978
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09151414309484193
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09136212624584718
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0912106135986733
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09105960264900662
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09090909090909091
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09075907590759076
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09060955518945635
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09046052631578948
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.090311986863711
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09016393442622951
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09001636661211129
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08986928104575163
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08972267536704731
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08957654723127036
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08943089430894309
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08928571428571429
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08914100486223663
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0889967637540453
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0888529886914378
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08870967741935484
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08856682769726248
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08842443729903537
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08828250401284109
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08814102564102565
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.088
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00308 |
| Iteration     | 23       |
| MaximumReturn | -0.00199 |
| MinimumReturn | -0.00453 |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02179519459605217
Validation loss = 0.01956949755549431
Validation loss = 0.02026287466287613
Validation loss = 0.021143505349755287
Validation loss = 0.020260285586118698
Validation loss = 0.022074174135923386
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01940159685909748
Validation loss = 0.021170588210225105
Validation loss = 0.02058256044983864
Validation loss = 0.02019747532904148
Validation loss = 0.01972431316971779
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022066757082939148
Validation loss = 0.019920747727155685
Validation loss = 0.020090457051992416
Validation loss = 0.02084420993924141
Validation loss = 0.020262988284230232
Validation loss = 0.019501013681292534
Validation loss = 0.021505555137991905
Validation loss = 0.02020237222313881
Validation loss = 0.019051749259233475
Validation loss = 0.024171434342861176
Validation loss = 0.020089926198124886
Validation loss = 0.020237309858202934
Validation loss = 0.021299321204423904
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020448436960577965
Validation loss = 0.019452953711152077
Validation loss = 0.021667804569005966
Validation loss = 0.01986970193684101
Validation loss = 0.019910840317606926
Validation loss = 0.02078894153237343
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02085207588970661
Validation loss = 0.021696986630558968
Validation loss = 0.022844938561320305
Validation loss = 0.020050203427672386
Validation loss = 0.021985357627272606
Validation loss = 0.021577052772045135
Validation loss = 0.020479440689086914
Validation loss = 0.019707659259438515
Validation loss = 0.021145958453416824
Validation loss = 0.02140657976269722
Validation loss = 0.02071085013449192
Validation loss = 0.020846737548708916
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0878594249201278
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08771929824561403
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0875796178343949
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08744038155802862
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0873015873015873
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08716323296354993
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08702531645569621
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08688783570300158
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08675078864353312
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08661417322834646
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08647798742138364
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08634222919937205
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08620689655172414
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08607198748043818
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0859375
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08580343213728549
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08566978193146417
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08553654743390357
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08540372670807453
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08527131782945736
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08513931888544891
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08500772797527048
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08487654320987655
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0847457627118644
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08461538461538462
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000979 |
| Iteration     | 24        |
| MaximumReturn | -0.000624 |
| MinimumReturn | -0.00132  |
| TotalSamples  | 43316     |
-----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020026611164212227
Validation loss = 0.019449692219495773
Validation loss = 0.020057514309883118
Validation loss = 0.019835278391838074
Validation loss = 0.019707372412085533
Validation loss = 0.02345447614789009
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020738467574119568
Validation loss = 0.01976199448108673
Validation loss = 0.02042326144874096
Validation loss = 0.021494416519999504
Validation loss = 0.020478790625929832
Validation loss = 0.019526876509189606
Validation loss = 0.01992901787161827
Validation loss = 0.01959196664392948
Validation loss = 0.019315913319587708
Validation loss = 0.02140200510621071
Validation loss = 0.01912784017622471
Validation loss = 0.020481720566749573
Validation loss = 0.0213741734623909
Validation loss = 0.021404393017292023
Validation loss = 0.019984906539320946
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0202102642506361
Validation loss = 0.0217942725867033
Validation loss = 0.021129652857780457
Validation loss = 0.024258801713585854
Validation loss = 0.02382018230855465
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021169355139136314
Validation loss = 0.019739612936973572
Validation loss = 0.021711403504014015
Validation loss = 0.019480833783745766
Validation loss = 0.023472925648093224
Validation loss = 0.019639812409877777
Validation loss = 0.020129531621932983
Validation loss = 0.020270535722374916
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02056315541267395
Validation loss = 0.02129572443664074
Validation loss = 0.02059798128902912
Validation loss = 0.021918436512351036
Validation loss = 0.01957998238503933
Validation loss = 0.020971018821001053
Validation loss = 0.019150249660015106
Validation loss = 0.020170744508504868
Validation loss = 0.020233217626810074
Validation loss = 0.02122766710817814
Validation loss = 0.02039778046309948
Done fitting dynamics.
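Each "Fitting model k (0-based) in the ensemble of 5 models" block above prints one "Validation loss = ..." line per validation pass, and the number of lines varies from model to model (6 for model 0 versus 15 for model 1 in this iteration), which suggests training stops on a validation-loss criterion rather than after a fixed number of evaluations. A minimal sketch of that loop shape, using a throwaway linear "dynamics model" and a patience-based stopping rule purely as stand-ins, not as a description of the actual trainer:

    # Minimal sketch (stand-in model and stopping rule, not the repo's trainer):
    # fit an ensemble of 5 one-step dynamics models, evaluate each on a held-out
    # set after every pass, and stop when the validation loss stops improving.
    import numpy as np

    def fit_one_model(X_tr, Y_tr, X_val, Y_val, max_epochs=30, patience=3, lr=1e-2):
        W = np.zeros((X_tr.shape[1], Y_tr.shape[1]))
        best, bad = np.inf, 0
        for _ in range(max_epochs):
            # one gradient step on the mean-squared one-step prediction error
            grad = X_tr.T @ (X_tr @ W - Y_tr) / len(X_tr)
            W -= lr * grad
            val_loss = float(np.mean((X_val @ W - Y_val) ** 2))
            print(f"Validation loss = {val_loss}")
            if val_loss < best:
                best, bad = val_loss, 0
            else:
                bad += 1
                if bad >= patience:   # plateau: stop early, so the number of
                    break             # printed lines differs between models
        return W

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 6))                      # e.g. (obs, action) inputs
    Y = X @ rng.normal(size=(6, 4)) * 0.1 + 0.01 * rng.normal(size=(500, 4))
    X_val, Y_val = X[400:], Y[400:]
    ensemble = []
    for k in range(5):
        print(f"Fitting model {k} (0-based) in the ensemble of 5 models")
        idx = rng.permutation(400)                     # each member gets its own shuffle
        ensemble.append(fit_one_model(X[idx], Y[idx], X_val, Y_val))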
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
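"Training policy using TRPO." is always followed by exactly 20 "Obtaining samples for iteration i..." lines, so the policy appears to be optimized for a fixed number of inner iterations, each of which first collects a fresh batch of samples, presumably from the freshly fitted dynamics ensemble rather than the real environment, since the real-environment counters only advance during the rollout phases. A skeleton of that inner loop; everything below is an assumed structure with trivial stubs so it runs, not code from the repository:

    # Skeleton of the inner policy-optimization loop suggested by the log;
    # all names below are hypothetical stubs, not the repo's API.
    def sample_from_model(policy, dynamics_ensemble):
        """Stand-in for rolling the current policy out inside the learned model."""
        return []

    def trpo_update(policy, paths):
        """Stand-in for one KL-constrained (TRPO) policy update on the batch."""
        pass

    class Policy:
        def reset_logstd(self):
            """Stand-in for the 'Re-initialize init_std.' step."""
            pass

    def train_policy(policy, dynamics_ensemble, n_iterations=20):
        print("Training policy using TRPO.")
        print("Re-initialize init_std.")
        policy.reset_logstd()
        for itr in range(n_iterations):
            print(f"Obtaining samples for iteration {itr}...")
            paths = sample_from_model(policy, dynamics_ensemble)  # imagined rollouts
            trpo_update(policy, paths)
        print("Done training policy.")

    train_policy(Policy(), dynamics_ensemble=None)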
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08448540706605223
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08588957055214724
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.0903522205206738
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09174311926605505
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09312977099236641
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09298780487804878
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0928462709284627
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09270516717325228
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09408194233687406
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09393939393939393
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09531013615733737
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09667673716012085
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09653092006033183
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0963855421686747
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0962406015037594
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09759759759759759
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09745127436281859
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09730538922155689
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09865470852017937
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09850746268656717
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09985096870342772
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09970238095238096
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.10698365527488855
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10682492581602374
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10666666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
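"Updating normalization." runs right after each batch of real-environment paths is added to the data set. A common way to implement this step, and only an assumption about what happens here, is to refresh running mean and standard-deviation statistics of the observations with the new batch so the dynamics-model inputs stay roughly standardized as the data set grows. A generic sketch:

    # Generic running-normalization sketch (an assumption about what the
    # "Updating normalization." step does, not the repo's implementation).
    import numpy as np

    class RunningNormalizer:
        def __init__(self, dim, eps=1e-8):
            self.count = 0
            self.mean = np.zeros(dim)
            self.m2 = np.zeros(dim)        # running sum of squared deviations (Welford)
            self.eps = eps

        def update(self, batch):           # batch: (n, dim) array of observations
            for x in np.asarray(batch, dtype=float):
                self.count += 1
                delta = x - self.mean
                self.mean += delta / self.count
                self.m2 += delta * (x - self.mean)

        @property
        def std(self):
            var = self.m2 / max(self.count - 1, 1)
            return np.sqrt(var + self.eps)

        def normalize(self, x):
            return (x - self.mean) / self.std

    norm = RunningNormalizer(dim=4)
    norm.update(np.random.randn(2500, 4))  # e.g. one batch of on-policy observations
    print(norm.normalize(np.zeros(4)))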
----------------------------
| AverageReturn | -5.51    |
| Iteration     | 25       |
| MaximumReturn | -0.0432  |
| MinimumReturn | -45.1    |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02697819471359253
Validation loss = 0.0223370548337698
Validation loss = 0.021133793517947197
Validation loss = 0.021579580381512642
Validation loss = 0.020508956164121628
Validation loss = 0.022769546136260033
Validation loss = 0.02046303264796734
Validation loss = 0.027089517563581467
Validation loss = 0.020103557035326958
Validation loss = 0.019955577328801155
Validation loss = 0.023150412365794182
Validation loss = 0.02111334353685379
Validation loss = 0.02093173936009407
Validation loss = 0.01969262771308422
Validation loss = 0.023383229970932007
Validation loss = 0.02276366949081421
Validation loss = 0.02058504894375801
Validation loss = 0.02204454317688942
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021760959178209305
Validation loss = 0.021935252472758293
Validation loss = 0.02019091509282589
Validation loss = 0.019751591607928276
Validation loss = 0.022664083167910576
Validation loss = 0.019882285967469215
Validation loss = 0.020290367305278778
Validation loss = 0.02062102034687996
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023318856954574585
Validation loss = 0.021120212972164154
Validation loss = 0.021091995760798454
Validation loss = 0.01966690644621849
Validation loss = 0.020488731563091278
Validation loss = 0.020229756832122803
Validation loss = 0.020535705611109734
Validation loss = 0.022185280919075012
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023505019024014473
Validation loss = 0.019572999328374863
Validation loss = 0.021260755136609077
Validation loss = 0.02191145531833172
Validation loss = 0.022432388737797737
Validation loss = 0.021794326603412628
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022500792518258095
Validation loss = 0.020960751920938492
Validation loss = 0.020933793857693672
Validation loss = 0.02015068754553795
Validation loss = 0.019479023292660713
Validation loss = 0.02009081467986107
Validation loss = 0.02079753763973713
Validation loss = 0.020760588347911835
Validation loss = 0.020201317965984344
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10650887573964497
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.10782865583456426
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.10914454277286136
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10898379970544919
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10882352941176471
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10866372980910426
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10850439882697947
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10834553440702782
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.10964912280701754
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10948905109489052
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10932944606413994
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.11062590975254731
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11046511627906977
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11030478955007257
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11014492753623188
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10998552821997105
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10982658959537572
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10966810966810966
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.11095100864553314
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11079136690647481
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11063218390804598
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11047345767575323
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11031518624641834
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11015736766809728
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.6     |
| Iteration     | 26       |
| MaximumReturn | -0.067   |
| MinimumReturn | -30.5    |
| TotalSamples  | 46648    |
----------------------------
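The dashed table printed at the end of every iteration summarizes the returns of the 25 freshly collected real-environment paths (average, maximum, minimum) together with the iteration index and the cumulative sample count. A small sketch of how those statistics could be computed and rendered in the same fixed-width layout; the path returns below are made-up numbers used only to exercise the formatting:

    # Sketch of the end-of-iteration summary; the layout is read off the log,
    # while the path returns below are invented just to drive the formatter.
    import numpy as np

    def log_tabular(stats):
        rows = [(k, f"{v:.3g}" if isinstance(v, float) else str(v)) for k, v in stats]
        key_w = max(len(k) for k, _ in rows)
        val_w = max(len(v) for _, v in rows)
        line = "-" * (key_w + val_w + 7)
        print(line)
        for k, v in rows:
            print(f"| {k:<{key_w}} | {v:<{val_w}} |")
        print(line)

    path_returns = np.random.uniform(-30.0, -0.05, size=25)   # dummy per-path returns
    log_tabular([
        ("AverageReturn", float(np.mean(path_returns))),
        ("Iteration", 26),
        ("MaximumReturn", float(np.max(path_returns))),
        ("MinimumReturn", float(np.min(path_returns))),
        ("TotalSamples", 46648),
    ])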
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02231621742248535
Validation loss = 0.0219641774892807
Validation loss = 0.022944020107388496
Validation loss = 0.021584942936897278
Validation loss = 0.024831753224134445
Validation loss = 0.02217894233763218
Validation loss = 0.02088255248963833
Validation loss = 0.021448004990816116
Validation loss = 0.02081395871937275
Validation loss = 0.022842740640044212
Validation loss = 0.021333767101168633
Validation loss = 0.021679213270545006
Validation loss = 0.022494979202747345
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023031029850244522
Validation loss = 0.021404750645160675
Validation loss = 0.022195305675268173
Validation loss = 0.022894367575645447
Validation loss = 0.022300640121102333
Validation loss = 0.021450961008667946
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024171780794858932
Validation loss = 0.02180304564535618
Validation loss = 0.022877249866724014
Validation loss = 0.02078649029135704
Validation loss = 0.021163031458854675
Validation loss = 0.02105722762644291
Validation loss = 0.02257564850151539
Validation loss = 0.022511163726449013
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022744610905647278
Validation loss = 0.021331630647182465
Validation loss = 0.02095429226756096
Validation loss = 0.021351903676986694
Validation loss = 0.02083016000688076
Validation loss = 0.02141018584370613
Validation loss = 0.022315040230751038
Validation loss = 0.02169695869088173
Validation loss = 0.021544532850384712
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020867059007287025
Validation loss = 0.02192554995417595
Validation loss = 0.02128913439810276
Validation loss = 0.022370340302586555
Validation loss = 0.022013869136571884
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10984308131241084
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10968660968660969
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10953058321479374
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.109375
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10921985815602837
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10906515580736544
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10891089108910891
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10875706214689265
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10860366713681241
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10845070422535211
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10829817158931083
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10814606741573034
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10799438990182328
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10784313725490197
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1076923076923077
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10754189944134078
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10739191073919108
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10724233983286909
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1070931849791377
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10694444444444444
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10679611650485436
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10664819944598337
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10650069156293222
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.10773480662983426
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10758620689655173
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0519  |
| Iteration     | 27       |
| MaximumReturn | -0.0271  |
| MinimumReturn | -0.13    |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021385585889220238
Validation loss = 0.02131616324186325
Validation loss = 0.021419644355773926
Validation loss = 0.02357366494834423
Validation loss = 0.021188698709011078
Validation loss = 0.02106477878987789
Validation loss = 0.020940706133842468
Validation loss = 0.021665334701538086
Validation loss = 0.021366611123085022
Validation loss = 0.023572340607643127
Validation loss = 0.022758344188332558
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020775670185685158
Validation loss = 0.021275268867611885
Validation loss = 0.02239314652979374
Validation loss = 0.020943880081176758
Validation loss = 0.021733039990067482
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019850775599479675
Validation loss = 0.0229615718126297
Validation loss = 0.02128828875720501
Validation loss = 0.021315807476639748
Validation loss = 0.021634673699736595
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02323564700782299
Validation loss = 0.021868713200092316
Validation loss = 0.025035515427589417
Validation loss = 0.02155645191669464
Validation loss = 0.021763669326901436
Validation loss = 0.020941173657774925
Validation loss = 0.02177620492875576
Validation loss = 0.020352624356746674
Validation loss = 0.02122005634009838
Validation loss = 0.021623676642775536
Validation loss = 0.021528145298361778
Validation loss = 0.021926673129200935
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020529549568891525
Validation loss = 0.020138675346970558
Validation loss = 0.020748374983668327
Validation loss = 0.021301424130797386
Validation loss = 0.02097994089126587
Validation loss = 0.02199149876832962
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10743801652892562
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10729023383768914
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10714285714285714
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10699588477366255
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10684931506849316
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.106703146374829
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10655737704918032
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10641200545702592
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10626702997275204
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10612244897959183
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10597826086956522
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10583446404341927
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10569105691056911
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10554803788903924
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10540540540540541
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10526315789473684
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10512129380053908
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10497981157469717
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10483870967741936
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10469798657718121
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10455764075067024
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10441767068273092
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10427807486631016
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1041388518024032
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.104
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00313 |
| Iteration     | 28       |
| MaximumReturn | -0.0023  |
| MinimumReturn | -0.0049  |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02147006429731846
Validation loss = 0.021052302792668343
Validation loss = 0.021131373941898346
Validation loss = 0.022215228527784348
Validation loss = 0.0210112277418375
Validation loss = 0.0221048966050148
Validation loss = 0.02348911203444004
Validation loss = 0.020680135115981102
Validation loss = 0.021562501788139343
Validation loss = 0.021235337480902672
Validation loss = 0.021376661956310272
Validation loss = 0.021262118592858315
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022760091349482536
Validation loss = 0.02264905907213688
Validation loss = 0.020260877907276154
Validation loss = 0.02078796550631523
Validation loss = 0.02143809385597706
Validation loss = 0.022234570235013962
Validation loss = 0.02193373441696167
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02185923606157303
Validation loss = 0.021782539784908295
Validation loss = 0.021938083693385124
Validation loss = 0.0242722537368536
Validation loss = 0.022175421938300133
Validation loss = 0.020516807213425636
Validation loss = 0.021103346720337868
Validation loss = 0.02230112813413143
Validation loss = 0.02070859633386135
Validation loss = 0.02266310155391693
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02110803872346878
Validation loss = 0.022727180272340775
Validation loss = 0.020672153681516647
Validation loss = 0.022289464250206947
Validation loss = 0.023404378443956375
Validation loss = 0.022752568125724792
Validation loss = 0.021946219727396965
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021566253155469894
Validation loss = 0.02018015831708908
Validation loss = 0.019486064091324806
Validation loss = 0.02243603952229023
Validation loss = 0.021304771304130554
Validation loss = 0.02379954420030117
Validation loss = 0.021424541249871254
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10386151797603196
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10372340425531915
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10358565737051793
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10344827586206896
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10331125827814569
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10317460317460317
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10303830911492734
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10290237467018469
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10276679841897234
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10263157894736842
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10249671484888305
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10236220472440945
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10222804718217562
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10209424083769633
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10196078431372549
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10182767624020887
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1016949152542373
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1015625
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10143042912873862
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1012987012987013
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10116731517509728
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10103626943005181
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10090556274256145
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10077519379844961
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10064516129032258
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00492 |
| Iteration     | 29       |
| MaximumReturn | -0.0033  |
| MinimumReturn | -0.00726 |
| TotalSamples  | 51646    |
----------------------------
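Each iteration in this stretch of the log repeats the same outer sequence: fit the dynamics ensemble on the accumulated data, update randomness, run the TRPO inner loop inside the model, collect 25 new real-environment paths with the updated policy, refresh the normalization statistics, and print the return summary. A compact skeleton of that outer loop, inferred from the repeating log blocks; every function below is a placeholder stub rather than the project's code:

    # Outer-loop skeleton inferred from the repeating log blocks; all
    # functions are placeholder stubs, not the repo's implementations.
    def fit_dynamics_ensemble(): pass          # supervised fit + validation losses
    def update_randomness(): pass
    def train_policy_with_trpo(): pass         # inner loop sketched earlier
    def collect_real_rollouts(n_paths): pass   # paths of 100 real-env steps each
    def update_normalization(): pass
    def log_iteration_stats(itr): pass         # the dashed summary table

    def run(start_itr=25, end_itr=34, paths_per_itr=25):
        for itr in range(start_itr, end_itr):
            print(f"itr #{itr} | ")
            print("Fitting dynamics.")
            fit_dynamics_ensemble()
            print("Done fitting dynamics.")
            print("Updating randomness.")
            update_randomness()
            print("Done updating randomness.")
            train_policy_with_trpo()
            print("Generating on-policy rollouts.")
            collect_real_rollouts(paths_per_itr)
            print("Done generating on-policy rollouts.")
            print("Updating normalization.")
            update_normalization()
            print("Done updating normalization.")
            log_iteration_stats(itr)

    run()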
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022204246371984482
Validation loss = 0.024800248444080353
Validation loss = 0.022428292781114578
Validation loss = 0.020994603633880615
Validation loss = 0.02143445983529091
Validation loss = 0.022703705355525017
Validation loss = 0.022627899423241615
Validation loss = 0.021209072321653366
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022624943405389786
Validation loss = 0.021766668185591698
Validation loss = 0.025002172216773033
Validation loss = 0.02121991477906704
Validation loss = 0.021155238151550293
Validation loss = 0.02211553044617176
Validation loss = 0.022795652970671654
Validation loss = 0.02711329236626625
Validation loss = 0.020792143419384956
Validation loss = 0.020383674651384354
Validation loss = 0.024211229756474495
Validation loss = 0.02035509981215
Validation loss = 0.02256174013018608
Validation loss = 0.02170773223042488
Validation loss = 0.022877860814332962
Validation loss = 0.022968027740716934
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.026013385504484177
Validation loss = 0.023123176768422127
Validation loss = 0.020610883831977844
Validation loss = 0.022544529289007187
Validation loss = 0.02284061349928379
Validation loss = 0.022495225071907043
Validation loss = 0.021768661215901375
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021362081170082092
Validation loss = 0.0218083206564188
Validation loss = 0.022889012470841408
Validation loss = 0.022036610171198845
Validation loss = 0.021581733599305153
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021027822047472
Validation loss = 0.020041335374116898
Validation loss = 0.021289009600877762
Validation loss = 0.02006138302385807
Validation loss = 0.02201930247247219
Validation loss = 0.020080337300896645
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10051546391752578
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10038610038610038
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10025706940874037
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10012836970474968
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0998719590268886
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09974424552429667
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09961685823754789
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09948979591836735
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09936305732484077
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09923664122137404
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09911054637865312
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09898477157360407
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09885931558935361
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09873417721518987
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0986093552465234
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09848484848484848
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09836065573770492
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0982367758186398
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09811320754716982
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09798994974874371
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09786700125470514
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09774436090225563
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09762202753441802
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0975
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0047  |
| Iteration     | 30       |
| MaximumReturn | -0.00324 |
| MinimumReturn | -0.00744 |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02244691364467144
Validation loss = 0.021580489352345467
Validation loss = 0.022043731063604355
Validation loss = 0.021059315651655197
Validation loss = 0.02242865413427353
Validation loss = 0.023194806650280952
Validation loss = 0.02154042199254036
Validation loss = 0.02241881750524044
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02250894159078598
Validation loss = 0.0231245756149292
Validation loss = 0.022307656705379486
Validation loss = 0.021351728588342667
Validation loss = 0.023929353803396225
Validation loss = 0.022188888862729073
Validation loss = 0.02257319912314415
Validation loss = 0.02461722306907177
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023633135482668877
Validation loss = 0.021997971460223198
Validation loss = 0.021500330418348312
Validation loss = 0.020603345707058907
Validation loss = 0.021656425669789314
Validation loss = 0.021119780838489532
Validation loss = 0.02554292231798172
Validation loss = 0.021488867700099945
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02088390663266182
Validation loss = 0.02365836687386036
Validation loss = 0.02088201232254505
Validation loss = 0.02228960394859314
Validation loss = 0.024712461978197098
Validation loss = 0.021866532042622566
Validation loss = 0.02150707319378853
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020483408123254776
Validation loss = 0.02038373239338398
Validation loss = 0.020714063197374344
Validation loss = 0.019913945347070694
Validation loss = 0.02112797275185585
Validation loss = 0.021666718646883965
Validation loss = 0.02041282132267952
Validation loss = 0.021781710907816887
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09737827715355805
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09725685785536159
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09713574097135741
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09701492537313433
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0968944099378882
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0967741935483871
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09665427509293681
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09653465346534654
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09641532756489493
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0962962962962963
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09617755856966707
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0960591133004926
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0959409594095941
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09582309582309582
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09570552147239264
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09558823529411764
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09547123623011015
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09535452322738386
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09523809523809523
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0951219512195122
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09500609013398295
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0948905109489051
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09477521263669501
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09466019417475728
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09454545454545454
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00621 |
| Iteration     | 31       |
| MaximumReturn | -0.00406 |
| MinimumReturn | -0.00788 |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0228307843208313
Validation loss = 0.02273358590900898
Validation loss = 0.021739525720477104
Validation loss = 0.022208986803889275
Validation loss = 0.02262655273079872
Validation loss = 0.021410703659057617
Validation loss = 0.02126484178006649
Validation loss = 0.02074531465768814
Validation loss = 0.0216351468116045
Validation loss = 0.021465608850121498
Validation loss = 0.022978683933615685
Validation loss = 0.02175062708556652
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021545017138123512
Validation loss = 0.021225079894065857
Validation loss = 0.021771792322397232
Validation loss = 0.02269810624420643
Validation loss = 0.020863046869635582
Validation loss = 0.022314319387078285
Validation loss = 0.021368131041526794
Validation loss = 0.021538211032748222
Validation loss = 0.02209707349538803
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02450750768184662
Validation loss = 0.02183091826736927
Validation loss = 0.02158859558403492
Validation loss = 0.021234165877103806
Validation loss = 0.023416534066200256
Validation loss = 0.021770289167761803
Validation loss = 0.02143660932779312
Validation loss = 0.021260501816868782
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020421532914042473
Validation loss = 0.0205964557826519
Validation loss = 0.02105986885726452
Validation loss = 0.021534141153097153
Validation loss = 0.02147878333926201
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021139685064554214
Validation loss = 0.02045624516904354
Validation loss = 0.0198264941573143
Validation loss = 0.022454380989074707
Validation loss = 0.02159993350505829
Validation loss = 0.02226964570581913
Validation loss = 0.02206258475780487
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09443099273607748
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.094316807738815
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09420289855072464
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09408926417370325
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09397590361445783
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09386281588447654
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09375
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0936374549819928
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09352517985611511
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09341317365269461
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09330143540669857
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0931899641577061
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09307875894988067
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09296781883194279
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09285714285714286
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09274673008323424
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09263657957244656
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09252669039145907
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0924170616113744
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09230769230769231
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09219858156028368
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09208972845336481
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09198113207547169
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09187279151943463
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09176470588235294
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00105  |
| Iteration     | 32        |
| MaximumReturn | -0.000764 |
| MinimumReturn | -0.00138  |
| TotalSamples  | 56644     |
-----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023152897134423256
Validation loss = 0.023675018921494484
Validation loss = 0.021465377882122993
Validation loss = 0.023765385150909424
Validation loss = 0.021581843495368958
Validation loss = 0.02266150712966919
Validation loss = 0.023372352123260498
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02271042950451374
Validation loss = 0.021062001585960388
Validation loss = 0.02069544419646263
Validation loss = 0.022742772474884987
Validation loss = 0.02104218490421772
Validation loss = 0.022665629163384438
Validation loss = 0.020950153470039368
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020876171067357063
Validation loss = 0.020378103479743004
Validation loss = 0.021919921040534973
Validation loss = 0.020649805665016174
Validation loss = 0.02222760021686554
Validation loss = 0.023188268765807152
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02184222638607025
Validation loss = 0.022936364635825157
Validation loss = 0.021673569455742836
Validation loss = 0.021134749054908752
Validation loss = 0.02126430533826351
Validation loss = 0.020950792357325554
Validation loss = 0.0210883766412735
Validation loss = 0.022301150485873222
Validation loss = 0.021272709593176842
Validation loss = 0.021256709471344948
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021484987810254097
Validation loss = 0.020512128248810768
Validation loss = 0.019752418622374535
Validation loss = 0.027812262997031212
Validation loss = 0.01992260105907917
Validation loss = 0.022235050797462463
Validation loss = 0.021387042477726936
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09165687426556991
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09154929577464789
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0914419695193435
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09133489461358314
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0912280701754386
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0911214953271028
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09101516919486581
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09090909090909091
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09080325960419092
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09069767441860466
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09059233449477352
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09048723897911833
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09038238702201622
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09027777777777778
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09017341040462427
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09006928406466513
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08996539792387544
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08986175115207373
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08975834292289989
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0896551724137931
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08955223880597014
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08944954128440367
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08934707903780069
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08924485125858124
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08914285714285715
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00243 |
| Iteration     | 33       |
| MaximumReturn | -0.00165 |
| MinimumReturn | -0.00394 |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020908044651150703
Validation loss = 0.02217200957238674
Validation loss = 0.02299594320356846
Validation loss = 0.022585082799196243
Validation loss = 0.021130437031388283
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02136576734483242
Validation loss = 0.021481331437826157
Validation loss = 0.02247183956205845
Validation loss = 0.02014419063925743
Validation loss = 0.020937802270054817
Validation loss = 0.022063372656702995
Validation loss = 0.023836374282836914
Validation loss = 0.022010251879692078
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021167468279600143
Validation loss = 0.02458583191037178
Validation loss = 0.02203558199107647
Validation loss = 0.021871626377105713
Validation loss = 0.020726079121232033
Validation loss = 0.021875740960240364
Validation loss = 0.02364937588572502
Validation loss = 0.020542031154036522
Validation loss = 0.022378627210855484
Validation loss = 0.02124762535095215
Validation loss = 0.0216322410851717
Validation loss = 0.019889211282134056
Validation loss = 0.02465854212641716
Validation loss = 0.021474571898579597
Validation loss = 0.020268408581614494
Validation loss = 0.022439679130911827
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022811785340309143
Validation loss = 0.02172224223613739
Validation loss = 0.020491646602749825
Validation loss = 0.021084289997816086
Validation loss = 0.02300512045621872
Validation loss = 0.02270292118191719
Validation loss = 0.022605640813708305
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02009689249098301
Validation loss = 0.020139332860708237
Validation loss = 0.021024439483880997
Validation loss = 0.0202725138515234
Validation loss = 0.019401144236326218
Validation loss = 0.020874522626399994
Validation loss = 0.02064271830022335
Validation loss = 0.02110442705452442
Validation loss = 0.01944347284734249
Done fitting dynamics.
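Each "Fitting dynamics." block above trains the five ensemble members one after another and prints a validation loss several times per member; the number of printed losses differs from member to member, which would fit periodic validation logging with some form of early stopping, though the log alone does not pin down the criterion. A sketch of such a loop; train_one_epoch, validation_loss, and the patience-based stopping rule are assumptions, not details confirmed by this output:

    def fit_ensemble(models, train_one_epoch, validation_loss,
                     max_epochs, patience=3):
        # train_one_epoch(model) runs one pass over the training data;
        # validation_loss(model) -> float evaluates on held-out transitions.
        # Both are supplied by the caller; only the control flow and the log
        # messages are modeled on the output above.
        for i, model in enumerate(models):
            print("Fitting model %d (0-based) in the ensemble of %d models"
                  % (i, len(models)))
            best, bad_epochs = float("inf"), 0
            for _ in range(max_epochs):
                train_one_epoch(model)
                loss = validation_loss(model)
                print("Validation loss = %r" % loss)
                if loss < best:
                    best, bad_epochs = loss, 0
                else:
                    bad_epochs += 1
                    if bad_epochs >= patience:   # assumed early stopping
                        break
        print("Done fitting dynamics.")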
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
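The policy-training block above re-initializes the policy's log standard deviation and then runs twenty sampling/update iterations against the learned dynamics before returning to the real environment. An outline of that outer loop; policy.reset_log_std, collect_samples, and trpo_step are placeholder names for routines this log does not show:

    def train_policy(policy, collect_samples, trpo_step, n_iterations=20):
        # collect_samples(policy) gathers rollouts (here, presumably inside the
        # learned dynamics model); trpo_step(policy, paths) applies one
        # trust-region update. Both are caller-supplied placeholders.
        print("Training policy using TRPO.")
        print("Re-initialize init_std.")
        policy.reset_log_std()                # assumed: reset exploration noise
        for itr in range(n_iterations):
            print("Obtaining samples for iteration %d..." % itr)
            paths = collect_samples(policy)
            trpo_step(policy, paths)
        print("Done training policy.")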
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08904109589041095
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08893956670467502
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0888382687927107
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08873720136518772
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08863636363636364
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08853575482406356
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08843537414965986
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08833522083805209
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08823529411764706
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08813559322033898
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08803611738148984
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08793686583990981
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08783783783783784
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08773903262092239
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08764044943820225
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08754208754208755
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08744394618834081
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08734602463605823
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.087248322147651
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08715083798882682
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08705357142857142
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08695652173913043
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08685968819599109
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0867630700778643
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08666666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000958 |
| Iteration     | 34        |
| MaximumReturn | -0.000723 |
| MinimumReturn | -0.00124  |
| TotalSamples  | 59976     |
-----------------------------
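Each outer iteration closes with a boxed diagnostic table over the freshly collected paths: AverageReturn, Iteration, MaximumReturn, MinimumReturn, and TotalSamples. A small printer that reproduces this layout (a sketch inferred from the tables themselves, not the project's actual logger; values appear to be shown with three significant figures):

    def print_tabular(stats):
        # stats: list of (key, value) pairs, printed as a bordered two-column box.
        def fmt(v):
            return str(v) if isinstance(v, int) else "%.3g" % v
        rows = [(k, fmt(v)) for k, v in stats]
        key_w = max(len(k) for k, _ in rows)
        val_w = max(len(v) for _, v in rows)
        border = "-" * (key_w + val_w + 7)
        print(border)
        for k, v in rows:
            print("| %-*s | %-*s |" % (key_w, k, val_w, v))
        print(border)

For example, print_tabular([("AverageReturn", -0.000958), ("Iteration", 34), ("MaximumReturn", -0.000723), ("MinimumReturn", -0.00124), ("TotalSamples", 59976)]) reproduces the iteration-34 box above.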
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021978365257382393
Validation loss = 0.021250860765576363
Validation loss = 0.023567458614706993
Validation loss = 0.021431434899568558
Validation loss = 0.021495293825864792
Validation loss = 0.022778712213039398
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024373900145292282
Validation loss = 0.021922647953033447
Validation loss = 0.02159859798848629
Validation loss = 0.023755574598908424
Validation loss = 0.022243816405534744
Validation loss = 0.021466420963406563
Validation loss = 0.020539283752441406
Validation loss = 0.022196583449840546
Validation loss = 0.021507583558559418
Validation loss = 0.021172283217310905
Validation loss = 0.02160823903977871
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02020672895014286
Validation loss = 0.02143232896924019
Validation loss = 0.024033691734075546
Validation loss = 0.02089686319231987
Validation loss = 0.021758679300546646
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022064823657274246
Validation loss = 0.02191823534667492
Validation loss = 0.021540934219956398
Validation loss = 0.020492589101195335
Validation loss = 0.02198459953069687
Validation loss = 0.02148761786520481
Validation loss = 0.023788509890437126
Validation loss = 0.020650964230298996
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020299237221479416
Validation loss = 0.020459147170186043
Validation loss = 0.020380467176437378
Validation loss = 0.02056342177093029
Validation loss = 0.021971099078655243
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08657047724750278
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08647450110864745
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08637873754152824
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08628318584070796
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0861878453038674
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08609271523178808
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08599779492833518
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08590308370044053
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0858085808580858
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08571428571428572
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08562019758507135
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08552631578947369
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08543263964950712
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08533916849015317
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08524590163934426
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0851528384279476
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08505997818974918
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08496732026143791
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08487486398258977
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08478260869565217
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08469055374592833
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08459869848156182
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08450704225352113
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08441558441558442
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08432432432432432
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
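Before each diagnostic table the normalization statistics are refreshed from the newly collected data ("Updating normalization." / "Done updating normalization."). The log does not say what is being normalized; a common choice in pipelines like this, and only an assumption here, is a per-dimension running mean and standard deviation of observations used to whiten the dynamics-model inputs. A sketch under that assumption:

    import numpy as np

    class RunningNormalizer:
        # Assumed, not shown in this log: whitening statistics updated from
        # each new batch of observations via the standard pooled-variance rule.
        def __init__(self, dim):
            self.mean = np.zeros(dim)
            self.var = np.ones(dim)
            self.count = 1e-8

        def update(self, batch):
            # batch: array of shape (N, dim) with the newly collected samples.
            n = batch.shape[0]
            batch_mean = batch.mean(axis=0)
            batch_var = batch.var(axis=0)
            total = self.count + n
            delta = batch_mean - self.mean
            self.var = (self.count * self.var + n * batch_var
                        + self.count * n * delta ** 2 / total) / total
            self.mean = self.mean + delta * n / total
            self.count = total

        def normalize(self, x):
            return (x - self.mean) / (np.sqrt(self.var) + 1e-8)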
----------------------------
| AverageReturn | -0.00178 |
| Iteration     | 35       |
| MaximumReturn | -0.00125 |
| MinimumReturn | -0.00226 |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02262662537395954
Validation loss = 0.021125612780451775
Validation loss = 0.0220467709004879
Validation loss = 0.023157324641942978
Validation loss = 0.021692341193556786
Validation loss = 0.021007930859923363
Validation loss = 0.022630127146840096
Validation loss = 0.022060712799429893
Validation loss = 0.02195369265973568
Validation loss = 0.02151050604879856
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022327959537506104
Validation loss = 0.022272586822509766
Validation loss = 0.02268054150044918
Validation loss = 0.023090068250894547
Validation loss = 0.02159934490919113
Validation loss = 0.021417981013655663
Validation loss = 0.02247348055243492
Validation loss = 0.020971033722162247
Validation loss = 0.02554573491215706
Validation loss = 0.023435376584529877
Validation loss = 0.02253616973757744
Validation loss = 0.023138945922255516
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022511664777994156
Validation loss = 0.021599283441901207
Validation loss = 0.022010643035173416
Validation loss = 0.023912066593766212
Validation loss = 0.021649571135640144
Validation loss = 0.020529529079794884
Validation loss = 0.021459143608808517
Validation loss = 0.021597957238554955
Validation loss = 0.02274157665669918
Validation loss = 0.021323325112462044
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022776739671826363
Validation loss = 0.022354381158947945
Validation loss = 0.0208247359842062
Validation loss = 0.023199178278446198
Validation loss = 0.02225007861852646
Validation loss = 0.021488986909389496
Validation loss = 0.02331608533859253
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023102713748812675
Validation loss = 0.021874448284506798
Validation loss = 0.021112563088536263
Validation loss = 0.020427605137228966
Validation loss = 0.027285689488053322
Validation loss = 0.020974311977624893
Validation loss = 0.021028559654951096
Validation loss = 0.01981651782989502
Validation loss = 0.02114783599972725
Validation loss = 0.020902065560221672
Validation loss = 0.021275896579027176
Validation loss = 0.02176210656762123
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08423326133909287
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08414239482200647
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08405172413793104
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08396124865446716
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08387096774193549
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08378088077336197
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08369098712446352
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08360128617363344
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0835117773019272
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08342245989304813
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08333333333333333
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0832443970117396
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08315565031982942
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08306709265175719
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08297872340425531
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08289054197662062
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08280254777070063
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08271474019088017
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0826271186440678
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08253968253968254
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0824524312896406
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08236536430834214
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08227848101265822
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0821917808219178
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08210526315789474
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00465 |
| Iteration     | 36       |
| MaximumReturn | -0.00265 |
| MinimumReturn | -0.00686 |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02303173579275608
Validation loss = 0.02385585755109787
Validation loss = 0.023491308093070984
Validation loss = 0.022806867957115173
Validation loss = 0.023252805694937706
Validation loss = 0.021566808223724365
Validation loss = 0.021187901496887207
Validation loss = 0.021741148084402084
Validation loss = 0.024526380002498627
Validation loss = 0.022935854271054268
Validation loss = 0.021418731659650803
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021766021847724915
Validation loss = 0.02100878767669201
Validation loss = 0.020616840571165085
Validation loss = 0.021521005779504776
Validation loss = 0.021261950954794884
Validation loss = 0.02215719223022461
Validation loss = 0.0208586435765028
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02366754598915577
Validation loss = 0.022139880806207657
Validation loss = 0.02367609180510044
Validation loss = 0.02117827720940113
Validation loss = 0.02209381014108658
Validation loss = 0.020676566287875175
Validation loss = 0.02157532051205635
Validation loss = 0.022045521065592766
Validation loss = 0.021183278411626816
Validation loss = 0.02167235128581524
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02143697440624237
Validation loss = 0.02099461294710636
Validation loss = 0.023800823837518692
Validation loss = 0.022032586857676506
Validation loss = 0.02310267463326454
Validation loss = 0.022614290937781334
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02101725898683071
Validation loss = 0.01992945373058319
Validation loss = 0.0212416909635067
Validation loss = 0.021022452041506767
Validation loss = 0.020589105784893036
Validation loss = 0.020786022767424583
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08201892744479496
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0819327731092437
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08184679958027283
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08176100628930817
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08167539267015707
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08158995815899582
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08150470219435736
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.081419624217119
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08133472367049009
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08125
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08116545265348596
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08108108108108109
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08099688473520249
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08091286307053942
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08082901554404145
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08074534161490683
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08066184074457083
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08057851239669421
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0804953560371517
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08041237113402062
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08032955715756952
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08024691358024691
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08016443987667009
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08008213552361396
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00186 |
| Iteration     | 37       |
| MaximumReturn | -0.00139 |
| MinimumReturn | -0.00263 |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02231445163488388
Validation loss = 0.022046465426683426
Validation loss = 0.02523057535290718
Validation loss = 0.01997530274093151
Validation loss = 0.023022688925266266
Validation loss = 0.021611612290143967
Validation loss = 0.02122379280626774
Validation loss = 0.02091730572283268
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020769041031599045
Validation loss = 0.022634435445070267
Validation loss = 0.023677561432123184
Validation loss = 0.02243862673640251
Validation loss = 0.020671145990490913
Validation loss = 0.020580196753144264
Validation loss = 0.022026346996426582
Validation loss = 0.0240030474960804
Validation loss = 0.020811954513192177
Validation loss = 0.021602045744657516
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022138752043247223
Validation loss = 0.020972058176994324
Validation loss = 0.021156717091798782
Validation loss = 0.023303378373384476
Validation loss = 0.022153962403535843
Validation loss = 0.022560834884643555
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02223476767539978
Validation loss = 0.021781250834465027
Validation loss = 0.02232779935002327
Validation loss = 0.02205093950033188
Validation loss = 0.022090446203947067
Validation loss = 0.022198881953954697
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022840917110443115
Validation loss = 0.020685363560914993
Validation loss = 0.020664509385824203
Validation loss = 0.021491914987564087
Validation loss = 0.020473042502999306
Validation loss = 0.02450738102197647
Validation loss = 0.02006601169705391
Validation loss = 0.02083984762430191
Validation loss = 0.020937001332640648
Validation loss = 0.02270267903804779
Validation loss = 0.021518774330615997
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07991803278688525
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07983623336745138
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08077709611451943
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08069458631256383
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08163265306122448
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08256880733944955
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.0845213849287169
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.08646998982706001
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08638211382113821
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08629441624365482
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.0872210953346856
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08814589665653495
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08805668016194332
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.0910010111223458
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09191919191919191
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.09485368314833502
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.0967741935483871
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09768378650553877
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09758551307847083
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.09949748743718594
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09939759036144578
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.10130391173520562
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10120240480961924
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.1041041041041041
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.104
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.76    |
| Iteration     | 38       |
| MaximumReturn | -0.0505  |
| MinimumReturn | -50.2    |
| TotalSamples  | 66640    |
----------------------------
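Iteration 38 above is the first in this stretch where paths with nonzero affinization counts appear, and it coincides with a sharp drop in return (AverageReturn -7.76, MinimumReturn -50.2, against roughly -0.002 in the neighbouring iterations); iteration 39 is still degraded and iteration 40 recovers. When post-processing a log in this format, the per-iteration tables can be pulled out with a short parser, for example (a sketch; the regular expression assumes exactly the table layout shown above, and the outlier rule assumes returns are negative as they are here):

    import re
    import statistics

    ROW = re.compile(r"^\|\s*(\w+)\s*\|\s*(-?[\d.eE+-]+)\s*\|$")

    def parse_iterations(log_path):
        # Collect each bordered table into a dict of {column name: value}.
        tables, current = [], {}
        with open(log_path) as f:
            for line in f:
                m = ROW.match(line.strip())
                if m:
                    current[m.group(1)] = float(m.group(2))
                elif current:          # border after the last row closes a table
                    tables.append(current)
                    current = {}
        if current:
            tables.append(current)
        return tables

    def flag_degraded(tables, factor=10.0):
        # Flag iterations whose AverageReturn is more than `factor` times
        # worse (more negative) than the median across all iterations.
        med = statistics.median(t["AverageReturn"] for t in tables)
        return [t for t in tables if t["AverageReturn"] < factor * med]

For the iterations shown in this section, that rule singles out iteration 38 (-7.76) and iteration 39 (-1.83) while leaving the roughly -0.002 iterations alone.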
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021561814472079277
Validation loss = 0.023188391700387
Validation loss = 0.021622994914650917
Validation loss = 0.022057954221963882
Validation loss = 0.021813509985804558
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02156119793653488
Validation loss = 0.02132423408329487
Validation loss = 0.02010917291045189
Validation loss = 0.022502336651086807
Validation loss = 0.020704561844468117
Validation loss = 0.020897790789604187
Validation loss = 0.022639689967036247
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02231048233807087
Validation loss = 0.02115844376385212
Validation loss = 0.025148781016469002
Validation loss = 0.019820962101221085
Validation loss = 0.021175192669034004
Validation loss = 0.02152526006102562
Validation loss = 0.020378923043608665
Validation loss = 0.020599065348505974
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02058194763958454
Validation loss = 0.021680066362023354
Validation loss = 0.021664094179868698
Validation loss = 0.021038750186562538
Validation loss = 0.020705468952655792
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021741410717368126
Validation loss = 0.02102472074329853
Validation loss = 0.02293015643954277
Validation loss = 0.019296610727906227
Validation loss = 0.020709529519081116
Validation loss = 0.022910429164767265
Validation loss = 0.021204721182584763
Validation loss = 0.022062331438064575
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1038961038961039
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10379241516966067
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1036889332003988
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10358565737051793
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10348258706467661
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10337972166998012
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10327706057596822
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10317460317460317
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10307234886025768
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10297029702970296
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10286844708209693
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10276679841897234
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10266535044422508
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10256410256410256
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10246305418719212
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10236220472440945
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10226155358898721
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10216110019646366
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10206084396467124
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.10294117647058823
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10284035259549461
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10273972602739725
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10263929618768329
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1025390625
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1024390243902439
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.83    |
| Iteration     | 39       |
| MaximumReturn | -0.0381  |
| MinimumReturn | -40.4    |
| TotalSamples  | 68306    |
----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023869413882493973
Validation loss = 0.022032296285033226
Validation loss = 0.02397671900689602
Validation loss = 0.024559088051319122
Validation loss = 0.02277490310370922
Validation loss = 0.020999109372496605
Validation loss = 0.024822471663355827
Validation loss = 0.025918180122971535
Validation loss = 0.022395389154553413
Validation loss = 0.02334197238087654
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02189052477478981
Validation loss = 0.023012561723589897
Validation loss = 0.02355404756963253
Validation loss = 0.022902941331267357
Validation loss = 0.02228829264640808
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021853221580386162
Validation loss = 0.02264401502907276
Validation loss = 0.0239467304199934
Validation loss = 0.021985994651913643
Validation loss = 0.02215910702943802
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02353322133421898
Validation loss = 0.021406646817922592
Validation loss = 0.022114263847470284
Validation loss = 0.021123260259628296
Validation loss = 0.02219625562429428
Validation loss = 0.020490210503339767
Validation loss = 0.021592073142528534
Validation loss = 0.022016307339072227
Validation loss = 0.023982727900147438
Validation loss = 0.022347411140799522
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021778272464871407
Validation loss = 0.02158503234386444
Validation loss = 0.021040616557002068
Validation loss = 0.021928654983639717
Validation loss = 0.02560407668352127
Validation loss = 0.021335123106837273
Validation loss = 0.021777639165520668
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1023391812865497
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10223953261927946
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10214007782101167
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10204081632653061
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10194174757281553
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10184287099903007
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10174418604651163
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10164569215876089
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10154738878143134
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10144927536231885
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10135135135135136
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10125361620057859
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10115606936416185
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10105871029836382
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10096153846153846
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10086455331412104
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10076775431861804
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10067114093959731
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10057471264367816
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10047846889952153
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10038240917782026
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10028653295128939
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10019083969465649
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10009532888465204
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00273 |
| Iteration     | 40       |
| MaximumReturn | -0.00197 |
| MinimumReturn | -0.00381 |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02278684824705124
Validation loss = 0.023078368976712227
Validation loss = 0.021949050948023796
Validation loss = 0.022180795669555664
Validation loss = 0.025457583367824554
Validation loss = 0.022809721529483795
Validation loss = 0.023189082741737366
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022309165447950363
Validation loss = 0.022334657609462738
Validation loss = 0.023922149091959
Validation loss = 0.02439933642745018
Validation loss = 0.022658079862594604
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022410660982131958
Validation loss = 0.02212459221482277
Validation loss = 0.023145662620663643
Validation loss = 0.023913314566016197
Validation loss = 0.02376883663237095
Validation loss = 0.02336113713681698
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0231469813734293
Validation loss = 0.0223262757062912
Validation loss = 0.025071533396840096
Validation loss = 0.021945081651210785
Validation loss = 0.022464876994490623
Validation loss = 0.021908508613705635
Validation loss = 0.02362496592104435
Validation loss = 0.023688271641731262
Validation loss = 0.022739995270967484
Validation loss = 0.021754350513219833
Validation loss = 0.02194065786898136
Validation loss = 0.022796155884861946
Validation loss = 0.022251224145293236
Validation loss = 0.022086510434746742
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021395718678832054
Validation loss = 0.023621026426553726
Validation loss = 0.02146790362894535
Validation loss = 0.021218316629529
Validation loss = 0.02143310382962227
Validation loss = 0.02220848575234413
Validation loss = 0.022561369463801384
Validation loss = 0.021170100197196007
Validation loss = 0.021223092451691628
Validation loss = 0.021298153325915337
Validation loss = 0.020974528044462204
Validation loss = 0.020899934694170952
Validation loss = 0.022517863661050797
Validation loss = 0.022499119862914085
Validation loss = 0.02019035816192627
Validation loss = 0.02117910236120224
Validation loss = 0.02144310437142849
Validation loss = 0.022273192182183266
Validation loss = 0.02333979867398739
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09990485252140818
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09980988593155894
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09971509971509972
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09962049335863378
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0995260663507109
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09943181818181818
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09933774834437085
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09924385633270322
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09915014164305949
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09905660377358491
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09896324222431668
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09887005649717515
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09877704609595485
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09868421052631579
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09859154929577464
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09849906191369606
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09840674789128398
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09831460674157304
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09822263797942002
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09813084112149532
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09803921568627451
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09794776119402986
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.097856477166822
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09776536312849161
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09767441860465116
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00158  |
| Iteration     | 41        |
| MaximumReturn | -0.000951 |
| MinimumReturn | -0.0025   |
| TotalSamples  | 71638     |
-----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023726705461740494
Validation loss = 0.023251838982105255
Validation loss = 0.02337818406522274
Validation loss = 0.023158827796578407
Validation loss = 0.024649253115057945
Validation loss = 0.024842560291290283
Validation loss = 0.02536321058869362
Validation loss = 0.02250075712800026
Validation loss = 0.023244773969054222
Validation loss = 0.02498333342373371
Validation loss = 0.02257332019507885
Validation loss = 0.024609528481960297
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02355853281915188
Validation loss = 0.02267988584935665
Validation loss = 0.022179480642080307
Validation loss = 0.022960543632507324
Validation loss = 0.024362988770008087
Validation loss = 0.021887008100748062
Validation loss = 0.022573096677660942
Validation loss = 0.02360537089407444
Validation loss = 0.022626137360930443
Validation loss = 0.022499999031424522
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022612260654568672
Validation loss = 0.021216347813606262
Validation loss = 0.024163393303751945
Validation loss = 0.02323734015226364
Validation loss = 0.022708408534526825
Validation loss = 0.022071177139878273
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0256828423589468
Validation loss = 0.022123664617538452
Validation loss = 0.02334277704358101
Validation loss = 0.024316992610692978
Validation loss = 0.024348650127649307
Validation loss = 0.02463148906826973
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02604125812649727
Validation loss = 0.02334490977227688
Validation loss = 0.022404467687010765
Validation loss = 0.02246377244591713
Validation loss = 0.022161489352583885
Validation loss = 0.021962741389870644
Validation loss = 0.021524703130126
Validation loss = 0.023782040923833847
Validation loss = 0.022859761491417885
Validation loss = 0.021907683461904526
Validation loss = 0.022896399721503258
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09758364312267657
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09749303621169916
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09740259740259741
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09731232622798888
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09722222222222222
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0971322849213691
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09704251386321626
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09695290858725762
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09686346863468635
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0967741935483871
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09668508287292818
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09659613615455381
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09650735294117647
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09641873278236915
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0963302752293578
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09624197983501374
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09615384615384616
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09606587374199452
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09597806215722121
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0958904109589041
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09580291970802919
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09571558796718323
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09562841530054644
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09554140127388536
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09545454545454546
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00153 |
| Iteration     | 42       |
| MaximumReturn | -0.00105 |
| MinimumReturn | -0.00252 |
| TotalSamples  | 73304    |
----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022804472595453262
Validation loss = 0.022578997537493706
Validation loss = 0.02326415851712227
Validation loss = 0.02283846214413643
Validation loss = 0.023431111127138138
Validation loss = 0.023806259036064148
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02261476218700409
Validation loss = 0.022868139669299126
Validation loss = 0.023383770138025284
Validation loss = 0.02341863326728344
Validation loss = 0.023352833464741707
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.030863650143146515
Validation loss = 0.023934362456202507
Validation loss = 0.023278985172510147
Validation loss = 0.02149241976439953
Validation loss = 0.022014392539858818
Validation loss = 0.02427772246301174
Validation loss = 0.0234459787607193
Validation loss = 0.02231156826019287
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025008244439959526
Validation loss = 0.024505741894245148
Validation loss = 0.022863101214170456
Validation loss = 0.021674875169992447
Validation loss = 0.02427373267710209
Validation loss = 0.024749746546149254
Validation loss = 0.0226749237626791
Validation loss = 0.022752266377210617
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02207173965871334
Validation loss = 0.021519877016544342
Validation loss = 0.022379674017429352
Validation loss = 0.022525308653712273
Validation loss = 0.025728436186909676
Validation loss = 0.022425832226872444
Done fitting dynamics.
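
Each dynamics-fitting phase trains the five ensemble members one after another and prints one validation loss per logged epoch; the number of printed losses varies per model (between 5 and 15 in this section), which is consistent with stopping a model once its validation loss has not improved for about four consecutive evaluations. A hedged sketch of such a loop, where `train_step` and `val_step` are hypothetical callables standing in for the real trainer:

```python
import numpy as np

def fit_ensemble(models, train_step, val_step, max_epochs, patience=4):
    """Fit each ensemble member until its validation loss stops improving.

    Illustrative only: `train_step(model)` runs one training epoch and
    `val_step(model)` returns a scalar validation loss; the real trainer's
    batching and exact stopping rule are not shown in the log.
    """
    for i, model in enumerate(models):
        print("Fitting model %d (0-based) in the ensemble of %d models"
              % (i, len(models)))
        best, stale = np.inf, 0
        for _ in range(max_epochs):
            train_step(model)
            loss = val_step(model)
            print("Validation loss = %s" % loss)
            if loss < best:
                best, stale = loss, 0
            else:
                stale += 1
                if stale >= patience:
                    break
    print("Done fitting dynamics.")


# Toy usage: five dummy "models" sharing a fake, noisy validation curve.
rng = np.random.default_rng(0)
fit_ensemble(models=list(range(5)),
             train_step=lambda m: None,
             val_step=lambda m: 0.022 + 0.002 * rng.random(),
             max_epochs=50, patience=4)
```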
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
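
Every policy-training phase runs exactly 20 sampler iterations (iteration 0 through 19 above) between two rounds of real rollouts, presumably rolling the current policy inside the freshly fitted dynamics ensemble in the usual model-based TRPO pattern; the log itself only shows the sampler progress lines. A sketch of that phase with hypothetical helpers `sample_from_model` and `trpo_update`:

```python
def train_policy_on_model(policy, dynamics, sample_from_model, trpo_update,
                          n_iterations=20):
    """One policy-improvement phase between two rounds of real rollouts.

    Sketch only: `sample_from_model(policy, dynamics)` is assumed to roll the
    current policy inside the learned dynamics and return imagined paths, and
    `trpo_update(policy, paths)` to take one constrained policy step.
    """
    print("Training policy using TRPO.")
    print("Re-initialize init_std.")   # policy log-std reset, as in the log
    for itr in range(n_iterations):
        print("Obtaining samples for iteration %d..." % itr)
        paths = sample_from_model(policy, dynamics)
        trpo_update(policy, paths)
    print("Done training policy.")


# No-op demo that reproduces exactly the 20 sampler lines printed above.
train_policy_on_model(policy=None, dynamics=None,
                      sample_from_model=lambda p, d: [],
                      trpo_update=lambda p, paths: None)
```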
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09536784741144415
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09528130671506352
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09519492293744333
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09510869565217392
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09502262443438914
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0949367088607595
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0948509485094851
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09476534296028881
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09467989179440937
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0945945945945946
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09450945094509451
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09442446043165467
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09433962264150944
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09425493716337523
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09417040358744394
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09408602150537634
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09400179051029543
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09391771019677997
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0938337801608579
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09375
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0936663693131133
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09358288770053476
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09349955476402494
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09341637010676157
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09333333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00108  |
| Iteration     | 43        |
| MaximumReturn | -0.000797 |
| MinimumReturn | -0.00172  |
| TotalSamples  | 74970     |
-----------------------------
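
Each outer iteration closes with a small dashed summary table over the 25 freshly collected paths. A sketch of how the return statistics could be computed and printed in the same format; what exactly TotalSamples counts is not derivable from the log, so it is passed through unchanged:

```python
import numpy as np

def log_iteration_summary(iteration, path_returns, total_samples):
    """Print the per-iteration summary in the dashed-table style of this log."""
    rows = [
        ("AverageReturn", "%.3g" % np.mean(path_returns)),
        ("Iteration",     str(iteration)),
        ("MaximumReturn", "%.3g" % np.max(path_returns)),
        ("MinimumReturn", "%.3g" % np.min(path_returns)),
        ("TotalSamples",  str(total_samples)),
    ]
    key_w = max(len(k) for k, _ in rows)
    val_w = max(len(v) for _, v in rows)
    print("-" * (key_w + val_w + 7))
    for k, v in rows:
        print("| %-*s | %-*s |" % (key_w, k, val_w, v))
    print("-" * (key_w + val_w + 7))


# e.g. the undiscounted returns of this iteration's 25 paths plus the
# running sample counter, whichever way the real code accumulates it
log_iteration_summary(43, [-0.00108] * 25, 74970)
```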
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024069227278232574
Validation loss = 0.02375948242843151
Validation loss = 0.024283459410071373
Validation loss = 0.022136986255645752
Validation loss = 0.02458515763282776
Validation loss = 0.023580292239785194
Validation loss = 0.02331729792058468
Validation loss = 0.024738071486353874
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023571141064167023
Validation loss = 0.022268755361437798
Validation loss = 0.02372504211962223
Validation loss = 0.02275756187736988
Validation loss = 0.022962406277656555
Validation loss = 0.02471393160521984
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021537933498620987
Validation loss = 0.022184181958436966
Validation loss = 0.021751251071691513
Validation loss = 0.024541964754462242
Validation loss = 0.022217025980353355
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02193617820739746
Validation loss = 0.023131543770432472
Validation loss = 0.023093894124031067
Validation loss = 0.02290843240916729
Validation loss = 0.022610491141676903
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02283054031431675
Validation loss = 0.02248920314013958
Validation loss = 0.02104991115629673
Validation loss = 0.02257157862186432
Validation loss = 0.02265472337603569
Validation loss = 0.023930205032229424
Validation loss = 0.02266440913081169
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09325044404973357
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09316770186335403
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09308510638297872
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09300265721877768
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09292035398230089
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09283819628647215
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09275618374558305
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09267431597528684
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09259259259259259
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09251101321585903
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09242957746478873
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09234828496042216
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09226713532513181
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09218612818261633
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09210526315789473
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09202453987730061
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09194395796847636
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09186351706036745
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09178321678321678
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09170305676855896
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09162303664921466
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09154315605928509
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09146341463414634
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09138381201044386
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09130434782608696
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00751 |
| Iteration     | 44       |
| MaximumReturn | -0.00516 |
| MinimumReturn | -0.00944 |
| TotalSamples  | 76636    |
----------------------------
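
The "Updating normalization." step that precedes each summary presumably refreshes the running statistics used to whiten the dynamics model's inputs and targets with the newly collected data; the log does not say what is normalized, so the sketch below is only a generic running mean/std tracker:

```python
import numpy as np

class RunningNormalizer:
    """Running mean/std over everything seen so far (Welford; illustrative)."""

    def __init__(self, dim, eps=1e-6):
        self.count = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)          # sum of squared deviations
        self.eps = eps

    def update(self, batch):
        # batch: (n, dim) array of newly collected inputs or targets
        for x in np.asarray(batch, dtype=np.float64):
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        std = np.sqrt(self.m2 / max(self.count - 1, 1)) + self.eps
        return (np.asarray(x) - self.mean) / std
```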
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02305903285741806
Validation loss = 0.023029860109090805
Validation loss = 0.02202034555375576
Validation loss = 0.022205905988812447
Validation loss = 0.02283317968249321
Validation loss = 0.022444769740104675
Validation loss = 0.02400742471218109
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022104537114501
Validation loss = 0.023078763857483864
Validation loss = 0.02390129491686821
Validation loss = 0.02193191833794117
Validation loss = 0.022844389081001282
Validation loss = 0.022283226251602173
Validation loss = 0.022601066157221794
Validation loss = 0.02361007034778595
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02272622101008892
Validation loss = 0.023730266839265823
Validation loss = 0.021240927278995514
Validation loss = 0.02178874798119068
Validation loss = 0.022081643342971802
Validation loss = 0.02168649062514305
Validation loss = 0.022188719362020493
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024111682549118996
Validation loss = 0.02280327118933201
Validation loss = 0.02311275526881218
Validation loss = 0.02363727055490017
Validation loss = 0.02545766532421112
Validation loss = 0.023151403293013573
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022517511621117592
Validation loss = 0.02273865044116974
Validation loss = 0.022328665480017662
Validation loss = 0.021201495081186295
Validation loss = 0.02140374481678009
Validation loss = 0.02148996666073799
Validation loss = 0.022041629999876022
Validation loss = 0.02061150036752224
Validation loss = 0.021440809592604637
Validation loss = 0.021228039637207985
Validation loss = 0.022534430027008057
Validation loss = 0.022330481559038162
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
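
Before each TRPO phase the log prints "Re-initialize init_std.", which reads as resetting the Gaussian policy's log standard deviation to its initial value so that exploration noise does not keep shrinking across outer iterations. That interpretation is an assumption; a toy sketch:

```python
import numpy as np

class DiagGaussianPolicy:
    """Toy stand-in for the actual Gaussian policy (illustrative only)."""

    def __init__(self, action_dim, init_logstd=0.0):
        self.init_logstd = init_logstd
        self.log_std = np.full(action_dim, init_logstd)

    def reinitialize_std(self):
        # What "Re-initialize init_std." is taken to mean here: reset the
        # exploration noise to its initial value before each TRPO phase.
        self.log_std = np.full_like(self.log_std, self.init_logstd)
```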
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09122502172024327
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09114583333333333
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09106678230702515
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09098786828422877
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09090909090909091
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09083044982698962
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09075194468452895
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09067357512953368
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.090595340811044
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09051724137931035
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09043927648578812
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09036144578313253
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09028374892519346
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09020618556701031
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09012875536480687
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09005145797598628
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08997429305912596
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0898972602739726
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08982035928143713
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08974358974358974
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.089666951323655
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08959044368600683
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08951406649616368
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08943781942078365
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08936170212765958
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00268 |
| Iteration     | 45       |
| MaximumReturn | -0.00172 |
| MinimumReturn | -0.00395 |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021710535511374474
Validation loss = 0.022985266521573067
Validation loss = 0.022925565019249916
Validation loss = 0.022860979661345482
Validation loss = 0.021980570629239082
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024138258770108223
Validation loss = 0.0214474406093359
Validation loss = 0.02145507000386715
Validation loss = 0.021758435294032097
Validation loss = 0.021639617159962654
Validation loss = 0.021963421255350113
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022552791982889175
Validation loss = 0.022286124527454376
Validation loss = 0.02157100848853588
Validation loss = 0.022824937477707863
Validation loss = 0.021725216880440712
Validation loss = 0.02204176038503647
Validation loss = 0.02221996709704399
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02478780224919319
Validation loss = 0.02190358005464077
Validation loss = 0.023030929267406464
Validation loss = 0.022580644115805626
Validation loss = 0.022184228524565697
Validation loss = 0.023080814629793167
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022389352321624756
Validation loss = 0.02246929332613945
Validation loss = 0.020539112389087677
Validation loss = 0.02073078602552414
Validation loss = 0.022302230820059776
Validation loss = 0.02110634744167328
Validation loss = 0.021534185856580734
Done fitting dynamics.
Updating randomness.
Done updating randomness.
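
"Updating randomness." sits between the dynamics fit and the TRPO phase. With a 5-model ensemble, one plausible reading is that the assignment of imagination particles to ensemble members (or the sampling noise) is re-drawn here so each policy-training phase sees a fresh mixture of the fitted models. This is an assumption, not something the log states; a minimal sketch:

```python
import numpy as np

def update_randomness(n_particles, n_models, rng=None):
    """Re-draw which ensemble member each imagination particle follows.

    Purely illustrative: the log only prints "Updating randomness." and does
    not reveal what is actually resampled.
    """
    if rng is None:
        rng = np.random.default_rng()
    print("Updating randomness.")
    assignment = rng.integers(low=0, high=n_models, size=n_particles)
    print("Done updating randomness.")
    return assignment


# e.g. a handful of particles drawn over the 5-model ensemble seen above
particle_to_model = update_randomness(n_particles=5, n_models=5)
```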
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08928571428571429
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08920985556499575
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08913412563667232
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.089058524173028
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08898305084745763
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08890770533446232
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08883248730964467
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08875739644970414
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08868243243243243
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08860759493670886
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08853288364249579
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08845829823083404
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08838383838383838
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08830950378469302
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08823529411764706
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08816120906801007
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08808724832214765
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08801341156747695
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08793969849246232
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08786610878661087
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08779264214046822
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08771929824561403
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08764607679465776
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08757297748123437
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0875
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00253 |
| Iteration     | 46       |
| MaximumReturn | -0.00167 |
| MinimumReturn | -0.00317 |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02104276232421398
Validation loss = 0.02206750400364399
Validation loss = 0.023315176367759705
Validation loss = 0.022184770554304123
Validation loss = 0.023640098050236702
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023079583421349525
Validation loss = 0.021916629746556282
Validation loss = 0.020885061472654343
Validation loss = 0.022715505212545395
Validation loss = 0.022658945992588997
Validation loss = 0.022755170240998268
Validation loss = 0.02232883684337139
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02471996285021305
Validation loss = 0.022310029715299606
Validation loss = 0.020936502143740654
Validation loss = 0.021730167791247368
Validation loss = 0.022757310420274734
Validation loss = 0.021108631044626236
Validation loss = 0.021297570317983627
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.027687937021255493
Validation loss = 0.024932900443673134
Validation loss = 0.022929277271032333
Validation loss = 0.024163128808140755
Validation loss = 0.02249753847718239
Validation loss = 0.021870529279112816
Validation loss = 0.02214527688920498
Validation loss = 0.022932415828108788
Validation loss = 0.022680532187223434
Validation loss = 0.023786306381225586
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02254604734480381
Validation loss = 0.02159651182591915
Validation loss = 0.020875221118330956
Validation loss = 0.021390924230217934
Validation loss = 0.021101191639900208
Validation loss = 0.02342984639108181
Validation loss = 0.021990124136209488
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08742714404662781
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08735440931780367
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08728179551122195
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0872093023255814
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08713692946058091
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08706467661691543
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08699254349627175
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0869205298013245
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08684863523573201
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08677685950413223
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08670520231213873
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08663366336633663
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08656224237427865
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08649093904448106
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08641975308641975
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08634868421052631
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08627773212818406
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08620689655172414
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08613617719442165
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0860655737704918
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.085995085995086
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08592471358428805
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08585445625511039
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0857843137254902
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08571428571428572
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00097  |
| Iteration     | 47        |
| MaximumReturn | -0.000681 |
| MinimumReturn | -0.00129  |
| TotalSamples  | 81634     |
-----------------------------
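
The rollout phases above always collect 25 paths whose timestep counter climbs in steps of 100, i.e. every path runs the full 100-step horizon. A sketch of a collector that would produce the same progress lines, assuming a classic Gym-style environment API and a hypothetical `policy.act`:

```python
def collect_onpol_rollouts(env, policy, n_paths=25, horizon=100):
    """Collect fixed-horizon on-policy paths, printing the same progress lines.

    Assumes a classic Gym-style `env` (reset/step) and a hypothetical
    `policy.act(obs)`; neither API is shown in the log.
    """
    print("Generating on-policy rollouts.")
    paths, total = [], 0
    for i in range(n_paths):
        print("Path %d | total_timesteps %d." % (i, total))
        obs = env.reset()
        rewards = []
        for _ in range(horizon):
            obs, reward, done, _ = env.step(policy.act(obs))
            rewards.append(reward)
            total += 1
            if done:
                break
        paths.append({"rewards": rewards})
    print("Done generating on-policy rollouts.")
    return paths
```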
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02319920063018799
Validation loss = 0.023189377039670944
Validation loss = 0.022099871188402176
Validation loss = 0.02212120220065117
Validation loss = 0.022830266505479813
Validation loss = 0.022975703701376915
Validation loss = 0.02180827036499977
Validation loss = 0.022217411547899246
Validation loss = 0.023747747763991356
Validation loss = 0.023280981928110123
Validation loss = 0.021507028490304947
Validation loss = 0.022547811269760132
Validation loss = 0.02282654494047165
Validation loss = 0.02219470962882042
Validation loss = 0.02456308715045452
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022311553359031677
Validation loss = 0.021989885717630386
Validation loss = 0.022405235096812248
Validation loss = 0.022771751508116722
Validation loss = 0.022537056356668472
Validation loss = 0.02245774306356907
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02282417006790638
Validation loss = 0.02223862335085869
Validation loss = 0.024401728063821793
Validation loss = 0.021832238882780075
Validation loss = 0.021666008979082108
Validation loss = 0.02122533693909645
Validation loss = 0.02244764193892479
Validation loss = 0.023076703771948814
Validation loss = 0.022109637036919594
Validation loss = 0.022247474640607834
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024799222126603127
Validation loss = 0.025136500597000122
Validation loss = 0.02336362563073635
Validation loss = 0.02419486828148365
Validation loss = 0.02314993366599083
Validation loss = 0.023591022938489914
Validation loss = 0.0230676531791687
Validation loss = 0.022309493273496628
Validation loss = 0.02273842692375183
Validation loss = 0.021761754527688026
Validation loss = 0.022643934935331345
Validation loss = 0.023374345153570175
Validation loss = 0.02241651341319084
Validation loss = 0.02586464211344719
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023750539869070053
Validation loss = 0.021855520084500313
Validation loss = 0.021138066425919533
Validation loss = 0.02290981076657772
Validation loss = 0.022045811638236046
Validation loss = 0.023753156885504723
Validation loss = 0.023087847977876663
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08564437194127243
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08557457212713937
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08550488599348534
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0854353132628153
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08536585365853659
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08529650690495533
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08522727272727272
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0851581508515815
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08508914100486224
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08502024291497975
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08495145631067962
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08488278092158448
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08481421647819062
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0847457627118644
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0846774193548387
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08460918614020951
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08454106280193237
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08447304907481899
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08440514469453377
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08433734939759036
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08426966292134831
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08420208500400962
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08413461538461539
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08406725380304243
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.084
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000986 |
| Iteration     | 48        |
| MaximumReturn | -0.000741 |
| MinimumReturn | -0.00149  |
| TotalSamples  | 83300     |
-----------------------------
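
For post-hoc analysis, the dashed summary tables are easy to scrape back out of a raw log like this one. A small, self-contained parser that pulls out the per-iteration AverageReturn and TotalSamples (the log file name in the usage comment is hypothetical):

```python
import re

def parse_summary_tables(log_text):
    """Return (iteration, average_return, total_samples) per summary table."""
    pattern = re.compile(
        r"\| AverageReturn \| *(-?[\d.eE-]+) *\|"
        r".*?\| Iteration *\| *(\d+) *\|"
        r".*?\| TotalSamples *\| *(\d+) *\|",
        re.S)
    return [(int(itr), float(avg), int(total))
            for avg, itr, total in pattern.findall(log_text)]


# Hypothetical usage on a saved copy of this log:
# with open("training.log") as f:
#     print(parse_summary_tables(f.read())[:3])
```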
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02585940808057785
Validation loss = 0.023618660867214203
Validation loss = 0.022887716069817543
Validation loss = 0.02193286083638668
Validation loss = 0.021872540935873985
Validation loss = 0.02350829727947712
Validation loss = 0.023528248071670532
Validation loss = 0.023374585434794426
Validation loss = 0.022608568891882896
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023033585399389267
Validation loss = 0.021429309621453285
Validation loss = 0.02268519438803196
Validation loss = 0.021970760077238083
Validation loss = 0.02406487613916397
Validation loss = 0.023335836827754974
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02185676246881485
Validation loss = 0.021802369505167007
Validation loss = 0.02257464826107025
Validation loss = 0.021404603496193886
Validation loss = 0.022774089127779007
Validation loss = 0.021657144650816917
Validation loss = 0.022600144147872925
Validation loss = 0.022263403981924057
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024490274488925934
Validation loss = 0.022572536021471024
Validation loss = 0.02200610563158989
Validation loss = 0.022081807255744934
Validation loss = 0.022176288068294525
Validation loss = 0.023718571290373802
Validation loss = 0.023310165852308273
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021151559427380562
Validation loss = 0.022015932947397232
Validation loss = 0.022259773686528206
Validation loss = 0.0232627522200346
Validation loss = 0.022177640348672867
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08393285371702638
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08386581469648563
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08379888268156424
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08373205741626795
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08366533864541832
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08359872611464968
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08353221957040573
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0834658187599364
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08339952343129468
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08333333333333333
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08326724821570182
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0832012678288431
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0831353919239905
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08306962025316456
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08300395256916997
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08293838862559241
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08287292817679558
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08280757097791798
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08274231678486997
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08267716535433071
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08261211644374508
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08254716981132075
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08248232521602514
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08241758241758242
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08235294117647059
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00099  |
| Iteration     | 49        |
| MaximumReturn | -0.000709 |
| MinimumReturn | -0.00135  |
| TotalSamples  | 84966     |
-----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023259237408638
Validation loss = 0.025315674021840096
Validation loss = 0.024131987243890762
Validation loss = 0.02298695035278797
Validation loss = 0.023011520504951477
Validation loss = 0.022573530673980713
Validation loss = 0.022749414667487144
Validation loss = 0.02275759167969227
Validation loss = 0.022697431966662407
Validation loss = 0.0227418951690197
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023323621600866318
Validation loss = 0.024583637714385986
Validation loss = 0.021139688789844513
Validation loss = 0.027389075607061386
Validation loss = 0.02398480474948883
Validation loss = 0.022529112175107002
Validation loss = 0.024592243134975433
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024385835975408554
Validation loss = 0.02691488340497017
Validation loss = 0.02171419933438301
Validation loss = 0.0228596068918705
Validation loss = 0.024509994313120842
Validation loss = 0.022693315520882607
Validation loss = 0.022472359240055084
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022363269701600075
Validation loss = 0.02243204228579998
Validation loss = 0.024582695215940475
Validation loss = 0.022879716008901596
Validation loss = 0.02122153341770172
Validation loss = 0.022163905203342438
Validation loss = 0.0244135819375515
Validation loss = 0.022077340632677078
Validation loss = 0.02411683090031147
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021668341010808945
Validation loss = 0.022584185004234314
Validation loss = 0.0218979399651289
Validation loss = 0.022663971409201622
Validation loss = 0.020880958065390587
Validation loss = 0.02314685843884945
Validation loss = 0.021615000441670418
Validation loss = 0.020809637382626534
Validation loss = 0.021741513162851334
Validation loss = 0.022122154012322426
Validation loss = 0.02162928879261017
Validation loss = 0.022588692605495453
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0822884012539185
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0822239624119029
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08215962441314555
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08209538702111024
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08203125
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08196721311475409
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08190327613104524
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0818394388152767
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08177570093457943
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08171206225680934
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08164852255054432
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08158508158508158
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08152173913043478
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08145849495733126
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08139534883720931
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08133230054221534
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08126934984520123
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08120649651972157
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08114374034003091
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08108108108108109
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08101851851851852
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08095605242868158
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08089368258859785
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08083140877598152
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08076923076923077
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000987 |
| Iteration     | 50        |
| MaximumReturn | -0.000574 |
| MinimumReturn | -0.00201  |
| TotalSamples  | 86632     |
-----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02235531061887741
Validation loss = 0.02366696111857891
Validation loss = 0.023511666804552078
Validation loss = 0.02239135466516018
Validation loss = 0.024782903492450714
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022871388122439384
Validation loss = 0.022082529962062836
Validation loss = 0.022141478955745697
Validation loss = 0.022527411580085754
Validation loss = 0.023740552365779877
Validation loss = 0.02113528735935688
Validation loss = 0.02280295640230179
Validation loss = 0.022642508149147034
Validation loss = 0.022284861654043198
Validation loss = 0.023207303136587143
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022167973220348358
Validation loss = 0.022527463734149933
Validation loss = 0.022943992167711258
Validation loss = 0.020746352151036263
Validation loss = 0.02201767824590206
Validation loss = 0.021001026034355164
Validation loss = 0.02110959030687809
Validation loss = 0.022701900452375412
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023833349347114563
Validation loss = 0.02253321185708046
Validation loss = 0.022705551236867905
Validation loss = 0.022015724331140518
Validation loss = 0.024424105882644653
Validation loss = 0.02466489188373089
Validation loss = 0.023979464545845985
Validation loss = 0.02157360129058361
Validation loss = 0.023616772145032883
Validation loss = 0.02187291346490383
Validation loss = 0.021900039166212082
Validation loss = 0.022963011637330055
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024642610922455788
Validation loss = 0.021153949201107025
Validation loss = 0.02327265776693821
Validation loss = 0.02239397168159485
Validation loss = 0.023320389911532402
Validation loss = 0.021747412160038948
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08070714834742505
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08064516129032258
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08058326937835764
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08052147239263803
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08045977011494253
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08039816232771822
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08033664881407804
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08027522935779817
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08021390374331551
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08015267175572519
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08009153318077804
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08003048780487805
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07996953541507996
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07990867579908675
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07984790874524715
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0797872340425532
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07972665148063782
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07966616084977238
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07960576194086429
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07954545454545454
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07948523845571537
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0794251134644478
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07936507936507936
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07930513595166164
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07924528301886792
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00592 |
| Iteration     | 51       |
| MaximumReturn | -0.0035  |
| MinimumReturn | -0.00817 |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023661678656935692
Validation loss = 0.021770531311631203
Validation loss = 0.023574311286211014
Validation loss = 0.02108708769083023
Validation loss = 0.02288077026605606
Validation loss = 0.022280879318714142
Validation loss = 0.02217766083776951
Validation loss = 0.021780140697956085
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021109027788043022
Validation loss = 0.022708408534526825
Validation loss = 0.022089194506406784
Validation loss = 0.022692715749144554
Validation loss = 0.0239456407725811
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024696560576558113
Validation loss = 0.023332692682743073
Validation loss = 0.020876290276646614
Validation loss = 0.020825136452913284
Validation loss = 0.025706857442855835
Validation loss = 0.022455858066678047
Validation loss = 0.023115284740924835
Validation loss = 0.021402882412075996
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02315393090248108
Validation loss = 0.022300658747553825
Validation loss = 0.022801095619797707
Validation loss = 0.022040219977498055
Validation loss = 0.021143360063433647
Validation loss = 0.021706687286496162
Validation loss = 0.02172517031431198
Validation loss = 0.022395329549908638
Validation loss = 0.022110236808657646
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02132246643304825
Validation loss = 0.021189384162425995
Validation loss = 0.021914197131991386
Validation loss = 0.021703897044062614
Validation loss = 0.023292357102036476
Validation loss = 0.02043825574219227
Validation loss = 0.022304052487015724
Validation loss = 0.021023012697696686
Validation loss = 0.022069109603762627
Validation loss = 0.021108930930495262
Done fitting dynamics.
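"Fitting dynamics" trains each member of the 5-model ensemble separately and prints a validation loss after each evaluation pass; the varying number of "Validation loss" lines per model suggests some validation-based stopping rule, though that is only an inference from the log. A minimal PyTorch-style sketch of one way to fit such an ensemble (the architecture, optimiser and stopping rule here are assumptions, not the implementation that produced this output):

    import torch
    import torch.nn as nn

    def make_model(obs_dim, act_dim, hidden=64):
        # Small MLP predicting the next-state delta from (obs, action).
        return nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def fit_ensemble(train, val, obs_dim, act_dim, n_models=5, max_epochs=50, patience=3):
        # train/val: (inputs, targets) tensor pairs. Illustrative stopping rule:
        # stop a model once its validation loss fails to improve `patience` times.
        loss_fn = nn.MSELoss()
        ensemble = []
        for m in range(n_models):
            print(f"Fitting model {m} (0-based) in the ensemble of {n_models} models")
            model = make_model(obs_dim, act_dim)
            opt = torch.optim.Adam(model.parameters(), lr=1e-3)
            best, bad = float("inf"), 0
            for _ in range(max_epochs):
                opt.zero_grad()
                loss_fn(model(train[0]), train[1]).backward()
                opt.step()
                with torch.no_grad():
                    val_loss = loss_fn(model(val[0]), val[1]).item()
                print(f"Validation loss = {val_loss}")
                best, bad = (val_loss, 0) if val_loss < best else (best, bad + 1)
                if bad >= patience:
                    break
            ensemble.append(model)
        return ensemble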
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
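"Training policy using TRPO" runs 20 inner policy-optimisation iterations ("Obtaining samples for iteration 0...19") before control returns to on-policy rollout generation. In a model-based setup like this one, those inner samples are typically drawn from the learned dynamics ensemble rather than from the real environment, although the log alone does not show that. A skeleton of the control flow under that assumption, where `sample_model_rollouts` and `trpo_step` are hypothetical callables rather than functions from this codebase:

    def train_policy(policy, sample_model_rollouts, trpo_step, n_iterations=20):
        """Inner policy-optimisation loop mirroring the log's inner iterations.

        sample_model_rollouts: hypothetical callable that draws imagined rollouts
            from the learned dynamics model under the current policy.
        trpo_step: hypothetical callable performing one constrained policy update.
        """
        for k in range(n_iterations):
            print(f"Obtaining samples for iteration {k}...")
            paths = sample_model_rollouts(policy)
            trpo_step(policy, paths)
        print("Done training policy.")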
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07918552036199095
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07912584777694047
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07906626506024096
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07900677200902935
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07894736842105263
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07888805409466566
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07882882882882883
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07876969242310577
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07871064467766117
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07865168539325842
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07859281437125748
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07853403141361257
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07847533632286996
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07841672890216579
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07835820895522388
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07829977628635347
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07824143070044709
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0781831720029784
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.078125
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07806691449814127
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07800891530460624
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0779510022271715
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07789317507418397
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07783543365455893
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07777777777777778
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
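"Updating normalization" most plausibly refreshes the statistics used to whiten the dynamics model's inputs and targets from the data collected so far; the log does not show the details, so the sketch below is an assumption about what such a step could look like (a Welford-style running mean/variance over model inputs):

    import numpy as np

    class Normalizer:
        # Running mean/std used to whiten model inputs; purely illustrative.
        def __init__(self, dim):
            self.count = 0
            self.mean = np.zeros(dim)
            self.m2 = np.zeros(dim)   # sum of squared deviations (Welford)

        def update(self, batch):
            for x in np.atleast_2d(batch):
                self.count += 1
                delta = x - self.mean
                self.mean += delta / self.count
                self.m2 += delta * (x - self.mean)

        def normalize(self, x):
            std = np.sqrt(self.m2 / max(self.count - 1, 1)) + 1e-8
            return (x - self.mean) / std

The same statistics would then be reused whenever the dynamics model is queried during policy training, so imagined and real data share one scaling.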
----------------------------
| AverageReturn | -0.00286 |
| Iteration     | 52       |
| MaximumReturn | -0.00216 |
| MinimumReturn | -0.00401 |
| TotalSamples  | 89964    |
----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024036355316638947
Validation loss = 0.021566301584243774
Validation loss = 0.022479206323623657
Validation loss = 0.022518755868077278
Validation loss = 0.0213952474296093
Validation loss = 0.022022001445293427
Validation loss = 0.0210653617978096
Validation loss = 0.021632926538586617
Validation loss = 0.022014781832695007
Validation loss = 0.022448992356657982
Validation loss = 0.021893715485930443
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023153111338615417
Validation loss = 0.023095864802598953
Validation loss = 0.021502681076526642
Validation loss = 0.02218448743224144
Validation loss = 0.022012053057551384
Validation loss = 0.023577848449349403
Validation loss = 0.023594528436660767
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02108321152627468
Validation loss = 0.022298097610473633
Validation loss = 0.02216200903058052
Validation loss = 0.021536031737923622
Validation loss = 0.0210301224142313
Validation loss = 0.02219410613179207
Validation loss = 0.02301890030503273
Validation loss = 0.021952662616968155
Validation loss = 0.021773796528577805
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021875502541661263
Validation loss = 0.022647053003311157
Validation loss = 0.021744202822446823
Validation loss = 0.022662507370114326
Validation loss = 0.021727902814745903
Validation loss = 0.023186201229691505
Validation loss = 0.021538494154810905
Validation loss = 0.022869648411870003
Validation loss = 0.021154213696718216
Validation loss = 0.022254204377532005
Validation loss = 0.020493609830737114
Validation loss = 0.023593494668602943
Validation loss = 0.02746697887778282
Validation loss = 0.02308657392859459
Validation loss = 0.023129411041736603
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02269778773188591
Validation loss = 0.02101927623152733
Validation loss = 0.021744197234511375
Validation loss = 0.021989671513438225
Validation loss = 0.021378815174102783
Validation loss = 0.021712418645620346
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07772020725388601
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07766272189349112
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07760532150776053
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0775480059084195
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07749077490774908
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07743362831858407
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07737656595431099
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07731958762886598
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0772626931567329
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07720588235294118
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07714915503306392
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07709251101321586
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07703595011005136
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07697947214076246
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07692307692307693
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07686676427525622
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07681053401609364
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07675438596491228
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07669831994156319
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07664233576642336
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07658643326039387
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07653061224489796
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0764748725418791
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07641921397379912
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07636363636363637
Done generating on-policy rollouts.
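Each rollout block prints one "Path k | total_timesteps ..." line per path, with the counter advancing by 100 every time, i.e. every path appears to run for the full 100-step horizon. A small sketch of that collection loop under that reading (the classic Gym-style reset/step API used below is an assumption about this codebase):

    def collect_onpolicy_rollouts(env, policy, n_paths=25, horizon=100):
        # Roll the current policy in the real environment; the printed counter
        # tracks how many timesteps were gathered before each new path starts.
        paths, total_timesteps = [], 0
        for k in range(n_paths):
            print(f"Path {k} | total_timesteps {total_timesteps}.")
            obs, path = env.reset(), []
            for _ in range(horizon):
                act = policy(obs)
                obs, rew, done, _ = env.step(act)   # classic Gym signature (assumed)
                path.append((obs, act, rew))
                total_timesteps += 1
                if done:
                    break
            paths.append(path)
        print("Done generating on-policy rollouts.")
        return paths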
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00107  |
| Iteration     | 53        |
| MaximumReturn | -0.000677 |
| MinimumReturn | -0.00153  |
| TotalSamples  | 91630     |
-----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02226090617477894
Validation loss = 0.02487988956272602
Validation loss = 0.02338961698114872
Validation loss = 0.023858321830630302
Validation loss = 0.022556519135832787
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02283715456724167
Validation loss = 0.0231253020465374
Validation loss = 0.02313159592449665
Validation loss = 0.022534452378749847
Validation loss = 0.02226223424077034
Validation loss = 0.021384824067354202
Validation loss = 0.021353496238589287
Validation loss = 0.022672142833471298
Validation loss = 0.022230114787817
Validation loss = 0.02194758877158165
Validation loss = 0.022351635619997978
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021563895046710968
Validation loss = 0.021550465375185013
Validation loss = 0.02147989720106125
Validation loss = 0.02215931937098503
Validation loss = 0.021047690883278847
Validation loss = 0.02454216219484806
Validation loss = 0.02180621586740017
Validation loss = 0.020590517669916153
Validation loss = 0.02150752767920494
Validation loss = 0.021307654678821564
Validation loss = 0.022654633969068527
Validation loss = 0.02345033548772335
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024111401289701462
Validation loss = 0.022758977487683296
Validation loss = 0.021991940215229988
Validation loss = 0.024521393701434135
Validation loss = 0.022736459970474243
Validation loss = 0.02154434472322464
Validation loss = 0.023456944152712822
Validation loss = 0.02376357465982437
Validation loss = 0.022367002442479134
Validation loss = 0.02355962060391903
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022638510912656784
Validation loss = 0.021233120933175087
Validation loss = 0.023141885176301003
Validation loss = 0.021126776933670044
Validation loss = 0.022029470652341843
Validation loss = 0.022906523197889328
Validation loss = 0.021901285275816917
Validation loss = 0.021884243935346603
Done fitting dynamics.
Updating randomness.
Done updating randomness.
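"Updating randomness." is terse; one plausible reading, given the particle-ensemble setup, is that it re-draws the random elements used for model rollouts, for example which ensemble member each rollout particle follows, so imagined trajectories do not reuse the same assignment every iteration. That is an assumption rather than anything this log confirms; a toy sketch of such a resampling step:

    import numpy as np

    def update_randomness(n_particles=5, n_models=5, seed=None):
        # Re-draw which ensemble member each rollout particle follows (assumed behaviour).
        rng = np.random.default_rng(seed)
        assignment = rng.integers(0, n_models, size=n_particles)
        return assignment   # e.g. array([3, 0, 4, 4, 1])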
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07630813953488372
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07625272331154684
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07619738751814223
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07614213197969544
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07608695652173914
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07603186097031137
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07597684515195369
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07592190889370933
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07586705202312138
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07581227436823104
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07575757575757576
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07570295602018745
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07564841498559077
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0755939524838013
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07553956834532374
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07548526240115025
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07543103448275862
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07537688442211055
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07532281205164992
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07526881720430108
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07521489971346705
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07516105941302792
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07510729613733906
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07505360972122944
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.075
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00373 |
| Iteration     | 54       |
| MaximumReturn | -0.00272 |
| MinimumReturn | -0.00517 |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020474590361118317
Validation loss = 0.02369510941207409
Validation loss = 0.022198287770152092
Validation loss = 0.02143804170191288
Validation loss = 0.023272883147001266
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022210782393813133
Validation loss = 0.022372901439666748
Validation loss = 0.021657226607203484
Validation loss = 0.02292175404727459
Validation loss = 0.022741170600056648
Validation loss = 0.021807344630360603
Validation loss = 0.021251775324344635
Validation loss = 0.02270049974322319
Validation loss = 0.021969107910990715
Validation loss = 0.02147621475160122
Validation loss = 0.02191256731748581
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02393346279859543
Validation loss = 0.022641031071543694
Validation loss = 0.022631917148828506
Validation loss = 0.021471668034791946
Validation loss = 0.02277456410229206
Validation loss = 0.021589340642094612
Validation loss = 0.020781781524419785
Validation loss = 0.020682549104094505
Validation loss = 0.022218765690922737
Validation loss = 0.021457353606820107
Validation loss = 0.021509014070034027
Validation loss = 0.025177132338285446
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02258945256471634
Validation loss = 0.025534892454743385
Validation loss = 0.02229580469429493
Validation loss = 0.022090241312980652
Validation loss = 0.02162233553826809
Validation loss = 0.02713313139975071
Validation loss = 0.02250007912516594
Validation loss = 0.02281327173113823
Validation loss = 0.02263820730149746
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021626876667141914
Validation loss = 0.02358236350119114
Validation loss = 0.021755695343017578
Validation loss = 0.02178591676056385
Validation loss = 0.020288240164518356
Validation loss = 0.02204572968184948
Validation loss = 0.0219727773219347
Validation loss = 0.021341821178793907
Validation loss = 0.022821716964244843
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07494646680942184
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07489300998573467
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07483962936564505
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07478632478632478
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07473309608540925
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07467994310099574
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07462686567164178
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07457386363636363
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0745209368346345
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07446808510638298
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0744153082919915
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07436260623229461
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07430997876857749
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07425742574257425
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07420494699646643
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07415254237288135
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07410021171489062
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07404795486600846
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07399577167019028
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07394366197183098
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07389162561576355
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07383966244725738
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07378777231201687
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07373595505617977
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07368421052631578
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0126  |
| Iteration     | 55       |
| MaximumReturn | -0.00651 |
| MinimumReturn | -0.0189  |
| TotalSamples  | 94962    |
----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02203286997973919
Validation loss = 0.021959977224469185
Validation loss = 0.020707422867417336
Validation loss = 0.02224300056695938
Validation loss = 0.02153771184384823
Validation loss = 0.021741507574915886
Validation loss = 0.023290954530239105
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02256082557141781
Validation loss = 0.022373542189598083
Validation loss = 0.02364909090101719
Validation loss = 0.022426003590226173
Validation loss = 0.021756459027528763
Validation loss = 0.02334265597164631
Validation loss = 0.020762532949447632
Validation loss = 0.021619269624352455
Validation loss = 0.02195163443684578
Validation loss = 0.02178071439266205
Validation loss = 0.024050436913967133
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02197442390024662
Validation loss = 0.021650684997439384
Validation loss = 0.023250458762049675
Validation loss = 0.021607572212815285
Validation loss = 0.020840207114815712
Validation loss = 0.02112090401351452
Validation loss = 0.021791422739624977
Validation loss = 0.021213654428720474
Validation loss = 0.024549104273319244
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0222243070602417
Validation loss = 0.02260865643620491
Validation loss = 0.02198314480483532
Validation loss = 0.022730063647031784
Validation loss = 0.022663932293653488
Validation loss = 0.025799624621868134
Validation loss = 0.022281143814325333
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021803829818964005
Validation loss = 0.02089507132768631
Validation loss = 0.02552664838731289
Validation loss = 0.022161731496453285
Validation loss = 0.02106512151658535
Validation loss = 0.02221335470676422
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07363253856942496
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07358093903293624
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07352941176470588
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0734779566130161
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07342657342657342
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07337526205450734
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07332402234636871
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0732728541521284
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07322175732217573
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07317073170731707
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07311977715877438
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07306889352818371
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07301808066759388
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0729673384294649
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07291666666666667
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07286606523247745
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07281553398058252
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07276507276507277
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07271468144044321
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0726643598615917
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07261410788381743
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07256392536281962
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07251381215469613
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07246376811594203
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07241379310344828
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00104  |
| Iteration     | 56        |
| MaximumReturn | -0.000805 |
| MinimumReturn | -0.00139  |
| TotalSamples  | 96628     |
-----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02206183411180973
Validation loss = 0.022365614771842957
Validation loss = 0.022600265219807625
Validation loss = 0.02183873951435089
Validation loss = 0.022637011483311653
Validation loss = 0.02254743129014969
Validation loss = 0.024663008749485016
Validation loss = 0.021038422361016273
Validation loss = 0.021794063970446587
Validation loss = 0.0208540428429842
Validation loss = 0.022411873564124107
Validation loss = 0.024515124037861824
Validation loss = 0.02141786366701126
Validation loss = 0.021544048562645912
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021123170852661133
Validation loss = 0.020956993103027344
Validation loss = 0.020475050434470177
Validation loss = 0.021113954484462738
Validation loss = 0.02156258560717106
Validation loss = 0.022837035357952118
Validation loss = 0.02176823280751705
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022557437419891357
Validation loss = 0.020365575328469276
Validation loss = 0.021577365696430206
Validation loss = 0.02222500927746296
Validation loss = 0.021857023239135742
Validation loss = 0.02028971165418625
Validation loss = 0.022368907928466797
Validation loss = 0.021506719291210175
Validation loss = 0.02240917645394802
Validation loss = 0.024251995608210564
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021231062710285187
Validation loss = 0.022570287808775902
Validation loss = 0.022960400208830833
Validation loss = 0.02325320988893509
Validation loss = 0.022519102320075035
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021244382485747337
Validation loss = 0.02064063958823681
Validation loss = 0.02174356020987034
Validation loss = 0.02308633178472519
Validation loss = 0.025782311335206032
Validation loss = 0.019550854340195656
Validation loss = 0.020787404850125313
Validation loss = 0.021139031276106834
Validation loss = 0.023559710010886192
Validation loss = 0.02053198777139187
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07236388697450034
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07231404958677685
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07226428079834825
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07221458046767538
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07216494845360824
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07211538461538461
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0720658888126287
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0720164609053498
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07196710075394105
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07191780821917808
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07186858316221766
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07181942544459645
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07177033492822966
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07172131147540983
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07167235494880546
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07162346521145975
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07157464212678936
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07152588555858311
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07147719537100068
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07142857142857142
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07138001359619306
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07133152173913043
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07128309572301425
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07123473541383989
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0711864406779661
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000995 |
| Iteration     | 57        |
| MaximumReturn | -0.000701 |
| MinimumReturn | -0.00135  |
| TotalSamples  | 98294     |
-----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023275239393115044
Validation loss = 0.023121410980820656
Validation loss = 0.020737387239933014
Validation loss = 0.021594906225800514
Validation loss = 0.022126778960227966
Validation loss = 0.020991738885641098
Validation loss = 0.021340571343898773
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022150779142975807
Validation loss = 0.021102406084537506
Validation loss = 0.021695446223020554
Validation loss = 0.02347377873957157
Validation loss = 0.02151607908308506
Validation loss = 0.021389873698353767
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019769707694649696
Validation loss = 0.021957803517580032
Validation loss = 0.02130662463605404
Validation loss = 0.021682269871234894
Validation loss = 0.02162344753742218
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020887134596705437
Validation loss = 0.022519804537296295
Validation loss = 0.025283562019467354
Validation loss = 0.023801788687705994
Validation loss = 0.022986656054854393
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02189587615430355
Validation loss = 0.021088138222694397
Validation loss = 0.021126113831996918
Validation loss = 0.021378736943006516
Validation loss = 0.02270359732210636
Validation loss = 0.02191338501870632
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07113821138211382
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07109004739336493
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07104194857916103
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07099391480730223
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07094594594594594
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07089804186360567
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0708502024291498
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07080242751180041
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07075471698113207
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0707070707070707
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07065948855989233
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07061197041022192
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07056451612903226
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07051712558764271
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07046979865771812
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07042253521126761
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07037533512064344
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07032819825853985
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07028112449799197
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07023411371237458
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07018716577540107
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07014028056112225
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07009345794392523
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07004669779853236
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00167 |
| Iteration     | 58       |
| MaximumReturn | -0.00122 |
| MinimumReturn | -0.00277 |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021741945296525955
Validation loss = 0.02420646697282791
Validation loss = 0.023501262068748474
Validation loss = 0.021926183253526688
Validation loss = 0.02232641540467739
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024266526103019714
Validation loss = 0.021659687161445618
Validation loss = 0.02185281366109848
Validation loss = 0.021244041621685028
Validation loss = 0.02139352634549141
Validation loss = 0.021733973175287247
Validation loss = 0.02029307372868061
Validation loss = 0.02181743085384369
Validation loss = 0.02134718745946884
Validation loss = 0.021217389032244682
Validation loss = 0.020602859556674957
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021773457527160645
Validation loss = 0.019940853118896484
Validation loss = 0.0235085841268301
Validation loss = 0.02172379568219185
Validation loss = 0.02225448191165924
Validation loss = 0.021897679194808006
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021757133305072784
Validation loss = 0.02274433523416519
Validation loss = 0.021480660885572433
Validation loss = 0.022853456437587738
Validation loss = 0.020847760140895844
Validation loss = 0.021149633452296257
Validation loss = 0.020986776798963547
Validation loss = 0.021809933707118034
Validation loss = 0.02272091805934906
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021422885358333588
Validation loss = 0.02165491133928299
Validation loss = 0.022539639845490456
Validation loss = 0.020058441907167435
Validation loss = 0.022021878510713577
Validation loss = 0.021442072466015816
Validation loss = 0.022672517225146294
Validation loss = 0.02371007390320301
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06995336442371752
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06990679094540612
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06986027944111776
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06981382978723404
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06976744186046512
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0697211155378486
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0696748506967485
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06962864721485411
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06958250497017893
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0695364238410596
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06949040370615486
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06944444444444445
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06939854593522803
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06935270805812417
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06930693069306931
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06926121372031663
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06921555702043507
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0691699604743083
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06912442396313365
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06907894736842106
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06903353057199212
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06898817345597898
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06894287590282337
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0688976377952756
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06885245901639345
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000999 |
| Iteration     | 59        |
| MaximumReturn | -0.000577 |
| MinimumReturn | -0.00147  |
| TotalSamples  | 101626    |
-----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02188398316502571
Validation loss = 0.022854309529066086
Validation loss = 0.021639440208673477
Validation loss = 0.02228194661438465
Validation loss = 0.021727776154875755
Validation loss = 0.021978257223963737
Validation loss = 0.022409183904528618
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02067161351442337
Validation loss = 0.022084766998887062
Validation loss = 0.02144920639693737
Validation loss = 0.02184728905558586
Validation loss = 0.022164683789014816
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022833414375782013
Validation loss = 0.021014990285038948
Validation loss = 0.022830069065093994
Validation loss = 0.02129928022623062
Validation loss = 0.020513838157057762
Validation loss = 0.019949832931160927
Validation loss = 0.02123081497848034
Validation loss = 0.02102162130177021
Validation loss = 0.02189946174621582
Validation loss = 0.021080754697322845
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021876197308301926
Validation loss = 0.02086588181555271
Validation loss = 0.020972304046154022
Validation loss = 0.02269221469759941
Validation loss = 0.022724278271198273
Validation loss = 0.0222480408847332
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021149776875972748
Validation loss = 0.020647449418902397
Validation loss = 0.02190563641488552
Validation loss = 0.020921697840094566
Validation loss = 0.022018546238541603
Validation loss = 0.021788818761706352
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06880733944954129
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.068762278978389
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06871727748691099
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06867233485938522
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06862745098039216
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06858262573481384
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0685378590078329
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0684931506849315
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06844850065189048
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06840390879478828
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.068359375
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06831489915419649
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06827048114434331
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0682261208576998
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06818181818181818
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0681375730045425
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06809338521400778
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06804925469863901
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06800518134715026
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06796116504854369
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06791720569210867
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06787330316742081
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06782945736434108
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06778566817301485
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06774193548387097
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00324 |
| Iteration     | 60       |
| MaximumReturn | -0.00242 |
| MinimumReturn | -0.00537 |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022141383960843086
Validation loss = 0.021777868270874023
Validation loss = 0.022684864699840546
Validation loss = 0.022071627900004387
Validation loss = 0.021760398522019386
Validation loss = 0.022252844646573067
Validation loss = 0.02288537099957466
Validation loss = 0.021820882335305214
Validation loss = 0.022472137585282326
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02246634103357792
Validation loss = 0.021350478753447533
Validation loss = 0.022071853280067444
Validation loss = 0.021781863644719124
Validation loss = 0.019616156816482544
Validation loss = 0.022607656195759773
Validation loss = 0.021443283185362816
Validation loss = 0.020945735275745392
Validation loss = 0.022601138800382614
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02320113405585289
Validation loss = 0.020135149359703064
Validation loss = 0.020368317142128944
Validation loss = 0.020618556067347527
Validation loss = 0.020398451015353203
Validation loss = 0.020701050758361816
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02203410118818283
Validation loss = 0.022315867245197296
Validation loss = 0.021721400320529938
Validation loss = 0.022695109248161316
Validation loss = 0.022892557084560394
Validation loss = 0.024053923785686493
Validation loss = 0.02153264731168747
Validation loss = 0.022041339427232742
Validation loss = 0.02264203503727913
Validation loss = 0.02273661457002163
Validation loss = 0.020405111834406853
Validation loss = 0.02187882363796234
Validation loss = 0.021120436489582062
Validation loss = 0.022036060690879822
Validation loss = 0.02399587072432041
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020800095051527023
Validation loss = 0.02208331786096096
Validation loss = 0.02298574149608612
Validation loss = 0.02092002145946026
Validation loss = 0.02200886607170105
Done fitting dynamics.
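
Each outer iteration refits all five dynamics models in the ensemble on the data gathered so far. The number of "Validation loss" lines differs per model (9, 9, 6, 15 and 5 in the fit above), which is consistent with each model's training being cut off individually by early stopping on the held-out loss; in the fits shown in this span, the line counts match stopping after four consecutive epochs without a new best validation loss. Under that reading, a minimal sketch of one plausible per-model loop follows; train_one_epoch and evaluate are assumed interfaces, and the defaults are illustrative.

    def fit_with_early_stopping(model, train_data, val_data, max_epochs=200, patience=4):
        # Hypothetical loop: train one epoch at a time and stop once the
        # held-out loss has not improved for `patience` consecutive epochs.
        best, stale = float("inf"), 0
        for _ in range(max_epochs):
            model.train_one_epoch(train_data)      # assumed interface
            val_loss = model.evaluate(val_data)    # assumed interface
            print(f"Validation loss = {val_loss}")
            if val_loss < best:
                best, stale = val_loss, 0
            else:
                stale += 1
                if stale >= patience:
                    break
        return best
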
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
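
Between the dynamics fit and the next batch of real rollouts, the policy is updated for 20 TRPO iterations ("Obtaining samples for iteration 0" through 19), with its exploration standard deviation reset beforehand ("Re-initialize init_std."). The samples for these inner iterations are presumably drawn from the freshly fitted dynamics ensemble rather than the real environment, which is consistent with TotalSamples growing only with the 25 real on-policy paths per outer iteration. A minimal sketch of that inner loop's shape, with assumed interfaces (reset_logstd, sample_paths, update), is:

    def train_policy_on_model(policy, trpo, dynamics_ensemble, n_itr=20):
        # Hypothetical inner loop: the policy is improved against the learned
        # dynamics only; no real-environment steps are consumed here.
        policy.reset_logstd()                    # "Re-initialize init_std."
        for itr in range(n_itr):
            print(f"Obtaining samples for iteration {itr}...")
            paths = trpo.sample_paths(policy, dynamics_ensemble)  # assumed API
            trpo.update(policy, paths)                            # assumed API
        print("Done training policy.")
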
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06769825918762089
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06765463917525773
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06761107533805538
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06756756756756757
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06752411575562701
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06748071979434447
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0674373795761079
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06739409499358151
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06735086593970493
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0673076923076923
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06726457399103139
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06722151088348272
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0671785028790787
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06713554987212277
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0670926517571885
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06704980842911877
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06700701978302488
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06696428571428571
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06692160611854685
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06687898089171974
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0668364099299809
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06679389312977099
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06675143038779402
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06670902160101652
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06666666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
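
"Updating normalization." most likely refreshes the input/target normalization statistics from the newly aggregated data before the next dynamics fit. The exact bookkeeping is not visible in the log; as an illustration only, a running mean/variance normalizer of the kind commonly used for this purpose could look like the following (all names hypothetical).

    import numpy as np

    class RunningNormalizer:
        # Hypothetical running mean/variance tracker, refreshed after each
        # round of data collection.
        def __init__(self, dim, eps=1e-8):
            self.mean = np.zeros(dim)
            self.var = np.ones(dim)
            self.count = 0
            self.eps = eps

        def update(self, batch):                 # batch: (N, dim) array
            n = len(batch)
            new_count = self.count + n
            delta = batch.mean(axis=0) - self.mean
            self.mean = self.mean + delta * n / new_count
            # pooled (parallel) variance update
            self.var = (self.var * self.count + batch.var(axis=0) * n
                        + delta ** 2 * self.count * n / new_count) / new_count
            self.count = new_count

        def normalize(self, x):
            return (x - self.mean) / np.sqrt(self.var + self.eps)
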
----------------------------
| AverageReturn | -0.00266 |
| Iteration     | 61       |
| MaximumReturn | -0.00158 |
| MinimumReturn | -0.00376 |
| TotalSamples  | 104958   |
----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024143166840076447
Validation loss = 0.021633978933095932
Validation loss = 0.02122608944773674
Validation loss = 0.021833935752511024
Validation loss = 0.021969400346279144
Validation loss = 0.02144469879567623
Validation loss = 0.02167927473783493
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02274308353662491
Validation loss = 0.0241971705108881
Validation loss = 0.020756570622324944
Validation loss = 0.022410336881875992
Validation loss = 0.0215358454734087
Validation loss = 0.02171885035932064
Validation loss = 0.021978527307510376
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020601270720362663
Validation loss = 0.021695617586374283
Validation loss = 0.020362215116620064
Validation loss = 0.02356182597577572
Validation loss = 0.02173982374370098
Validation loss = 0.022401180118322372
Validation loss = 0.02067507617175579
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021599093452095985
Validation loss = 0.022938845679163933
Validation loss = 0.02133943885564804
Validation loss = 0.020517028868198395
Validation loss = 0.02267075516283512
Validation loss = 0.02135520428419113
Validation loss = 0.02206352725625038
Validation loss = 0.02124009281396866
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021083703264594078
Validation loss = 0.02273603156208992
Validation loss = 0.022161709144711494
Validation loss = 0.021177418529987335
Validation loss = 0.020923417061567307
Validation loss = 0.02007056586444378
Validation loss = 0.022586070001125336
Validation loss = 0.02259865030646324
Validation loss = 0.020882753655314445
Validation loss = 0.019749470055103302
Validation loss = 0.02139618806540966
Validation loss = 0.022304220125079155
Validation loss = 0.021267976611852646
Validation loss = 0.0224260576069355
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06662436548223351
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06658211794546608
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06653992395437262
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06649778340721976
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06645569620253164
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06641366223908918
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06637168141592921
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06632975363234365
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06628787878787878
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06624605678233439
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06620428751576292
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0661625708884688
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06612090680100756
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06607929515418502
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0660377358490566
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06599622878692646
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06595477386934673
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06591337099811675
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06587202007528231
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06583072100313479
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06578947368421052
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06574827802128992
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06570713391739674
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06566604127579738
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.065625
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00122  |
| Iteration     | 62        |
| MaximumReturn | -0.000824 |
| MinimumReturn | -0.00171  |
| TotalSamples  | 106624    |
-----------------------------
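
Across the tables for iterations 60 through 62, TotalSamples advances by a constant amount per outer iteration:

    104958 - 103292 = 1666
    106624 - 104958 = 1666

so real data accumulates at 1,666 samples per iteration in this span, even though each rollout block above advances total_timesteps by 100 per path (2,500 steps over 25 paths); the two counters evidently track different quantities. Meanwhile AverageReturn stays within a few thousandths of zero.
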
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02116764895617962
Validation loss = 0.019506480544805527
Validation loss = 0.020931588485836983
Validation loss = 0.02103114314377308
Validation loss = 0.02054171822965145
Validation loss = 0.02230939082801342
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021313318982720375
Validation loss = 0.02284056507050991
Validation loss = 0.021353479474782944
Validation loss = 0.022350985556840897
Validation loss = 0.02112053707242012
Validation loss = 0.021705856546759605
Validation loss = 0.022135164588689804
Validation loss = 0.0222006905823946
Validation loss = 0.022962192073464394
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02343013882637024
Validation loss = 0.021525830030441284
Validation loss = 0.0208235215395689
Validation loss = 0.021086694672703743
Validation loss = 0.02004515938460827
Validation loss = 0.020396286621689796
Validation loss = 0.02125985361635685
Validation loss = 0.020975882187485695
Validation loss = 0.019321097061038017
Validation loss = 0.021209871396422386
Validation loss = 0.020807961001992226
Validation loss = 0.020674970000982285
Validation loss = 0.020928775891661644
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022599006071686745
Validation loss = 0.022041508927941322
Validation loss = 0.021629776805639267
Validation loss = 0.021592512726783752
Validation loss = 0.025638476014137268
Validation loss = 0.02138788439333439
Validation loss = 0.02162911742925644
Validation loss = 0.021014712750911713
Validation loss = 0.021848566830158234
Validation loss = 0.02039734646677971
Validation loss = 0.02163040265440941
Validation loss = 0.02246844582259655
Validation loss = 0.02140146680176258
Validation loss = 0.021927379071712494
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02171408198773861
Validation loss = 0.023309100419282913
Validation loss = 0.02213466539978981
Validation loss = 0.024132272228598595
Validation loss = 0.02029019221663475
Validation loss = 0.02124137617647648
Validation loss = 0.022739434614777565
Validation loss = 0.022289622575044632
Validation loss = 0.021085532382130623
Done fitting dynamics.
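
Across iterations 61 through 63 the per-epoch validation losses for all five ensemble members stay in a narrow band, roughly 0.019 to 0.026, i.e. the dynamics fit has essentially plateaued at this point in training. A small self-contained snippet for extracting and summarizing these numbers from a saved copy of this log is shown below; it is illustrative tooling only, and the file name is a placeholder.

    import re
    from collections import defaultdict

    def summarize_validation_losses(log_path):
        # Collects "Validation loss = ..." lines under the most recent
        # "Fitting model k ..." line and reports count / min / last per model index.
        losses = defaultdict(list)
        model = None
        with open(log_path) as f:
            for line in f:
                m = re.match(r"Fitting model (\d+)", line)
                if m:
                    model = int(m.group(1))
                    continue
                m = re.match(r"Validation loss = ([0-9.eE+-]+)", line)
                if m and model is not None:
                    losses[model].append(float(m.group(1)))
        for k in sorted(losses):
            v = losses[k]
            print(f"model {k}: n={len(v)}  min={min(v):.4f}  last={v[-1]:.4f}")
        return losses

    # summarize_validation_losses("training.log")   # placeholder path
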
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0655840099937539
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06554307116104868
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06550218340611354
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06546134663341646
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06542056074766354
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06537982565379825
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06533914125700062
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06529850746268656
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06525792417650715
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06521739130434782
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06517690875232775
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06513647642679901
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06509609423434594
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06505576208178439
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06501547987616099
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06497524752475248
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06493506493506493
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06489493201483312
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06485484867201977
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06481481481481481
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0647748303516348
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06473489519112208
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06469500924214418
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06465517241379311
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06461538461538462
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00283 |
| Iteration     | 63       |
| MaximumReturn | -0.00184 |
| MinimumReturn | -0.00453 |
| TotalSamples  | 108290   |
----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021836863830685616
Validation loss = 0.022427713498473167
Validation loss = 0.02228539250791073
Validation loss = 0.020908432081341743
Validation loss = 0.022841202095150948
Validation loss = 0.02190069854259491
Validation loss = 0.0218951478600502
Validation loss = 0.02201777510344982
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022802265360951424
Validation loss = 0.021778253838419914
Validation loss = 0.020709963515400887
Validation loss = 0.02001120150089264
Validation loss = 0.02504350244998932
Validation loss = 0.02190849371254444
Validation loss = 0.02169528789818287
Validation loss = 0.02094273455440998
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02214772440493107
Validation loss = 0.022939516231417656
Validation loss = 0.020495912060141563
Validation loss = 0.01997699774801731
Validation loss = 0.020461218431591988
Validation loss = 0.02064908854663372
Validation loss = 0.019658366218209267
Validation loss = 0.019665606319904327
Validation loss = 0.020803585648536682
Validation loss = 0.02213682234287262
Validation loss = 0.020360752940177917
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021412502974271774
Validation loss = 0.02141077257692814
Validation loss = 0.021239301189780235
Validation loss = 0.021647009998559952
Validation loss = 0.02324668876826763
Validation loss = 0.021472010761499405
Validation loss = 0.020527182146906853
Validation loss = 0.020868686959147453
Validation loss = 0.021767275407910347
Validation loss = 0.020771507173776627
Validation loss = 0.022150250151753426
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02130194567143917
Validation loss = 0.022265419363975525
Validation loss = 0.02131173387169838
Validation loss = 0.021523492410779
Validation loss = 0.02165013737976551
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06457564575645756
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0645359557467732
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0644963144963145
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06445672191528545
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06441717791411043
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06437768240343347
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06433823529411764
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06429883649724434
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06425948592411261
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06422018348623854
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06418092909535453
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06414172266340867
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0641025641025641
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06406345332519829
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06402439024390244
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06398537477148081
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06394640682095006
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06390748630553865
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06386861313868614
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06382978723404255
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0637910085054678
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06375227686703097
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06371359223300971
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06367495451788963
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06363636363636363
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00106  |
| Iteration     | 64        |
| MaximumReturn | -0.000689 |
| MinimumReturn | -0.0014   |
| TotalSamples  | 109956    |
-----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023726414889097214
Validation loss = 0.022649340331554413
Validation loss = 0.02017669938504696
Validation loss = 0.022376684471964836
Validation loss = 0.021823305636644363
Validation loss = 0.022509248927235603
Validation loss = 0.021618759259581566
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021633125841617584
Validation loss = 0.020809542387723923
Validation loss = 0.02144274301826954
Validation loss = 0.02076215296983719
Validation loss = 0.020692767575383186
Validation loss = 0.02444487251341343
Validation loss = 0.02051902376115322
Validation loss = 0.020902622491121292
Validation loss = 0.02195114642381668
Validation loss = 0.022372353821992874
Validation loss = 0.02132607437670231
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02114184759557247
Validation loss = 0.019594412297010422
Validation loss = 0.021197093650698662
Validation loss = 0.020363684743642807
Validation loss = 0.020446939393877983
Validation loss = 0.02032378315925598
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022627780213952065
Validation loss = 0.02213255688548088
Validation loss = 0.022146347910165787
Validation loss = 0.021135391667485237
Validation loss = 0.02046823687851429
Validation loss = 0.022276032716035843
Validation loss = 0.021265458315610886
Validation loss = 0.020946301519870758
Validation loss = 0.024151956662535667
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021751487627625465
Validation loss = 0.022474922239780426
Validation loss = 0.021143969148397446
Validation loss = 0.020496491342782974
Validation loss = 0.02051834762096405
Validation loss = 0.021533934399485588
Validation loss = 0.02088085375726223
Validation loss = 0.02127894014120102
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06359781950333132
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0635593220338983
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06352087114337568
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06348246674727932
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0634441087613293
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06340579710144928
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06336753168376584
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06332931242460796
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06329113924050633
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06325301204819277
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06321493076459964
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0631768953068592
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06313890559230306
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06310096153846154
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06306306306306306
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06302521008403361
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0629874025194961
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06294964028776978
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06291192330736968
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06287425149700598
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06283662477558348
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06279904306220095
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06276150627615062
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06272401433691756
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0626865671641791
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000955 |
| Iteration     | 65        |
| MaximumReturn | -0.000732 |
| MinimumReturn | -0.00135  |
| TotalSamples  | 111622    |
-----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021876279264688492
Validation loss = 0.02207295410335064
Validation loss = 0.02266307920217514
Validation loss = 0.020565161481499672
Validation loss = 0.02202153019607067
Validation loss = 0.021664578467607498
Validation loss = 0.02077457122504711
Validation loss = 0.020617932081222534
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021994320675730705
Validation loss = 0.02077645994722843
Validation loss = 0.02152046002447605
Validation loss = 0.021000463515520096
Validation loss = 0.021513676270842552
Validation loss = 0.02120058983564377
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021865112707018852
Validation loss = 0.020861078053712845
Validation loss = 0.022649696096777916
Validation loss = 0.020433388650417328
Validation loss = 0.02002200484275818
Validation loss = 0.022058622911572456
Validation loss = 0.022259170189499855
Validation loss = 0.0206235833466053
Validation loss = 0.02060381881892681
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023486152291297913
Validation loss = 0.022571420297026634
Validation loss = 0.021545639261603355
Validation loss = 0.022785475477576256
Validation loss = 0.022682247683405876
Validation loss = 0.02261432446539402
Validation loss = 0.024264054372906685
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023009639233350754
Validation loss = 0.021603615954518318
Validation loss = 0.022275203838944435
Validation loss = 0.021858714520931244
Validation loss = 0.024069717153906822
Validation loss = 0.022186217829585075
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06264916467780429
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0626118067978533
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06257449344457688
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06253722453841572
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0625
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.062462819750148724
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0624256837098692
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.062388591800356503
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.062351543942992874
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06231454005934718
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06227758007117438
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06224066390041494
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06220379146919431
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06216696269982238
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0621301775147929
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06209343583678297
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06205673758865248
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06202008269344359
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06198347107438017
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061946902654867256
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061910377358490566
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06187389510901591
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061837455830388695
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06180105944673337
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061764705882352944
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00129 |
| Iteration     | 66       |
| MaximumReturn | -0.00107 |
| MinimumReturn | -0.0017  |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02154485322535038
Validation loss = 0.022304832935333252
Validation loss = 0.02346303127706051
Validation loss = 0.020944837480783463
Validation loss = 0.021365972235798836
Validation loss = 0.02202102169394493
Validation loss = 0.02185346744954586
Validation loss = 0.020186424255371094
Validation loss = 0.02252689190208912
Validation loss = 0.021374592557549477
Validation loss = 0.02025417983531952
Validation loss = 0.021244866773486137
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02093401364982128
Validation loss = 0.021042566746473312
Validation loss = 0.022184429690241814
Validation loss = 0.022287026047706604
Validation loss = 0.02201874554157257
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022149430587887764
Validation loss = 0.022153930738568306
Validation loss = 0.02132830023765564
Validation loss = 0.020326098427176476
Validation loss = 0.02108187787234783
Validation loss = 0.021012987941503525
Validation loss = 0.02174888364970684
Validation loss = 0.0215760450810194
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022732602432370186
Validation loss = 0.021213555708527565
Validation loss = 0.020902331918478012
Validation loss = 0.022182192653417587
Validation loss = 0.020506571978330612
Validation loss = 0.020958801731467247
Validation loss = 0.02230888418853283
Validation loss = 0.023828517645597458
Validation loss = 0.022278310731053352
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022244572639465332
Validation loss = 0.023152116686105728
Validation loss = 0.022041575983166695
Validation loss = 0.02509664185345173
Validation loss = 0.021064911037683487
Validation loss = 0.02125813066959381
Validation loss = 0.021673087030649185
Validation loss = 0.02301868423819542
Validation loss = 0.0228427741676569
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06172839506172839
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06169212690951821
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06165590135055784
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061619718309859156
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06158357771260997
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0615474794841735
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061511423550087874
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06147540983606557
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06143943826799298
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06140350877192982
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061367621274108705
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06133177570093458
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06129597197898424
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061260210035005834
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061224489795918366
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06118881118881119
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061153174140943505
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06111757857974389
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06108202443280977
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061046511627906974
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061011040092969204
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06097560975609756
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06094022054556007
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060904872389791184
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06086956521739131
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00274 |
| Iteration     | 67       |
| MaximumReturn | -0.00198 |
| MinimumReturn | -0.00385 |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021666813641786575
Validation loss = 0.020820805802941322
Validation loss = 0.020664799958467484
Validation loss = 0.02157660201191902
Validation loss = 0.02078922465443611
Validation loss = 0.02206571213901043
Validation loss = 0.024085747078061104
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02354300767183304
Validation loss = 0.022057194262742996
Validation loss = 0.02176493965089321
Validation loss = 0.020746523514389992
Validation loss = 0.021028218790888786
Validation loss = 0.02141622081398964
Validation loss = 0.021059075370430946
Validation loss = 0.023114921525120735
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022006168961524963
Validation loss = 0.02192661352455616
Validation loss = 0.02005050703883171
Validation loss = 0.021193746477365494
Validation loss = 0.021221308037638664
Validation loss = 0.02161361835896969
Validation loss = 0.020307110622525215
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021894095465540886
Validation loss = 0.02207311801612377
Validation loss = 0.02312490902841091
Validation loss = 0.02176729030907154
Validation loss = 0.021439043805003166
Validation loss = 0.02111230418086052
Validation loss = 0.02132018655538559
Validation loss = 0.02101684734225273
Validation loss = 0.02206726372241974
Validation loss = 0.022262124344706535
Validation loss = 0.020530937239527702
Validation loss = 0.022842878475785255
Validation loss = 0.022706324234604836
Validation loss = 0.02190556190907955
Validation loss = 0.022229544818401337
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020682470872998238
Validation loss = 0.021449534222483635
Validation loss = 0.02213735692203045
Validation loss = 0.023042894899845123
Validation loss = 0.02158251963555813
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060834298957126304
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06079907353792704
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06076388888888889
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06072874493927125
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06069364161849711
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060658578856152515
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06062355658198614
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060588574725908825
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06055363321799308
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06051873198847262
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06048387096774194
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06044905008635579
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06041426927502877
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060379528464634846
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0603448275862069
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06031016657093624
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06027554535017222
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060240963855421686
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060206422018348624
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06017191977077364
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06013745704467354
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06010303377218088
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060068649885583525
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060034305317324184
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00101  |
| Iteration     | 68        |
| MaximumReturn | -0.000796 |
| MinimumReturn | -0.00136  |
| TotalSamples  | 116620    |
-----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024095602333545685
Validation loss = 0.01941523514688015
Validation loss = 0.020848853513598442
Validation loss = 0.02308126911520958
Validation loss = 0.021257102489471436
Validation loss = 0.02255658432841301
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02302764169871807
Validation loss = 0.021551955491304398
Validation loss = 0.020970238372683525
Validation loss = 0.022403063252568245
Validation loss = 0.021379008889198303
Validation loss = 0.022490384057164192
Validation loss = 0.02109760232269764
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022484203800559044
Validation loss = 0.020541206002235413
Validation loss = 0.02460441179573536
Validation loss = 0.020988740026950836
Validation loss = 0.021714037284255028
Validation loss = 0.022111959755420685
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021946195513010025
Validation loss = 0.022055678069591522
Validation loss = 0.02161712571978569
Validation loss = 0.021641874685883522
Validation loss = 0.020836155861616135
Validation loss = 0.02131754532456398
Validation loss = 0.020787931978702545
Validation loss = 0.021035639569163322
Validation loss = 0.021024413406848907
Validation loss = 0.021590881049633026
Validation loss = 0.021482963114976883
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021510140970349312
Validation loss = 0.021684076637029648
Validation loss = 0.022437479346990585
Validation loss = 0.02180401235818863
Validation loss = 0.021650029346346855
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05996573386636208
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.059931506849315065
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05989731888191671
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05986316989737742
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05982905982905983
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05979498861047836
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05976095617529881
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.059726962457337884
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05969300739056282
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05965909090909091
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.059625212947189095
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05959137343927355
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05955757231990925
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05952380952380952
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.059490084985835696
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.059456398640996604
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.059422750424448216
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05938914027149321
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.059355568117580554
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.059322033898305086
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05928853754940711
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05925507900677201
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05922165820642978
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05918827508455468
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.059154929577464786
Done generating on-policy rollouts.
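
By the end of iteration 69's rollouts the running affinization average has dropped to the value printed at Path 24 above, which agrees with 105/1775 to the precision shown:

    105 / 1775 ≈ 0.0591549295774648
    1550 + 9 * 25 = 1775

Together with the 105/1550 value at the end of iteration 60's rollouts, this indicates that no new affinization events at epsilon = 3 are recorded anywhere in this span: the implied numerator stays at 105 while the denominator grows by 25 paths per iteration.
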
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00104  |
| Iteration     | 69        |
| MaximumReturn | -0.000747 |
| MinimumReturn | -0.00139  |
| TotalSamples  | 118286    |
-----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021155700087547302
Validation loss = 0.020908813923597336
Validation loss = 0.02198931947350502
Validation loss = 0.02096536196768284
Validation loss = 0.021158864721655846
Validation loss = 0.023764731362462044
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020545028150081635
Validation loss = 0.020546985790133476
Validation loss = 0.0215203445404768
Validation loss = 0.021520813927054405
Validation loss = 0.021223098039627075
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022917548194527626
Validation loss = 0.019918149337172508
Validation loss = 0.020812509581446648
Validation loss = 0.02080608159303665
Validation loss = 0.02096627838909626
Validation loss = 0.018892765045166016
Validation loss = 0.020511895418167114
Validation loss = 0.01967473328113556
Validation loss = 0.019831253215670586
Validation loss = 0.020044783130288124
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020540568977594376
Validation loss = 0.02178826369345188
Validation loss = 0.024380788207054138
Validation loss = 0.020878735929727554
Validation loss = 0.02171429805457592
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02078668773174286
Validation loss = 0.02142851985991001
Validation loss = 0.020401638001203537
Validation loss = 0.02124096266925335
Validation loss = 0.022407229989767075
Validation loss = 0.02148437686264515
Validation loss = 0.021288633346557617
Done fitting dynamics.
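Each dynamics model in the ensemble is fit in turn, and the number of "Validation loss" lines differs from model to model (six for model 0 just above, ten for model 2), which is consistent with validation-based early stopping rather than a fixed epoch count; that reading is an assumption. A minimal sketch of such a loop, with train_one_epoch and validation_loss as hypothetical placeholders for the real training code:

    # Hedged sketch of ensemble fitting with patience-based early stopping.
    # train_one_epoch and validation_loss are assumed placeholders, not original functions.
    def fit_ensemble(models, train_one_epoch, validation_loss, max_epochs=200, patience=3):
        for i, model in enumerate(models):
            print(f"Fitting model {i} (0-based) in the ensemble of {len(models)} models")
            best, bad_epochs = float("inf"), 0
            for _ in range(max_epochs):
                train_one_epoch(model)
                loss = validation_loss(model)
                print(f"Validation loss = {loss}")
                if loss < best:
                    best, bad_epochs = loss, 0
                else:
                    bad_epochs += 1
                    if bad_epochs >= patience:   # stop once validation stops improving
                        break

    # Trivial stand-ins, just to show the call shape.
    fit_ensemble([object()] * 5,
                 train_one_epoch=lambda m: None,
                 validation_loss=lambda m: 0.021,
                 max_epochs=3)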
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05912162162162162
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05908835115362971
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05905511811023622
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05902192242833052
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05898876404494382
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.058955642897248736
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.058922558922558925
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05888951205832866
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05885650224215247
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.058823529411764705
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05879059350503919
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05875769445998881
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0587248322147651
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05869200670765791
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05865921787709497
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05862646566164154
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05859375
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05856107083100948
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05852842809364549
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0584958217270195
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05846325167037862
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05843071786310518
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05839822024471635
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.058365758754863814
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.058333333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00102  |
| Iteration     | 70        |
| MaximumReturn | -0.000725 |
| MinimumReturn | -0.00134  |
| TotalSamples  | 119952    |
-----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02321421541273594
Validation loss = 0.02218141406774521
Validation loss = 0.020667679607868195
Validation loss = 0.020600462332367897
Validation loss = 0.020866943523287773
Validation loss = 0.021416369825601578
Validation loss = 0.02119690738618374
Validation loss = 0.0209275484085083
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02031141147017479
Validation loss = 0.021048463881015778
Validation loss = 0.01940017379820347
Validation loss = 0.02122524566948414
Validation loss = 0.02256459929049015
Validation loss = 0.02140827104449272
Validation loss = 0.02222006767988205
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020874951034784317
Validation loss = 0.021405162289738655
Validation loss = 0.018665647134184837
Validation loss = 0.021191855892539024
Validation loss = 0.019372357055544853
Validation loss = 0.020924178883433342
Validation loss = 0.020662125200033188
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020687373355031013
Validation loss = 0.023034052923321724
Validation loss = 0.020878924056887627
Validation loss = 0.019315307959914207
Validation loss = 0.021429751068353653
Validation loss = 0.02164149098098278
Validation loss = 0.023333558812737465
Validation loss = 0.020680202171206474
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022560613229870796
Validation loss = 0.022239945828914642
Validation loss = 0.02191937156021595
Validation loss = 0.023000670596957207
Validation loss = 0.021228482946753502
Validation loss = 0.021111031994223595
Validation loss = 0.026747046038508415
Validation loss = 0.021020321175456047
Validation loss = 0.022224105894565582
Validation loss = 0.02174198068678379
Validation loss = 0.021495122462511063
Validation loss = 0.022271640598773956
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
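Each policy-training phase prints twenty "Obtaining samples for iteration k..." lines, one per TRPO update on rollouts generated against the learned dynamics, after resetting the policy's log standard deviation ("Re-initialize init_std."). The sketch below captures only the control flow implied by those lines; sample_model_rollouts, trpo_step, and the init_logstd value are illustrative assumptions, not the original code.

    # Hedged sketch of the policy-improvement phase implied by the log lines above.
    def train_policy(policy, dynamics_ensemble, sample_model_rollouts, trpo_step,
                     n_iterations=20, init_logstd=0.0):
        print("Training policy using TRPO.")
        print("Re-initialize init_std.")
        policy["logstd"] = init_logstd                 # reset exploration noise each outer iteration
        for k in range(n_iterations):
            print(f"Obtaining samples for iteration {k}...")
            paths = sample_model_rollouts(policy, dynamics_ensemble)  # imagined rollouts
            trpo_step(policy, paths)                   # one constrained policy update
        print("Done training policy.")

    # Dummy call with no-op placeholders, reproducing the printed skeleton for 2 iterations.
    train_policy({"logstd": None}, None,
                 sample_model_rollouts=lambda pol, dyn: [],
                 trpo_step=lambda pol, paths: None,
                 n_iterations=2)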
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05830094392004442
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.058268590455049944
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05823627287853577
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0582039911308204
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05817174515235457
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05813953488372093
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.058107360265633644
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05807522123893805
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05804311774461028
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.058011049723756904
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05797901711761458
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057947019867549666
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05791505791505792
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057883131201764054
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05785123966942149
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05781938325991189
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05778756191524491
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057755775577557754
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057724024189114896
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057692307692307696
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057660626029654036
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057628979143798026
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0575973669775096
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05756578947368421
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057534246575342465
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.001    |
| Iteration     | 71        |
| MaximumReturn | -0.000696 |
| MinimumReturn | -0.00139  |
| TotalSamples  | 121618    |
-----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020874759182333946
Validation loss = 0.021936822682619095
Validation loss = 0.021089082583785057
Validation loss = 0.023151252418756485
Validation loss = 0.02117512933909893
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021627869457006454
Validation loss = 0.021489273756742477
Validation loss = 0.021356286481022835
Validation loss = 0.021925373002886772
Validation loss = 0.021806174889206886
Validation loss = 0.02103545144200325
Validation loss = 0.020566757768392563
Validation loss = 0.021949201822280884
Validation loss = 0.026204289868474007
Validation loss = 0.02048361301422119
Validation loss = 0.020025068894028664
Validation loss = 0.021245496347546577
Validation loss = 0.021537352353334427
Validation loss = 0.02047414891421795
Validation loss = 0.021833540871739388
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021917372941970825
Validation loss = 0.021913910284638405
Validation loss = 0.020341461524367332
Validation loss = 0.022672900930047035
Validation loss = 0.02123761922121048
Validation loss = 0.020822083577513695
Validation loss = 0.019987184554338455
Validation loss = 0.022701462730765343
Validation loss = 0.020768355578184128
Validation loss = 0.02059347555041313
Validation loss = 0.020797552540898323
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022588929161429405
Validation loss = 0.023167433217167854
Validation loss = 0.02172282338142395
Validation loss = 0.022745151072740555
Validation loss = 0.02170228771865368
Validation loss = 0.022363753989338875
Validation loss = 0.020935814827680588
Validation loss = 0.0215765368193388
Validation loss = 0.022141071036458015
Validation loss = 0.022940684109926224
Validation loss = 0.023051761090755463
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022638825699687004
Validation loss = 0.020613951608538628
Validation loss = 0.021605001762509346
Validation loss = 0.022116895765066147
Validation loss = 0.02284310944378376
Validation loss = 0.02387242205440998
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05750273822562979
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05747126436781609
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057439824945295405
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057408419901585565
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05737704918032787
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05734571272528673
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05731441048034935
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057283142389525366
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05725190839694656
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05722070844686648
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05718954248366013
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05715841045182363
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057127312295973884
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057096247960848286
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057065217391304345
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057034220532319393
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057003257328990226
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05697232772653283
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056941431670282
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056910569105691054
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056879739978331526
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0568489442338928
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056818181818181816
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05678745267712277
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05675675675675676
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
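What "Updating normalization." normalizes is not stated in the log; a common choice, assumed here, is refreshing the observation statistics used to whiten dynamics-model inputs as new data arrives. A minimal sketch under that assumption (the 4-dimensional observations are purely illustrative):

    # Hedged sketch of a normalization update: recompute mean/std of the aggregated
    # data and use them to whiten network inputs. Purely illustrative.
    import numpy as np

    def update_normalization(stats, new_obs):
        data = new_obs if stats["buffer"].size == 0 else np.concatenate([stats["buffer"], new_obs])
        stats["buffer"] = data
        stats["mean"] = data.mean(axis=0)
        stats["std"] = data.std(axis=0) + 1e-8   # avoid division by zero
        return stats

    stats = {"buffer": np.empty((0, 4)), "mean": None, "std": None}
    stats = update_normalization(stats, np.random.randn(2500, 4))   # one rollout phase of data
    normalized = (np.random.randn(4) - stats["mean"]) / stats["std"]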
-----------------------------
| AverageReturn | -0.00108  |
| Iteration     | 72        |
| MaximumReturn | -0.000751 |
| MinimumReturn | -0.00147  |
| TotalSamples  | 123284    |
-----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021519050002098083
Validation loss = 0.021391600370407104
Validation loss = 0.022308828309178352
Validation loss = 0.020990293473005295
Validation loss = 0.020819315686821938
Validation loss = 0.022772742435336113
Validation loss = 0.022225303575396538
Validation loss = 0.021023137494921684
Validation loss = 0.020862843841314316
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021335186436772346
Validation loss = 0.023307885974645615
Validation loss = 0.021275945007801056
Validation loss = 0.020706623792648315
Validation loss = 0.021070804446935654
Validation loss = 0.021704234182834625
Validation loss = 0.02204837091267109
Validation loss = 0.02168005146086216
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019902879372239113
Validation loss = 0.020126905292272568
Validation loss = 0.019493388012051582
Validation loss = 0.01944056712090969
Validation loss = 0.020921751856803894
Validation loss = 0.02055334486067295
Validation loss = 0.020314324647188187
Validation loss = 0.02031235583126545
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02422749437391758
Validation loss = 0.021609241142868996
Validation loss = 0.021445201709866524
Validation loss = 0.024761930108070374
Validation loss = 0.021744465455412865
Validation loss = 0.021689869463443756
Validation loss = 0.022156748920679092
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023111257702112198
Validation loss = 0.021887028589844704
Validation loss = 0.022924864664673805
Validation loss = 0.023571575060486794
Validation loss = 0.02222832478582859
Validation loss = 0.02196710743010044
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05672609400324149
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05669546436285097
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05666486778197517
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05663430420711974
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05660377358490566
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056573275862068964
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05654281098546042
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05651237890204521
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056481979558902634
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056451612903225805
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05642127888232133
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05639097744360902
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05636070853462158
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05633047210300429
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05630026809651475
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05627009646302251
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056239957150508835
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05620985010706638
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056179775280898875
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05614973262032086
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05611972207375735
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05608974358974359
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05605979711692472
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056029882604055496
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0011   |
| Iteration     | 73        |
| MaximumReturn | -0.000831 |
| MinimumReturn | -0.0014   |
| TotalSamples  | 124950    |
-----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02151847817003727
Validation loss = 0.021403929218649864
Validation loss = 0.021101512014865875
Validation loss = 0.020621348172426224
Validation loss = 0.02063676528632641
Validation loss = 0.02050890401005745
Validation loss = 0.024781346321105957
Validation loss = 0.021430728957057
Validation loss = 0.021572599187493324
Validation loss = 0.02110861986875534
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021759260445833206
Validation loss = 0.021282296627759933
Validation loss = 0.021489273756742477
Validation loss = 0.02091778628528118
Validation loss = 0.02296030893921852
Validation loss = 0.02112140879034996
Validation loss = 0.02157444879412651
Validation loss = 0.02253587543964386
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0216669999063015
Validation loss = 0.019540440291166306
Validation loss = 0.0202566497027874
Validation loss = 0.019389968365430832
Validation loss = 0.02138516679406166
Validation loss = 0.021335400640964508
Validation loss = 0.021351369097828865
Validation loss = 0.02224843204021454
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022253623232245445
Validation loss = 0.021676069125533104
Validation loss = 0.021929960697889328
Validation loss = 0.02195066027343273
Validation loss = 0.021503599360585213
Validation loss = 0.02179255522787571
Validation loss = 0.021388087421655655
Validation loss = 0.021177463233470917
Validation loss = 0.022806361317634583
Validation loss = 0.02078545093536377
Validation loss = 0.021404894068837166
Validation loss = 0.021935317665338516
Validation loss = 0.021925542503595352
Validation loss = 0.02089378982782364
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02194402553141117
Validation loss = 0.021306276321411133
Validation loss = 0.021600253880023956
Validation loss = 0.022670311853289604
Validation loss = 0.023484116420149803
Validation loss = 0.021684205159544945
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055970149253731345
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05594033031433138
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05591054313099041
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05588078765300692
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05585106382978723
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05582137161084529
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05579171094580234
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055762081784386616
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05573248407643312
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05570291777188329
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05567338282078473
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05564387917329094
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055614406779661014
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0555849655902594
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05555555555555555
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05552617662612375
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0554968287526427
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0554675118858954
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05543822597676874
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055408970976253295
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055379746835443035
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055350553505535055
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05532139093782929
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05529225908372828
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05526315789473684
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0019  |
| Iteration     | 74       |
| MaximumReturn | -0.00135 |
| MinimumReturn | -0.00261 |
| TotalSamples  | 126616   |
----------------------------
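For orientation, each block in this section repeats the same outer loop: fit the 5-model dynamics ensemble, update randomness, train the policy with TRPO, collect 25 on-policy paths, update normalization, and log the summary table. A compact sketch of that structure, with every step a hypothetical placeholder rather than a function from the original code:

    # Hedged sketch of the outer loop visible in the log; every helper is an assumed name.
    def run(start_iter=70, end_iter=80,
            fit_dynamics=lambda: None, update_randomness=lambda: None,
            train_policy=lambda: None, collect_rollouts=lambda: [],
            update_normalization=lambda paths: None, log_table=lambda itr: None):
        for itr in range(start_iter, end_iter):
            print(f"itr #{itr} | ")
            print("Fitting dynamics.")
            fit_dynamics()
            print("Updating randomness.")
            update_randomness()
            train_policy()                      # TRPO on model-generated rollouts
            paths = collect_rollouts()          # 25 on-policy paths, 100 steps each (as in the Path lines)
            update_normalization(paths)
            log_table(itr)                      # AverageReturn / TotalSamples summary

    run(start_iter=70, end_iter=72)   # prints the skeleton for two iterations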
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02003471925854683
Validation loss = 0.02124405838549137
Validation loss = 0.021974973380565643
Validation loss = 0.02281317673623562
Validation loss = 0.022884512320160866
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021006491035223007
Validation loss = 0.020461393520236015
Validation loss = 0.022746635600924492
Validation loss = 0.02058081515133381
Validation loss = 0.021093927323818207
Validation loss = 0.022349920123815536
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021160785108804703
Validation loss = 0.021127983927726746
Validation loss = 0.02041911706328392
Validation loss = 0.021921709179878235
Validation loss = 0.019797764718532562
Validation loss = 0.02036133036017418
Validation loss = 0.020665490999817848
Validation loss = 0.020071672275662422
Validation loss = 0.01966123655438423
Validation loss = 0.020385053008794785
Validation loss = 0.021152537316083908
Validation loss = 0.019915590062737465
Validation loss = 0.02062261290848255
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023444440215826035
Validation loss = 0.021334750577807426
Validation loss = 0.02113071270287037
Validation loss = 0.021417737007141113
Validation loss = 0.021402310580015182
Validation loss = 0.02235729619860649
Validation loss = 0.0213197972625494
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02303815260529518
Validation loss = 0.02154834009706974
Validation loss = 0.022121626883745193
Validation loss = 0.022547053173184395
Validation loss = 0.021186064928770065
Validation loss = 0.02229170873761177
Validation loss = 0.0224977545440197
Validation loss = 0.021041270345449448
Validation loss = 0.022266093641519547
Validation loss = 0.02187325246632099
Validation loss = 0.02130574733018875
Validation loss = 0.021702008321881294
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055234087322461864
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055205047318611984
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055176037834997374
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05514705882352941
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05511811023622047
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05508919202518363
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05506030414263241
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055031446540880505
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05500261917234154
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0549738219895288
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.054945054945054944
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0549163179916318
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05488761108207005
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.054858934169279
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05483028720626632
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.054801670146137786
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.054773082942097026
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05474452554744526
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05471599791558103
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0546875
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05465903175429464
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05463059313215401
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.054602184087363496
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05457380457380458
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05454545454545454
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000933 |
| Iteration     | 75        |
| MaximumReturn | -0.000689 |
| MinimumReturn | -0.00119  |
| TotalSamples  | 128282    |
-----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02189350500702858
Validation loss = 0.02287018671631813
Validation loss = 0.02192673645913601
Validation loss = 0.02188963070511818
Validation loss = 0.02157013490796089
Validation loss = 0.021445248275995255
Validation loss = 0.02032516524195671
Validation loss = 0.022725909948349
Validation loss = 0.022866591811180115
Validation loss = 0.020885882899165154
Validation loss = 0.023235373198986053
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020433347672224045
Validation loss = 0.019757580012083054
Validation loss = 0.020271699875593185
Validation loss = 0.02026306837797165
Validation loss = 0.020995639264583588
Validation loss = 0.024000750854611397
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019068554043769836
Validation loss = 0.020182505249977112
Validation loss = 0.021286319941282272
Validation loss = 0.020914070308208466
Validation loss = 0.02140854299068451
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021989397704601288
Validation loss = 0.020750749856233597
Validation loss = 0.020575013011693954
Validation loss = 0.020518861711025238
Validation loss = 0.02100435458123684
Validation loss = 0.02179379016160965
Validation loss = 0.022174853831529617
Validation loss = 0.02095634862780571
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021211784332990646
Validation loss = 0.02244948223233223
Validation loss = 0.021985961124300957
Validation loss = 0.02298279106616974
Validation loss = 0.022300012409687042
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05451713395638629
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.054488842760768035
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05446058091286307
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05443234836702955
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.054404145077720206
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05437597099948213
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05434782608695652
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05431971029487843
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05429162357807653
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05426356589147287
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.054235537190082644
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05420753742901394
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05417956656346749
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05415162454873646
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05412371134020619
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05409582689335394
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.054067971163748715
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05404014410705095
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05401234567901234
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05398457583547558
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0539568345323741
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053929121725731895
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053901437371663245
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053873781426372495
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05384615384615385
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00107  |
| Iteration     | 76        |
| MaximumReturn | -0.000735 |
| MinimumReturn | -0.00149  |
| TotalSamples  | 129948    |
-----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02214145101606846
Validation loss = 0.02162969298660755
Validation loss = 0.020450392737984657
Validation loss = 0.02169193886220455
Validation loss = 0.020778946578502655
Validation loss = 0.02101045660674572
Validation loss = 0.021865077316761017
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02223031036555767
Validation loss = 0.021442048251628876
Validation loss = 0.02090011164546013
Validation loss = 0.021504735574126244
Validation loss = 0.02124764770269394
Validation loss = 0.020841972902417183
Validation loss = 0.02126440219581127
Validation loss = 0.021685535088181496
Validation loss = 0.02032647095620632
Validation loss = 0.02160937525331974
Validation loss = 0.022540152072906494
Validation loss = 0.022711321711540222
Validation loss = 0.02169385738670826
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019866667687892914
Validation loss = 0.019875522702932358
Validation loss = 0.0199165940284729
Validation loss = 0.019437972456216812
Validation loss = 0.02101624570786953
Validation loss = 0.020086122676730156
Validation loss = 0.020237334072589874
Validation loss = 0.023174917325377464
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02198931761085987
Validation loss = 0.02205634117126465
Validation loss = 0.021876465529203415
Validation loss = 0.02339218184351921
Validation loss = 0.0213584303855896
Validation loss = 0.021681297570466995
Validation loss = 0.02271108515560627
Validation loss = 0.023656025528907776
Validation loss = 0.021391263231635094
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021059256047010422
Validation loss = 0.021263690665364265
Validation loss = 0.020936354994773865
Validation loss = 0.021973159164190292
Validation loss = 0.02156609483063221
Validation loss = 0.022408995777368546
Validation loss = 0.020011331886053085
Validation loss = 0.022530898451805115
Validation loss = 0.022010426968336105
Validation loss = 0.02141791395843029
Validation loss = 0.021426713094115257
Done fitting dynamics.
Updating randomness.
Done updating randomness.
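The log never says what "Updating randomness." does. A heavily hedged guess, given the 5-model ensemble, is that it resamples the fixed random choices (for example, which ensemble member each simulated particle follows) used during the upcoming policy-training phase, so they stay constant within an iteration. A sketch of that guess only; n_particles and the assignment scheme are assumptions:

    # Heavily hedged guess at the "Updating randomness." step, not the original code.
    import numpy as np

    def update_randomness(n_particles=5, n_models=5, rng=None):
        rng = rng or np.random.default_rng()
        print("Updating randomness.")
        assignment = rng.integers(0, n_models, size=n_particles)   # particle -> ensemble member
        print("Done updating randomness.")
        return assignment

    print(update_randomness())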
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053818554587391085
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053790983606557374
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053763440860215055
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05373592630501536
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05370843989769821
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05368098159509203
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05365355135411344
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05362614913176711
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05359877488514548
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05357142857142857
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05354411014788373
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053516819571865444
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05348955680081508
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053462321792260695
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05343511450381679
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05340793489318413
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05338078291814947
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053353658536585365
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053326561706449976
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0532994923857868
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0532724505327245
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053245436105476676
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05321844906234161
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05319148936170213
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053164556962025315
Done generating on-policy rollouts.
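Each "Path i | total_timesteps t." line reports the steps accumulated in the current rollout phase before path i starts, and t is always i x 100, so every path appears to run the full 100-step horizon. A minimal sketch of that bookkeeping, with rollout() as a hypothetical stand-in for a real-environment episode:

    # Hedged sketch of the rollout-phase bookkeeping behind the "Path i | total_timesteps" lines.
    def collect_onpol_rollouts(rollout, num_paths=25, horizon=100):
        total_timesteps, paths = 0, []
        for i in range(num_paths):
            print(f"Path {i} | total_timesteps {total_timesteps}.")
            path = rollout(horizon)          # one real-environment episode
            total_timesteps += len(path)
            paths.append(path)
        print("Done generating on-policy rollouts.")
        return paths

    # Dummy episode generator, just to reproduce the printed pattern for a few paths.
    collect_onpol_rollouts(lambda horizon: list(range(horizon)), num_paths=3)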
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00109  |
| Iteration     | 77        |
| MaximumReturn | -0.000825 |
| MinimumReturn | -0.00161  |
| TotalSamples  | 131614    |
-----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021105321124196053
Validation loss = 0.0200621597468853
Validation loss = 0.021925728768110275
Validation loss = 0.021730979904532433
Validation loss = 0.022024234756827354
Validation loss = 0.02177461050450802
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022022996097803116
Validation loss = 0.02297748066484928
Validation loss = 0.021125776693224907
Validation loss = 0.021230371668934822
Validation loss = 0.022989865392446518
Validation loss = 0.020791077986359596
Validation loss = 0.022107677534222603
Validation loss = 0.022728100419044495
Validation loss = 0.020294759422540665
Validation loss = 0.020188262686133385
Validation loss = 0.02142307534813881
Validation loss = 0.02210422232747078
Validation loss = 0.02103203535079956
Validation loss = 0.022690681740641594
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02004939317703247
Validation loss = 0.022140294313430786
Validation loss = 0.01973990723490715
Validation loss = 0.019711345434188843
Validation loss = 0.020525386556982994
Validation loss = 0.01988929510116577
Validation loss = 0.02051456831395626
Validation loss = 0.019903259351849556
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021089937537908554
Validation loss = 0.021424798294901848
Validation loss = 0.02122158743441105
Validation loss = 0.021322373300790787
Validation loss = 0.023057913407683372
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020734811201691628
Validation loss = 0.0212575513869524
Validation loss = 0.02130340225994587
Validation loss = 0.022074947133660316
Validation loss = 0.02419457398355007
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05313765182186235
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05311077389984825
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05308392315470172
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05305709954522486
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05303030303030303
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.053003533568904596
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052976791120080725
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0529500756429652
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052923387096774195
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05289672544080604
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052870090634441085
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05284348263714142
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0528169014084507
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05279034690799397
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052763819095477386
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052737317930688095
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05271084337349398
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05268439538384345
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05265797392176529
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05263157894736842
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05260521042084168
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05257886830245368
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052552552552552555
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05252626313156578
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0525
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000934 |
| Iteration     | 78        |
| MaximumReturn | -0.00066  |
| MinimumReturn | -0.0013   |
| TotalSamples  | 133280    |
-----------------------------
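The table fields look like simple statistics over the 25 freshly collected paths' returns (mean, maximum, minimum), alongside the cumulative TotalSamples counter; that reading is an assumption. A minimal sketch, seeded with the Iteration 78 values above purely for illustration:

    # Hedged sketch of the per-iteration summary statistics; keys mirror the table above.
    import numpy as np

    def summarize(path_returns, iteration, total_samples):
        return {
            "AverageReturn": float(np.mean(path_returns)),
            "Iteration": iteration,
            "MaximumReturn": float(np.max(path_returns)),
            "MinimumReturn": float(np.min(path_returns)),
            "TotalSamples": total_samples,
        }

    # Illustrative returns drawn between the min/max shown for Iteration 78.
    print(summarize(np.random.uniform(-0.0013, -0.00066, size=25), 78, 133280))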
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02156035602092743
Validation loss = 0.024643726646900177
Validation loss = 0.02336880937218666
Validation loss = 0.021509133279323578
Validation loss = 0.021292200312018394
Validation loss = 0.020921289920806885
Validation loss = 0.01983548514544964
Validation loss = 0.02231130562722683
Validation loss = 0.02091255411505699
Validation loss = 0.021348489448428154
Validation loss = 0.020825695246458054
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021590881049633026
Validation loss = 0.020848320797085762
Validation loss = 0.021734818816184998
Validation loss = 0.021172938868403435
Validation loss = 0.020128408446907997
Validation loss = 0.021435143426060677
Validation loss = 0.020924458280205727
Validation loss = 0.020196912810206413
Validation loss = 0.021484611555933952
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02341986633837223
Validation loss = 0.020349780097603798
Validation loss = 0.02044478803873062
Validation loss = 0.021588094532489777
Validation loss = 0.02166656404733658
Validation loss = 0.021931303665041924
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021313292905688286
Validation loss = 0.021571094170212746
Validation loss = 0.020752418786287308
Validation loss = 0.019971048459410667
Validation loss = 0.022496983408927917
Validation loss = 0.02209080010652542
Validation loss = 0.021805789321660995
Validation loss = 0.02121163159608841
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0228903666138649
Validation loss = 0.022256631404161453
Validation loss = 0.021653974428772926
Validation loss = 0.023750999942421913
Validation loss = 0.022229835391044617
Validation loss = 0.02286767028272152
Validation loss = 0.022370699793100357
Done fitting dynamics.
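
Each of the 5 ensemble members is fit independently, printing a held-out validation loss as training proceeds; the varying number of validation lines per model suggests validation-based early stopping. A minimal PyTorch-flavoured sketch of that loop; the framework, network sizes, optimizer, and stopping rule here are assumptions, not the original implementation:

import torch
import torch.nn as nn

def make_model(obs_dim, act_dim, hidden=500, depth=2):
    """One ensemble member: an MLP mapping (obs, act) to a next-state delta."""
    dims = [obs_dim + act_dim] + [hidden] * depth + [obs_dim]
    layers = []
    for i in range(len(dims) - 1):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

def fit_ensemble(x_tr, y_tr, x_va, y_va, obs_dim, act_dim,
                 n_models=5, epochs=200, lr=1e-3, batch_size=1000, patience=5):
    """Fit each member on its own bootstrap resample of the training data and
    report validation loss once per epoch, stopping when it stops improving."""
    models = []
    for m in range(n_models):
        print(f"Fitting model {m} (0-based) in the ensemble of {n_models} models")
        model = make_model(obs_dim, act_dim)
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        boot = torch.randint(len(x_tr), (len(x_tr),))  # bootstrap resample indices
        best, bad = float("inf"), 0
        for _ in range(epochs):
            perm = boot[torch.randperm(len(boot))]
            for start in range(0, len(perm), batch_size):
                b = perm[start:start + batch_size]
                loss = nn.functional.mse_loss(model(x_tr[b]), y_tr[b])
                opt.zero_grad()
                loss.backward()
                opt.step()
            with torch.no_grad():
                val_loss = nn.functional.mse_loss(model(x_va), y_va).item()
            print(f"Validation loss = {val_loss}")
            best, bad = (val_loss, 0) if val_loss < best else (best, bad + 1)
            if bad >= patience:
                break
        models.append(model)
    return models
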
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
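
The policy update above is the model-based TRPO phase: the exploration log-std is reset, then for 20 iterations the policy is rolled out (presumably against the learned dynamics model) and one TRPO step is taken. A skeleton of that outer loop; sample_fn and trpo_step_fn are hypothetical callables standing in for the original sampler and TRPO optimizer:

def train_policy_trpo(policy, sample_fn, trpo_step_fn, iterations=20, reinit_logstd=True):
    """Alternate sampling and one TRPO update per iteration, as in the log above."""
    if reinit_logstd:
        print("Re-initialize init_std.")
        policy["log_std"] = 0.0  # reset exploration noise (policy representation assumed)
    for it in range(iterations):
        print(f"Obtaining samples for iteration {it}...")
        paths = sample_fn(policy)       # e.g. imagined rollouts under the learned model
        trpo_step_fn(policy, paths)     # constrained policy-gradient step
    print("Done training policy.")
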
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05247376311844078
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05244755244755245
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052421367948077884
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05239520958083832
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05236907730673317
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052342971086739784
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052316890881913304
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052290836653386456
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05226480836236934
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05223880597014925
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052212829438090504
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052186878727634195
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05216095380029806
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052135054617676264
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052109181141439205
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052083333333333336
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05205751115518096
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05203171456888008
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05200594353640416
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05198019801980198
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05195447798119743
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05192878338278932
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05190311418685121
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.051877470355731224
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05185185185185185
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.001    |
| Iteration     | 79        |
| MaximumReturn | -0.000692 |
| MinimumReturn | -0.00136  |
| TotalSamples  | 134946    |
-----------------------------
