Logging to experiments/gym_fswimmer/nov4/SA01_w350e1_seed2631
Print configuration .....
{'env_name': 'gym_fswimmer', 'random_seeds': [2312, 1231, 2631, 5543], 'save_variables': False, 'model_save_dir': '/tmp/gym_fswimmer_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'intrinsic_reward_only': False, 'external_reward_evaluation_interval': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 200, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.38316798210144043
Validation loss = 0.17618951201438904
Validation loss = 0.11788532137870789
Validation loss = 0.10166035592556
Validation loss = 0.09390991926193237
Validation loss = 0.08877312391996384
Validation loss = 0.09090124070644379
Validation loss = 0.0845695286989212
Validation loss = 0.08490800857543945
Validation loss = 0.08462201058864594
Validation loss = 0.08303045481443405
Validation loss = 0.08598127216100693
Validation loss = 0.09775495529174805
Validation loss = 0.08117935061454773
Validation loss = 0.0822729766368866
Validation loss = 0.08409053832292557
Validation loss = 0.0811004787683487
Validation loss = 0.07833568751811981
Validation loss = 0.0796939879655838
Validation loss = 0.07922428846359253
Validation loss = 0.07743406295776367
Validation loss = 0.08375215530395508
Validation loss = 0.07994955778121948
Validation loss = 0.07694219052791595
Validation loss = 0.07462869584560394
Validation loss = 0.0784730315208435
Validation loss = 0.0792844146490097
Validation loss = 0.07721341401338577
Validation loss = 0.07588495314121246
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3852763772010803
Validation loss = 0.1685829758644104
Validation loss = 0.11187610775232315
Validation loss = 0.0973530262708664
Validation loss = 0.10125113278627396
Validation loss = 0.09440529346466064
Validation loss = 0.08888102322816849
Validation loss = 0.08462822437286377
Validation loss = 0.0889904573559761
Validation loss = 0.08376580476760864
Validation loss = 0.08550303429365158
Validation loss = 0.0869891494512558
Validation loss = 0.08184660971164703
Validation loss = 0.08137572556734085
Validation loss = 0.08315586298704147
Validation loss = 0.07859437167644501
Validation loss = 0.08272367715835571
Validation loss = 0.07954686135053635
Validation loss = 0.08115243166685104
Validation loss = 0.08153574168682098
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5138533115386963
Validation loss = 0.1763504296541214
Validation loss = 0.12209726870059967
Validation loss = 0.10268361866474152
Validation loss = 0.09243093430995941
Validation loss = 0.09467058628797531
Validation loss = 0.08893625438213348
Validation loss = 0.09032121300697327
Validation loss = 0.08420570194721222
Validation loss = 0.08569501340389252
Validation loss = 0.08126324415206909
Validation loss = 0.08429104834794998
Validation loss = 0.08524827659130096
Validation loss = 0.08251579105854034
Validation loss = 0.08204693347215652
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3806005120277405
Validation loss = 0.18855968117713928
Validation loss = 0.12668491899967194
Validation loss = 0.10116038471460342
Validation loss = 0.09466458857059479
Validation loss = 0.09149715304374695
Validation loss = 0.08801379799842834
Validation loss = 0.08486002683639526
Validation loss = 0.08504478633403778
Validation loss = 0.08214078098535538
Validation loss = 0.0839972123503685
Validation loss = 0.08161486685276031
Validation loss = 0.08470386266708374
Validation loss = 0.08171606063842773
Validation loss = 0.07873918116092682
Validation loss = 0.07902491092681885
Validation loss = 0.08280660957098007
Validation loss = 0.08008497953414917
Validation loss = 0.08090607076883316
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5656862854957581
Validation loss = 0.1938055157661438
Validation loss = 0.12569686770439148
Validation loss = 0.10611070692539215
Validation loss = 0.09464157372713089
Validation loss = 0.09279362112283707
Validation loss = 0.08710165321826935
Validation loss = 0.09249190986156464
Validation loss = 0.08701951801776886
Validation loss = 0.09264183044433594
Validation loss = 0.08079835772514343
Validation loss = 0.09628091752529144
Validation loss = 0.08613377809524536
Validation loss = 0.08156827092170715
Validation loss = 0.08363635838031769
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 13
average number of affinization = 1.625
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 8
average number of affinization = 2.3333333333333335
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 7
average number of affinization = 2.8
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 10
average number of affinization = 3.4545454545454546
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 13
average number of affinization = 4.25
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 154      |
| Iteration     | 0        |
| MaximumReturn | 171      |
| MinimumReturn | 143      |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09260225296020508
Validation loss = 0.04803786426782608
Validation loss = 0.045335620641708374
Validation loss = 0.04551463946700096
Validation loss = 0.04604849964380264
Validation loss = 0.04312227666378021
Validation loss = 0.04413636028766632
Validation loss = 0.0436355285346508
Validation loss = 0.04132841154932976
Validation loss = 0.04441344365477562
Validation loss = 0.04157750681042671
Validation loss = 0.04370727390050888
Validation loss = 0.042369142174720764
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08984330296516418
Validation loss = 0.050262369215488434
Validation loss = 0.047637633979320526
Validation loss = 0.04669022560119629
Validation loss = 0.04698104038834572
Validation loss = 0.04458283260464668
Validation loss = 0.043547410517930984
Validation loss = 0.04847792163491249
Validation loss = 0.04663398489356041
Validation loss = 0.043117083609104156
Validation loss = 0.04482538625597954
Validation loss = 0.045901697129011154
Validation loss = 0.05005902796983719
Validation loss = 0.041959360241889954
Validation loss = 0.04821695014834404
Validation loss = 0.0428759790956974
Validation loss = 0.04182616248726845
Validation loss = 0.04391089454293251
Validation loss = 0.04285791888833046
Validation loss = 0.045569706708192825
Validation loss = 0.04278922826051712
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08056861907243729
Validation loss = 0.050187431275844574
Validation loss = 0.04886419698596001
Validation loss = 0.04777265340089798
Validation loss = 0.04747714474797249
Validation loss = 0.045597612857818604
Validation loss = 0.04956359788775444
Validation loss = 0.04766123369336128
Validation loss = 0.04659533128142357
Validation loss = 0.04745416343212128
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09474578499794006
Validation loss = 0.05018855631351471
Validation loss = 0.046813540160655975
Validation loss = 0.046886757016181946
Validation loss = 0.044541217386722565
Validation loss = 0.04626084864139557
Validation loss = 0.04492783173918724
Validation loss = 0.04338515177369118
Validation loss = 0.04415402561426163
Validation loss = 0.04342159628868103
Validation loss = 0.04868662729859352
Validation loss = 0.045609455555677414
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0952984020113945
Validation loss = 0.04919980466365814
Validation loss = 0.047964632511138916
Validation loss = 0.04756321385502815
Validation loss = 0.04658639803528786
Validation loss = 0.05156070366501808
Validation loss = 0.051965899765491486
Validation loss = 0.04486803710460663
Validation loss = 0.04436890408396721
Validation loss = 0.044221922755241394
Validation loss = 0.04618174582719803
Validation loss = 0.04410728067159653
Validation loss = 0.04192746430635452
Validation loss = 0.04527973756194115
Validation loss = 0.04280531778931618
Validation loss = 0.04347440227866173
Validation loss = 0.04321086034178734
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 21
average number of affinization = 5.538461538461538
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 13
average number of affinization = 6.071428571428571
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 11
average number of affinization = 6.4
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 21
average number of affinization = 7.3125
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 56
average number of affinization = 10.176470588235293
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 16
average number of affinization = 10.5
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 331      |
| Iteration     | 1        |
| MaximumReturn | 333      |
| MinimumReturn | 325      |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04773283377289772
Validation loss = 0.033974237740039825
Validation loss = 0.03661651536822319
Validation loss = 0.033773962408304214
Validation loss = 0.03521124646067619
Validation loss = 0.03460418060421944
Validation loss = 0.033333659172058105
Validation loss = 0.034998875111341476
Validation loss = 0.03427553549408913
Validation loss = 0.03395188972353935
Validation loss = 0.032450590282678604
Validation loss = 0.03281507268548012
Validation loss = 0.03179479390382767
Validation loss = 0.031939420849084854
Validation loss = 0.034315723925828934
Validation loss = 0.03476594761013985
Validation loss = 0.03259968385100365
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05247882381081581
Validation loss = 0.03449149802327156
Validation loss = 0.033062148839235306
Validation loss = 0.03598163276910782
Validation loss = 0.03691500052809715
Validation loss = 0.03453177958726883
Validation loss = 0.03383779525756836
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04660232365131378
Validation loss = 0.037029411643743515
Validation loss = 0.0362800695002079
Validation loss = 0.035759128630161285
Validation loss = 0.03649253398180008
Validation loss = 0.0352771021425724
Validation loss = 0.03404812142252922
Validation loss = 0.03484969958662987
Validation loss = 0.03662431985139847
Validation loss = 0.035502173006534576
Validation loss = 0.03408660367131233
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.050279226154088974
Validation loss = 0.03615716099739075
Validation loss = 0.03456927835941315
Validation loss = 0.03430009260773659
Validation loss = 0.03658726066350937
Validation loss = 0.03329126164317131
Validation loss = 0.035087283700704575
Validation loss = 0.036305904388427734
Validation loss = 0.034170519560575485
Validation loss = 0.03353973105549812
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04489496722817421
Validation loss = 0.035965144634246826
Validation loss = 0.036089349538087845
Validation loss = 0.035872023552656174
Validation loss = 0.0341176800429821
Validation loss = 0.03346883878111839
Validation loss = 0.03300821781158447
Validation loss = 0.03651348128914833
Validation loss = 0.03443872928619385
Validation loss = 0.03445553407073021
Validation loss = 0.03358170762658119
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 0
average number of affinization = 9.947368421052632
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 27
average number of affinization = 10.8
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 32
average number of affinization = 11.80952380952381
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 2
average number of affinization = 11.363636363636363
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 70
average number of affinization = 13.91304347826087
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 30
average number of affinization = 14.583333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 326      |
| Iteration     | 2        |
| MaximumReturn | 337      |
| MinimumReturn | 319      |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03218550607562065
Validation loss = 0.02830825001001358
Validation loss = 0.03074885532259941
Validation loss = 0.030530020594596863
Validation loss = 0.027919571846723557
Validation loss = 0.030530136078596115
Validation loss = 0.03001401759684086
Validation loss = 0.02828621119260788
Validation loss = 0.02848295494914055
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.032835960388183594
Validation loss = 0.03027724102139473
Validation loss = 0.029486291110515594
Validation loss = 0.029224948957562447
Validation loss = 0.029535576701164246
Validation loss = 0.03183962404727936
Validation loss = 0.030655138194561005
Validation loss = 0.029851825907826424
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.033586494624614716
Validation loss = 0.03048817627131939
Validation loss = 0.031458985060453415
Validation loss = 0.02977612242102623
Validation loss = 0.0327916294336319
Validation loss = 0.032456837594509125
Validation loss = 0.03054952621459961
Validation loss = 0.030210502445697784
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.032200753688812256
Validation loss = 0.03110746294260025
Validation loss = 0.032580602914094925
Validation loss = 0.029861897230148315
Validation loss = 0.03156545013189316
Validation loss = 0.028274547308683395
Validation loss = 0.030123066157102585
Validation loss = 0.028618399053812027
Validation loss = 0.028772979974746704
Validation loss = 0.03132684528827667
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03391096740961075
Validation loss = 0.03372994810342789
Validation loss = 0.03029436618089676
Validation loss = 0.030274732038378716
Validation loss = 0.030923059210181236
Validation loss = 0.03056607022881508
Validation loss = 0.029433008283376694
Validation loss = 0.028132885694503784
Validation loss = 0.03008418157696724
Validation loss = 0.02934335730969906
Validation loss = 0.029124872758984566
Validation loss = 0.030231613665819168
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 58
average number of affinization = 16.32
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 72
average number of affinization = 18.46153846153846
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 18
average number of affinization = 18.444444444444443
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 23
average number of affinization = 18.607142857142858
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 77
average number of affinization = 20.620689655172413
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 40
average number of affinization = 21.266666666666666
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 328      |
| Iteration     | 3        |
| MaximumReturn | 330      |
| MinimumReturn | 325      |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026306990534067154
Validation loss = 0.026030773296952248
Validation loss = 0.025372039526700974
Validation loss = 0.02515912987291813
Validation loss = 0.02463521808385849
Validation loss = 0.02506371960043907
Validation loss = 0.02592385746538639
Validation loss = 0.02748432755470276
Validation loss = 0.026142120361328125
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02754095196723938
Validation loss = 0.026300828903913498
Validation loss = 0.026889145374298096
Validation loss = 0.02779814973473549
Validation loss = 0.02568935789167881
Validation loss = 0.026097949594259262
Validation loss = 0.025775332003831863
Validation loss = 0.026950621977448463
Validation loss = 0.024543901905417442
Validation loss = 0.02572879195213318
Validation loss = 0.023744115605950356
Validation loss = 0.02601209282875061
Validation loss = 0.02635364606976509
Validation loss = 0.02523403987288475
Validation loss = 0.026563504710793495
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.028747886419296265
Validation loss = 0.027924785390496254
Validation loss = 0.028553694486618042
Validation loss = 0.02622002735733986
Validation loss = 0.02637861669063568
Validation loss = 0.028913933783769608
Validation loss = 0.026279890909790993
Validation loss = 0.029393205419182777
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.027438974007964134
Validation loss = 0.027227263897657394
Validation loss = 0.027609074488282204
Validation loss = 0.02606355771422386
Validation loss = 0.027181562036275864
Validation loss = 0.025778118520975113
Validation loss = 0.028330814093351364
Validation loss = 0.02505769208073616
Validation loss = 0.026628155261278152
Validation loss = 0.024600770324468613
Validation loss = 0.025731787085533142
Validation loss = 0.02626929245889187
Validation loss = 0.024845097213983536
Validation loss = 0.026576200500130653
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.029974419623613358
Validation loss = 0.027118414640426636
Validation loss = 0.029971670359373093
Validation loss = 0.025695687159895897
Validation loss = 0.024796564131975174
Validation loss = 0.024935489520430565
Validation loss = 0.025564957410097122
Validation loss = 0.02471425198018551
Validation loss = 0.02819789946079254
Validation loss = 0.02449944242835045
Validation loss = 0.024847371503710747
Validation loss = 0.024627726525068283
Validation loss = 0.02601037360727787
Validation loss = 0.025574633851647377
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 3
average number of affinization = 20.677419354838708
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 33
average number of affinization = 21.0625
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 45
average number of affinization = 21.78787878787879
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 10
average number of affinization = 21.441176470588236
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 39
average number of affinization = 21.942857142857143
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 66
average number of affinization = 23.166666666666668
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 317      |
| Iteration     | 4        |
| MaximumReturn | 323      |
| MinimumReturn | 308      |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02587500773370266
Validation loss = 0.023836977779865265
Validation loss = 0.022948535159230232
Validation loss = 0.022640584036707878
Validation loss = 0.025109022855758667
Validation loss = 0.025246554985642433
Validation loss = 0.021980660036206245
Validation loss = 0.024635130539536476
Validation loss = 0.022621149197220802
Validation loss = 0.023756427690386772
Validation loss = 0.02244812250137329
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02346235327422619
Validation loss = 0.022964609786868095
Validation loss = 0.023009980097413063
Validation loss = 0.022238418459892273
Validation loss = 0.023132985457777977
Validation loss = 0.023123590275645256
Validation loss = 0.022793864831328392
Validation loss = 0.02749335765838623
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.026710638776421547
Validation loss = 0.023469820618629456
Validation loss = 0.0255503598600626
Validation loss = 0.023739712312817574
Validation loss = 0.023193875327706337
Validation loss = 0.02244788594543934
Validation loss = 0.02278747223317623
Validation loss = 0.02339421957731247
Validation loss = 0.02370484173297882
Validation loss = 0.024338075891137123
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024868765845894814
Validation loss = 0.02255450189113617
Validation loss = 0.02394559048116207
Validation loss = 0.023506494238972664
Validation loss = 0.023550115525722504
Validation loss = 0.022809922695159912
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02456720359623432
Validation loss = 0.025420134887099266
Validation loss = 0.022013938054442406
Validation loss = 0.023018719628453255
Validation loss = 0.022302547469735146
Validation loss = 0.024134216830134392
Validation loss = 0.02294495515525341
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 52
average number of affinization = 23.945945945945947
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 4
average number of affinization = 23.42105263157895
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 8
average number of affinization = 23.025641025641026
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 71
average number of affinization = 24.225
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 25
average number of affinization = 24.24390243902439
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 57
average number of affinization = 25.023809523809526
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 327      |
| Iteration     | 5        |
| MaximumReturn | 329      |
| MinimumReturn | 325      |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022821182385087013
Validation loss = 0.021036727353930473
Validation loss = 0.02080996334552765
Validation loss = 0.021948812529444695
Validation loss = 0.02040349505841732
Validation loss = 0.021086977794766426
Validation loss = 0.020093489438295364
Validation loss = 0.02043769136071205
Validation loss = 0.02048448659479618
Validation loss = 0.021480074152350426
Validation loss = 0.02032412216067314
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021324070170521736
Validation loss = 0.020398830994963646
Validation loss = 0.020322443917393684
Validation loss = 0.020440053194761276
Validation loss = 0.020744768902659416
Validation loss = 0.020853674039244652
Validation loss = 0.021329164505004883
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022261198610067368
Validation loss = 0.02145835943520069
Validation loss = 0.022079521790146828
Validation loss = 0.021613216027617455
Validation loss = 0.022495869547128677
Validation loss = 0.021559925749897957
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02225874736905098
Validation loss = 0.020457565784454346
Validation loss = 0.020858049392700195
Validation loss = 0.021937182173132896
Validation loss = 0.021463504061102867
Validation loss = 0.02028081938624382
Validation loss = 0.021540243178606033
Validation loss = 0.020840849727392197
Validation loss = 0.02002221718430519
Validation loss = 0.020649472251534462
Validation loss = 0.02009359933435917
Validation loss = 0.02072477526962757
Validation loss = 0.022066760808229446
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021795213222503662
Validation loss = 0.020710764452815056
Validation loss = 0.021859971806406975
Validation loss = 0.021395964547991753
Validation loss = 0.020598918199539185
Validation loss = 0.020699402317404747
Validation loss = 0.021495414897799492
Validation loss = 0.02192910574376583
Validation loss = 0.021129250526428223
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 17
average number of affinization = 24.837209302325583
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 32
average number of affinization = 25.0
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 80
average number of affinization = 26.22222222222222
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 54
average number of affinization = 26.82608695652174
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 88
average number of affinization = 28.127659574468087
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 23
average number of affinization = 28.020833333333332
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 324      |
| Iteration     | 6        |
| MaximumReturn | 328      |
| MinimumReturn | 321      |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020279420539736748
Validation loss = 0.019189193844795227
Validation loss = 0.020848505198955536
Validation loss = 0.020219596102833748
Validation loss = 0.019870242103934288
Validation loss = 0.019590530544519424
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022055933251976967
Validation loss = 0.019994376227259636
Validation loss = 0.02000880241394043
Validation loss = 0.019748691469430923
Validation loss = 0.019135117530822754
Validation loss = 0.02105526253581047
Validation loss = 0.020304666832089424
Validation loss = 0.0192576814442873
Validation loss = 0.019957631826400757
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020204564556479454
Validation loss = 0.019810019060969353
Validation loss = 0.020650194957852364
Validation loss = 0.019708897918462753
Validation loss = 0.019855370745062828
Validation loss = 0.02052861638367176
Validation loss = 0.020652536302804947
Validation loss = 0.020537447184324265
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01973894238471985
Validation loss = 0.019482746720314026
Validation loss = 0.021206438541412354
Validation loss = 0.01931028440594673
Validation loss = 0.020879916846752167
Validation loss = 0.021130088716745377
Validation loss = 0.01873059757053852
Validation loss = 0.01998790353536606
Validation loss = 0.019602078944444656
Validation loss = 0.01905769482254982
Validation loss = 0.0195399709045887
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019776012748479843
Validation loss = 0.020099561661481857
Validation loss = 0.019641177728772163
Validation loss = 0.019314005970954895
Validation loss = 0.01933661475777626
Validation loss = 0.019423581659793854
Validation loss = 0.02013188973069191
Validation loss = 0.019872352480888367
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 34
average number of affinization = 28.142857142857142
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 24
average number of affinization = 28.06
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 15
average number of affinization = 27.80392156862745
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 19
average number of affinization = 27.634615384615383
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 32
average number of affinization = 27.71698113207547
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 28
average number of affinization = 27.72222222222222
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 321      |
| Iteration     | 7        |
| MaximumReturn | 326      |
| MinimumReturn | 316      |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0196798425167799
Validation loss = 0.018659185618162155
Validation loss = 0.018424782902002335
Validation loss = 0.018815096467733383
Validation loss = 0.01949990540742874
Validation loss = 0.019734276458621025
Validation loss = 0.018602939322590828
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01890644244849682
Validation loss = 0.020100746303796768
Validation loss = 0.019371001049876213
Validation loss = 0.020852137356996536
Validation loss = 0.01817573420703411
Validation loss = 0.019588574767112732
Validation loss = 0.01913364976644516
Validation loss = 0.019172420725226402
Validation loss = 0.019056348130106926
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01990458555519581
Validation loss = 0.019969776272773743
Validation loss = 0.018546191975474358
Validation loss = 0.019302764907479286
Validation loss = 0.019448963925242424
Validation loss = 0.0192226842045784
Validation loss = 0.019490044564008713
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01840774342417717
Validation loss = 0.01887718215584755
Validation loss = 0.018592875450849533
Validation loss = 0.019210495054721832
Validation loss = 0.01810844987630844
Validation loss = 0.018709635362029076
Validation loss = 0.019893977791070938
Validation loss = 0.019066577777266502
Validation loss = 0.018498079851269722
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01866614632308483
Validation loss = 0.019785411655902863
Validation loss = 0.019802678376436234
Validation loss = 0.019498499110341072
Validation loss = 0.01829187199473381
Validation loss = 0.01819315366446972
Validation loss = 0.02190518006682396
Validation loss = 0.01791214756667614
Validation loss = 0.01860806718468666
Validation loss = 0.018930591642856598
Validation loss = 0.019765937700867653
Validation loss = 0.01812220923602581
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 45
average number of affinization = 28.036363636363635
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 94
average number of affinization = 29.214285714285715
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 23
average number of affinization = 29.105263157894736
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 71
average number of affinization = 29.82758620689655
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 53
average number of affinization = 30.220338983050848
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 98
average number of affinization = 31.35
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 310      |
| Iteration     | 8        |
| MaximumReturn | 314      |
| MinimumReturn | 306      |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01854667253792286
Validation loss = 0.017662666738033295
Validation loss = 0.017621394246816635
Validation loss = 0.018204588443040848
Validation loss = 0.018423160538077354
Validation loss = 0.02095230296254158
Validation loss = 0.018832743167877197
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017627185210585594
Validation loss = 0.018511105328798294
Validation loss = 0.017734702676534653
Validation loss = 0.018188271671533585
Validation loss = 0.017971130087971687
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018507976084947586
Validation loss = 0.019646290689706802
Validation loss = 0.018906496465206146
Validation loss = 0.01878443732857704
Validation loss = 0.018542740494012833
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01768435910344124
Validation loss = 0.017819149419665337
Validation loss = 0.01790315844118595
Validation loss = 0.018709272146224976
Validation loss = 0.02015291526913643
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017558680847287178
Validation loss = 0.01897801272571087
Validation loss = 0.01807982847094536
Validation loss = 0.01810486428439617
Validation loss = 0.018305381760001183
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 48
average number of affinization = 31.62295081967213
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 51
average number of affinization = 31.93548387096774
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 20
average number of affinization = 31.746031746031747
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 47
average number of affinization = 31.984375
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 29
average number of affinization = 31.93846153846154
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 32
average number of affinization = 31.939393939393938
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 306      |
| Iteration     | 9        |
| MaximumReturn | 310      |
| MinimumReturn | 302      |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01883871667087078
Validation loss = 0.01775219477713108
Validation loss = 0.017492808401584625
Validation loss = 0.017741991207003593
Validation loss = 0.016831740736961365
Validation loss = 0.017941245809197426
Validation loss = 0.017629997804760933
Validation loss = 0.01757199876010418
Validation loss = 0.016965867951512337
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018406735733151436
Validation loss = 0.019092299044132233
Validation loss = 0.018108107149600983
Validation loss = 0.018353091552853584
Validation loss = 0.017522208392620087
Validation loss = 0.017701387405395508
Validation loss = 0.017249219119548798
Validation loss = 0.019634462893009186
Validation loss = 0.016969239339232445
Validation loss = 0.017524711787700653
Validation loss = 0.016999224200844765
Validation loss = 0.016929348930716515
Validation loss = 0.01748577132821083
Validation loss = 0.018031595274806023
Validation loss = 0.017529265955090523
Validation loss = 0.016933469101786613
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01754187047481537
Validation loss = 0.017358360812067986
Validation loss = 0.017553124576807022
Validation loss = 0.018666839227080345
Validation loss = 0.01806306652724743
Validation loss = 0.01769126206636429
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017703937366604805
Validation loss = 0.017231248319149017
Validation loss = 0.018170073628425598
Validation loss = 0.01807325892150402
Validation loss = 0.016556551679968834
Validation loss = 0.017194345593452454
Validation loss = 0.017495302483439445
Validation loss = 0.01862270198762417
Validation loss = 0.0174899660050869
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017126046121120453
Validation loss = 0.018415002152323723
Validation loss = 0.017908236011862755
Validation loss = 0.01972386986017227
Validation loss = 0.01741577498614788
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 31
average number of affinization = 31.925373134328357
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 2
average number of affinization = 31.485294117647058
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 48
average number of affinization = 31.72463768115942
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 1
average number of affinization = 31.285714285714285
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 32
average number of affinization = 31.295774647887324
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 25
average number of affinization = 31.208333333333332
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 330      |
| Iteration     | 10       |
| MaximumReturn | 335      |
| MinimumReturn | 325      |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018287746235728264
Validation loss = 0.017105408012866974
Validation loss = 0.01728779450058937
Validation loss = 0.016756968572735786
Validation loss = 0.01802455075085163
Validation loss = 0.01810326613485813
Validation loss = 0.01785765029489994
Validation loss = 0.017009379342198372
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017392432317137718
Validation loss = 0.01655840128660202
Validation loss = 0.017395855858922005
Validation loss = 0.01662571355700493
Validation loss = 0.01773192174732685
Validation loss = 0.01616423763334751
Validation loss = 0.016805525869131088
Validation loss = 0.017027055844664574
Validation loss = 0.016929451376199722
Validation loss = 0.01729814149439335
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017923595383763313
Validation loss = 0.01962900348007679
Validation loss = 0.018171032890677452
Validation loss = 0.016917813569307327
Validation loss = 0.01738680712878704
Validation loss = 0.016684098169207573
Validation loss = 0.018007345497608185
Validation loss = 0.017424825578927994
Validation loss = 0.018270550295710564
Validation loss = 0.017046354711055756
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017509376630187035
Validation loss = 0.017024941742420197
Validation loss = 0.01740618608891964
Validation loss = 0.01685340143740177
Validation loss = 0.017307674512267113
Validation loss = 0.016507379710674286
Validation loss = 0.017007313668727875
Validation loss = 0.017064662650227547
Validation loss = 0.01732037216424942
Validation loss = 0.01690857857465744
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01697825826704502
Validation loss = 0.01847832277417183
Validation loss = 0.01803733967244625
Validation loss = 0.017683887854218483
Validation loss = 0.016742901876568794
Validation loss = 0.01734169013798237
Validation loss = 0.016849299892783165
Validation loss = 0.018072014674544334
Validation loss = 0.018125519156455994
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 64
average number of affinization = 31.65753424657534
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 85
average number of affinization = 32.37837837837838
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 70
average number of affinization = 32.88
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 48
average number of affinization = 33.078947368421055
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 80
average number of affinization = 33.688311688311686
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 49
average number of affinization = 33.88461538461539
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 323      |
| Iteration     | 11       |
| MaximumReturn | 326      |
| MinimumReturn | 319      |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016504855826497078
Validation loss = 0.017667334526777267
Validation loss = 0.016769634559750557
Validation loss = 0.017111627385020256
Validation loss = 0.017322059720754623
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016815854236483574
Validation loss = 0.017166851088404655
Validation loss = 0.01651860401034355
Validation loss = 0.017043333500623703
Validation loss = 0.01642727479338646
Validation loss = 0.017164966091513634
Validation loss = 0.016869647428393364
Validation loss = 0.016589248552918434
Validation loss = 0.016513612121343613
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017120327800512314
Validation loss = 0.016932008787989616
Validation loss = 0.017124857753515244
Validation loss = 0.01725732907652855
Validation loss = 0.016450129449367523
Validation loss = 0.017440030351281166
Validation loss = 0.01705854944884777
Validation loss = 0.016645262017846107
Validation loss = 0.017221879214048386
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01777597889304161
Validation loss = 0.017013655975461006
Validation loss = 0.01672815904021263
Validation loss = 0.017108498141169548
Validation loss = 0.01602049730718136
Validation loss = 0.016816027462482452
Validation loss = 0.017102375626564026
Validation loss = 0.016528980806469917
Validation loss = 0.017255982384085655
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016850383952260017
Validation loss = 0.016816483810544014
Validation loss = 0.0169121902436018
Validation loss = 0.01856115832924843
Validation loss = 0.017012955620884895
Validation loss = 0.017481371760368347
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 91
average number of affinization = 34.607594936708864
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 83
average number of affinization = 35.2125
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 71
average number of affinization = 35.65432098765432
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 90
average number of affinization = 36.31707317073171
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 92
average number of affinization = 36.98795180722892
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 90
average number of affinization = 37.61904761904762
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 322      |
| Iteration     | 12       |
| MaximumReturn | 325      |
| MinimumReturn | 318      |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017053674906492233
Validation loss = 0.01666221395134926
Validation loss = 0.01665215566754341
Validation loss = 0.01669401116669178
Validation loss = 0.0166626013815403
Validation loss = 0.016793306916952133
Validation loss = 0.016860663890838623
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01787162758409977
Validation loss = 0.0169729832559824
Validation loss = 0.0161945391446352
Validation loss = 0.016709374263882637
Validation loss = 0.016290079802274704
Validation loss = 0.01632157526910305
Validation loss = 0.01636006124317646
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0170066487044096
Validation loss = 0.016482118517160416
Validation loss = 0.018060799688100815
Validation loss = 0.01646614447236061
Validation loss = 0.017147457227110863
Validation loss = 0.0158526711165905
Validation loss = 0.0160639900714159
Validation loss = 0.01609288714826107
Validation loss = 0.016954820603132248
Validation loss = 0.01643877848982811
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016342494636774063
Validation loss = 0.017011838033795357
Validation loss = 0.016261253505945206
Validation loss = 0.016469398513436317
Validation loss = 0.018412932753562927
Validation loss = 0.016420451924204826
Validation loss = 0.016625670716166496
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016282809898257256
Validation loss = 0.016891535371541977
Validation loss = 0.01661515422165394
Validation loss = 0.01668933965265751
Validation loss = 0.01785580813884735
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 105
average number of affinization = 38.411764705882355
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 89
average number of affinization = 39.0
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 97
average number of affinization = 39.666666666666664
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 74
average number of affinization = 40.05681818181818
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 59
average number of affinization = 40.26966292134831
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 100
average number of affinization = 40.93333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 324      |
| Iteration     | 13       |
| MaximumReturn | 326      |
| MinimumReturn | 320      |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01612081006169319
Validation loss = 0.01622157357633114
Validation loss = 0.016260799020528793
Validation loss = 0.018894342705607414
Validation loss = 0.016205066815018654
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01603321172297001
Validation loss = 0.016076939180493355
Validation loss = 0.016623137518763542
Validation loss = 0.01727375015616417
Validation loss = 0.016519881784915924
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015834413468837738
Validation loss = 0.0170101597905159
Validation loss = 0.018453428521752357
Validation loss = 0.016639254987239838
Validation loss = 0.0165383443236351
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016507763415575027
Validation loss = 0.016650600358843803
Validation loss = 0.017172126099467278
Validation loss = 0.017634831368923187
Validation loss = 0.017347795888781548
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016436973586678505
Validation loss = 0.016772670671343803
Validation loss = 0.016517825424671173
Validation loss = 0.016292409971356392
Validation loss = 0.01650524139404297
Validation loss = 0.016247263178229332
Validation loss = 0.01584942452609539
Validation loss = 0.016778824850916862
Validation loss = 0.016222510486841202
Validation loss = 0.016227127984166145
Validation loss = 0.016924472525715828
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 102
average number of affinization = 41.604395604395606
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 79
average number of affinization = 42.01086956521739
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 87
average number of affinization = 42.494623655913976
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 105
average number of affinization = 43.159574468085104
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 91
average number of affinization = 43.66315789473684
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 37
average number of affinization = 43.59375
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 319      |
| Iteration     | 14       |
| MaximumReturn | 322      |
| MinimumReturn | 312      |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01653420552611351
Validation loss = 0.01617034338414669
Validation loss = 0.017227118834853172
Validation loss = 0.015825696289539337
Validation loss = 0.016046274453401566
Validation loss = 0.01662534847855568
Validation loss = 0.01604687236249447
Validation loss = 0.01617537811398506
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01628910005092621
Validation loss = 0.016131684184074402
Validation loss = 0.01589970663189888
Validation loss = 0.016067704185843468
Validation loss = 0.015594477765262127
Validation loss = 0.016026634722948074
Validation loss = 0.015890222042798996
Validation loss = 0.015880320221185684
Validation loss = 0.016565676778554916
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016418088227510452
Validation loss = 0.017291009426116943
Validation loss = 0.016556426882743835
Validation loss = 0.01577594503760338
Validation loss = 0.016787072643637657
Validation loss = 0.016149498522281647
Validation loss = 0.015981525182724
Validation loss = 0.016508398577570915
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0168018639087677
Validation loss = 0.016439542174339294
Validation loss = 0.018060825765132904
Validation loss = 0.016141168773174286
Validation loss = 0.016779853031039238
Validation loss = 0.016836058348417282
Validation loss = 0.016087155789136887
Validation loss = 0.01610282063484192
Validation loss = 0.016789138317108154
Validation loss = 0.0167998019605875
Validation loss = 0.016063539311289787
Validation loss = 0.016507763415575027
Validation loss = 0.01610986515879631
Validation loss = 0.016141781583428383
Validation loss = 0.016316745430231094
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016738079488277435
Validation loss = 0.016610343009233475
Validation loss = 0.015877477824687958
Validation loss = 0.01626496948301792
Validation loss = 0.01601419411599636
Validation loss = 0.01631835848093033
Validation loss = 0.016225483268499374
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 34
average number of affinization = 43.49484536082474
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 80
average number of affinization = 43.86734693877551
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 28
average number of affinization = 43.707070707070706
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 116
average number of affinization = 44.43
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 91
average number of affinization = 44.89108910891089
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 104
average number of affinization = 45.470588235294116
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 321      |
| Iteration     | 15       |
| MaximumReturn | 327      |
| MinimumReturn | 315      |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017571842297911644
Validation loss = 0.015644080936908722
Validation loss = 0.01596481166779995
Validation loss = 0.01575709879398346
Validation loss = 0.015424873679876328
Validation loss = 0.01590447872877121
Validation loss = 0.016169795766472816
Validation loss = 0.01575626991689205
Validation loss = 0.016113746911287308
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015403090044856071
Validation loss = 0.0159154050052166
Validation loss = 0.015827519819140434
Validation loss = 0.01573360525071621
Validation loss = 0.016308102756738663
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016361771151423454
Validation loss = 0.01686730422079563
Validation loss = 0.01605716161429882
Validation loss = 0.015578261576592922
Validation loss = 0.015680458396673203
Validation loss = 0.015985002741217613
Validation loss = 0.01542521920055151
Validation loss = 0.016901938244700432
Validation loss = 0.015882035717368126
Validation loss = 0.015672698616981506
Validation loss = 0.016486505046486855
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015745708718895912
Validation loss = 0.016050565987825394
Validation loss = 0.015930313616991043
Validation loss = 0.01562422327697277
Validation loss = 0.01613731123507023
Validation loss = 0.015911336988210678
Validation loss = 0.01642420142889023
Validation loss = 0.016018498688936234
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015494068153202534
Validation loss = 0.015881584957242012
Validation loss = 0.016120854765176773
Validation loss = 0.015385904349386692
Validation loss = 0.016429349780082703
Validation loss = 0.015992099419236183
Validation loss = 0.016484586521983147
Validation loss = 0.015819702297449112
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 123
average number of affinization = 46.22330097087379
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 116
average number of affinization = 46.89423076923077
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 125
average number of affinization = 47.63809523809524
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 114
average number of affinization = 48.264150943396224
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 116
average number of affinization = 48.89719626168224
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 112
average number of affinization = 49.48148148148148
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 317      |
| Iteration     | 16       |
| MaximumReturn | 320      |
| MinimumReturn | 312      |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016016371548175812
Validation loss = 0.015596438199281693
Validation loss = 0.01562543399631977
Validation loss = 0.015249120071530342
Validation loss = 0.01526997797191143
Validation loss = 0.015353888273239136
Validation loss = 0.015828721225261688
Validation loss = 0.016036242246627808
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017884090542793274
Validation loss = 0.015324375592172146
Validation loss = 0.015557666309177876
Validation loss = 0.015599105507135391
Validation loss = 0.015410282649099827
Validation loss = 0.016502631828188896
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015510659664869308
Validation loss = 0.015909506008028984
Validation loss = 0.015955792739987373
Validation loss = 0.01601390726864338
Validation loss = 0.01571412943303585
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015463504940271378
Validation loss = 0.016214724630117416
Validation loss = 0.016181958839297295
Validation loss = 0.01574794575572014
Validation loss = 0.015801535919308662
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015216552652418613
Validation loss = 0.015557133592665195
Validation loss = 0.015449371188879013
Validation loss = 0.016024969518184662
Validation loss = 0.01554421242326498
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 88
average number of affinization = 49.8348623853211
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 110
average number of affinization = 50.38181818181818
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 111
average number of affinization = 50.927927927927925
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 118
average number of affinization = 51.526785714285715
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 110
average number of affinization = 52.04424778761062
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 82
average number of affinization = 52.30701754385965
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 324      |
| Iteration     | 17       |
| MaximumReturn | 329      |
| MinimumReturn | 320      |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015460249967873096
Validation loss = 0.01585065945982933
Validation loss = 0.015390916727483273
Validation loss = 0.015389804728329182
Validation loss = 0.015115791000425816
Validation loss = 0.015876632183790207
Validation loss = 0.015395421534776688
Validation loss = 0.01544293574988842
Validation loss = 0.015969090163707733
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015369790606200695
Validation loss = 0.015229570679366589
Validation loss = 0.015557275153696537
Validation loss = 0.015086169354617596
Validation loss = 0.01551232673227787
Validation loss = 0.015650393441319466
Validation loss = 0.015178026631474495
Validation loss = 0.0155943613499403
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01518948469310999
Validation loss = 0.015648873522877693
Validation loss = 0.015095330774784088
Validation loss = 0.0152354771271348
Validation loss = 0.01534764189273119
Validation loss = 0.015150019899010658
Validation loss = 0.01517504919320345
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015561649575829506
Validation loss = 0.015513119287788868
Validation loss = 0.015335766598582268
Validation loss = 0.016266662627458572
Validation loss = 0.015542559325695038
Validation loss = 0.01598268747329712
Validation loss = 0.015504213981330395
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015264462679624557
Validation loss = 0.015118691138923168
Validation loss = 0.0154061084613204
Validation loss = 0.015175379812717438
Validation loss = 0.016017543151974678
Validation loss = 0.015283181332051754
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 109
average number of affinization = 52.8
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 118
average number of affinization = 53.36206896551724
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 119
average number of affinization = 53.92307692307692
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 117
average number of affinization = 54.45762711864407
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 118
average number of affinization = 54.99159663865546
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 116
average number of affinization = 55.5
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 322      |
| Iteration     | 18       |
| MaximumReturn | 325      |
| MinimumReturn | 318      |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0152435302734375
Validation loss = 0.01576656475663185
Validation loss = 0.015216732397675514
Validation loss = 0.015136932022869587
Validation loss = 0.015237553045153618
Validation loss = 0.015284773893654346
Validation loss = 0.015595436096191406
Validation loss = 0.015509137883782387
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014846387319266796
Validation loss = 0.015663430094718933
Validation loss = 0.015058064833283424
Validation loss = 0.015205666422843933
Validation loss = 0.015092888846993446
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015061622485518456
Validation loss = 0.015041249804198742
Validation loss = 0.015445766970515251
Validation loss = 0.015682781115174294
Validation loss = 0.01472512912005186
Validation loss = 0.015031722374260426
Validation loss = 0.014993119053542614
Validation loss = 0.014831522479653358
Validation loss = 0.015403774566948414
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015357295982539654
Validation loss = 0.015248964540660381
Validation loss = 0.01559423841536045
Validation loss = 0.015392470173537731
Validation loss = 0.015049070119857788
Validation loss = 0.01483821589499712
Validation loss = 0.0156550370156765
Validation loss = 0.015733463689684868
Validation loss = 0.015769118443131447
Validation loss = 0.015252913348376751
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015432646498084068
Validation loss = 0.015221893787384033
Validation loss = 0.015378907322883606
Validation loss = 0.015020444989204407
Validation loss = 0.015352390706539154
Validation loss = 0.015203332528471947
Validation loss = 0.015079684555530548
Validation loss = 0.0151511924341321
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 101
average number of affinization = 55.87603305785124
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 129
average number of affinization = 56.47540983606557
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 114
average number of affinization = 56.94308943089431
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 120
average number of affinization = 57.45161290322581
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 101
average number of affinization = 57.8
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 101
average number of affinization = 58.142857142857146
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 323      |
| Iteration     | 19       |
| MaximumReturn | 326      |
| MinimumReturn | 321      |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014718754217028618
Validation loss = 0.015060446225106716
Validation loss = 0.015566695481538773
Validation loss = 0.014976940117776394
Validation loss = 0.015352530404925346
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015154421329498291
Validation loss = 0.015222536399960518
Validation loss = 0.015092762187123299
Validation loss = 0.015302948653697968
Validation loss = 0.014745118096470833
Validation loss = 0.01477424893528223
Validation loss = 0.015173099003732204
Validation loss = 0.014805465936660767
Validation loss = 0.014938988722860813
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014877979643642902
Validation loss = 0.01493254117667675
Validation loss = 0.015371304005384445
Validation loss = 0.015160959213972092
Validation loss = 0.015357638709247112
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014691648073494434
Validation loss = 0.015301038511097431
Validation loss = 0.015642333775758743
Validation loss = 0.0150811318308115
Validation loss = 0.01511920802295208
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014862379990518093
Validation loss = 0.014667259529232979
Validation loss = 0.014672691933810711
Validation loss = 0.014649001881480217
Validation loss = 0.014647047966718674
Validation loss = 0.015283023938536644
Validation loss = 0.015058539807796478
Validation loss = 0.01461037714034319
Validation loss = 0.014986924827098846
Validation loss = 0.01470071543008089
Validation loss = 0.015006612055003643
Validation loss = 0.014823785983026028
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 139
average number of affinization = 58.77952755905512
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 125
average number of affinization = 59.296875
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 128
average number of affinization = 59.82945736434109
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 107
average number of affinization = 60.19230769230769
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 111
average number of affinization = 60.58015267175573
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 131
average number of affinization = 61.11363636363637
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 323      |
| Iteration     | 20       |
| MaximumReturn | 326      |
| MinimumReturn | 320      |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015299470163881779
Validation loss = 0.014808934181928635
Validation loss = 0.015432959422469139
Validation loss = 0.015569799579679966
Validation loss = 0.01484120823442936
Validation loss = 0.015180033631622791
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015390977263450623
Validation loss = 0.01525570172816515
Validation loss = 0.015743602067232132
Validation loss = 0.015207970514893532
Validation loss = 0.015359262004494667
Validation loss = 0.016394315287470818
Validation loss = 0.01550519373267889
Validation loss = 0.01521589420735836
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014969319105148315
Validation loss = 0.015556910075247288
Validation loss = 0.014825146645307541
Validation loss = 0.015010997653007507
Validation loss = 0.014877588488161564
Validation loss = 0.015055364929139614
Validation loss = 0.014900273643434048
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014945817179977894
Validation loss = 0.015932153910398483
Validation loss = 0.015197069384157658
Validation loss = 0.015241739340126514
Validation loss = 0.01533367857336998
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014912620186805725
Validation loss = 0.015195640735328197
Validation loss = 0.015339359641075134
Validation loss = 0.014691823162138462
Validation loss = 0.015256150625646114
Validation loss = 0.01481314841657877
Validation loss = 0.014823704957962036
Validation loss = 0.01499538216739893
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 119
average number of affinization = 61.54887218045113
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 107
average number of affinization = 61.88805970149254
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 129
average number of affinization = 62.385185185185186
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 134
average number of affinization = 62.911764705882355
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 132
average number of affinization = 63.416058394160586
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 123
average number of affinization = 63.84782608695652
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 321      |
| Iteration     | 21       |
| MaximumReturn | 324      |
| MinimumReturn | 318      |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015210532583296299
Validation loss = 0.015332398004829884
Validation loss = 0.014690260402858257
Validation loss = 0.015151274390518665
Validation loss = 0.015522061847150326
Validation loss = 0.015342986211180687
Validation loss = 0.014736464247107506
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015035059303045273
Validation loss = 0.0148087739944458
Validation loss = 0.014855192974209785
Validation loss = 0.014653175137937069
Validation loss = 0.014854232780635357
Validation loss = 0.01561343390494585
Validation loss = 0.015394716523587704
Validation loss = 0.014667057432234287
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0146894920617342
Validation loss = 0.014630098827183247
Validation loss = 0.014946597628295422
Validation loss = 0.01524568535387516
Validation loss = 0.01511188130825758
Validation loss = 0.014819290488958359
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014844906516373158
Validation loss = 0.014981700107455254
Validation loss = 0.015299240127205849
Validation loss = 0.01640387810766697
Validation loss = 0.015326147899031639
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01581754721701145
Validation loss = 0.014772675931453705
Validation loss = 0.015101714059710503
Validation loss = 0.01530364714562893
Validation loss = 0.015005427412688732
Validation loss = 0.01504665520042181
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 132
average number of affinization = 64.33812949640287
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 138
average number of affinization = 64.86428571428571
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 127
average number of affinization = 65.30496453900709
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 114
average number of affinization = 65.64788732394366
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 126
average number of affinization = 66.06993006993007
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 105
average number of affinization = 66.34027777777777
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 319      |
| Iteration     | 22       |
| MaximumReturn | 322      |
| MinimumReturn | 316      |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015455380082130432
Validation loss = 0.015156351029872894
Validation loss = 0.015320919454097748
Validation loss = 0.014928636141121387
Validation loss = 0.015442621894180775
Validation loss = 0.015056408010423183
Validation loss = 0.014822184108197689
Validation loss = 0.015388893894851208
Validation loss = 0.0149276377633214
Validation loss = 0.014996479265391827
Validation loss = 0.014843195676803589
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014953953213989735
Validation loss = 0.015114597976207733
Validation loss = 0.015069037675857544
Validation loss = 0.015536333434283733
Validation loss = 0.014748086221516132
Validation loss = 0.016091689467430115
Validation loss = 0.015053329057991505
Validation loss = 0.016032274812459946
Validation loss = 0.015725595876574516
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014735140837728977
Validation loss = 0.014838491566479206
Validation loss = 0.014810900203883648
Validation loss = 0.015163893811404705
Validation loss = 0.015233110636472702
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014969240874052048
Validation loss = 0.015462356619536877
Validation loss = 0.015661688521504402
Validation loss = 0.015285607427358627
Validation loss = 0.015183933079242706
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014946858398616314
Validation loss = 0.015061439014971256
Validation loss = 0.01482382696121931
Validation loss = 0.015394789166748524
Validation loss = 0.01485367026180029
Validation loss = 0.014857791364192963
Validation loss = 0.014897778630256653
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 120
average number of affinization = 66.71034482758621
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 95
average number of affinization = 66.9041095890411
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 74
average number of affinization = 66.95238095238095
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 135
average number of affinization = 67.41216216216216
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 102
average number of affinization = 67.64429530201342
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 115
average number of affinization = 67.96
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 325      |
| Iteration     | 23       |
| MaximumReturn | 327      |
| MinimumReturn | 322      |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015182767994701862
Validation loss = 0.015392097644507885
Validation loss = 0.015984471887350082
Validation loss = 0.015363066457211971
Validation loss = 0.015164639800786972
Validation loss = 0.015027866698801517
Validation loss = 0.015299344435334206
Validation loss = 0.015574969351291656
Validation loss = 0.015472318977117538
Validation loss = 0.015233687125146389
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015622038394212723
Validation loss = 0.015052314847707748
Validation loss = 0.014980459585785866
Validation loss = 0.01570640504360199
Validation loss = 0.015354232862591743
Validation loss = 0.015528550371527672
Validation loss = 0.01515677198767662
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015498042106628418
Validation loss = 0.015370036475360394
Validation loss = 0.015086335130035877
Validation loss = 0.014910583384335041
Validation loss = 0.015165749937295914
Validation loss = 0.01514098048210144
Validation loss = 0.01526689063757658
Validation loss = 0.014992469921708107
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01606077142059803
Validation loss = 0.01554865576326847
Validation loss = 0.01576818898320198
Validation loss = 0.015150143764913082
Validation loss = 0.01550240907818079
Validation loss = 0.015350205823779106
Validation loss = 0.01551284920424223
Validation loss = 0.015351036563515663
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01565627194941044
Validation loss = 0.015266470611095428
Validation loss = 0.014928456395864487
Validation loss = 0.015432063490152359
Validation loss = 0.015192785300314426
Validation loss = 0.015463164076209068
Validation loss = 0.015320682898163795
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 131
average number of affinization = 68.37748344370861
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 126
average number of affinization = 68.75657894736842
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 130
average number of affinization = 69.15686274509804
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 114
average number of affinization = 69.44805194805195
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 128
average number of affinization = 69.8258064516129
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 113
average number of affinization = 70.1025641025641
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 321      |
| Iteration     | 24       |
| MaximumReturn | 323      |
| MinimumReturn | 318      |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015089267864823341
Validation loss = 0.015789885073900223
Validation loss = 0.015031184069812298
Validation loss = 0.015293214470148087
Validation loss = 0.015101844444870949
Validation loss = 0.015396212227642536
Validation loss = 0.015176048502326012
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015001768246293068
Validation loss = 0.015341665595769882
Validation loss = 0.015321869403123856
Validation loss = 0.015360984951257706
Validation loss = 0.01574729196727276
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015186474658548832
Validation loss = 0.015248476527631283
Validation loss = 0.015407800674438477
Validation loss = 0.015114577487111092
Validation loss = 0.015282160602509975
Validation loss = 0.015042305923998356
Validation loss = 0.01483807060867548
Validation loss = 0.014988048002123833
Validation loss = 0.015027512796223164
Validation loss = 0.015001588501036167
Validation loss = 0.015349329449236393
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01518755592405796
Validation loss = 0.015821639448404312
Validation loss = 0.015302378684282303
Validation loss = 0.015405487269163132
Validation loss = 0.01556417252868414
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015154405497014523
Validation loss = 0.01494080200791359
Validation loss = 0.015270844101905823
Validation loss = 0.014858400449156761
Validation loss = 0.01529599353671074
Validation loss = 0.015070110559463501
Validation loss = 0.015162074938416481
Validation loss = 0.014977483078837395
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 106
average number of affinization = 70.3312101910828
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 137
average number of affinization = 70.75316455696202
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 114
average number of affinization = 71.0251572327044
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 140
average number of affinization = 71.45625
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 112
average number of affinization = 71.7080745341615
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 145
average number of affinization = 72.1604938271605
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 322      |
| Iteration     | 25       |
| MaximumReturn | 325      |
| MinimumReturn | 320      |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01532147079706192
Validation loss = 0.015033520758152008
Validation loss = 0.015218986198306084
Validation loss = 0.015561167150735855
Validation loss = 0.01524320337921381
Validation loss = 0.01525320578366518
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015370100736618042
Validation loss = 0.015043961815536022
Validation loss = 0.014872363768517971
Validation loss = 0.015190627425909042
Validation loss = 0.015089364722371101
Validation loss = 0.015306633897125721
Validation loss = 0.015084413811564445
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015156281180679798
Validation loss = 0.015335138887166977
Validation loss = 0.015383295714855194
Validation loss = 0.015287511982023716
Validation loss = 0.015508457086980343
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0158425010740757
Validation loss = 0.015252377837896347
Validation loss = 0.015171749517321587
Validation loss = 0.015345613472163677
Validation loss = 0.015435487031936646
Validation loss = 0.015675419941544533
Validation loss = 0.015650464221835136
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014910465106368065
Validation loss = 0.015188552439212799
Validation loss = 0.015305428765714169
Validation loss = 0.015098105184733868
Validation loss = 0.015411601401865482
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 110
average number of affinization = 72.39263803680981
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 123
average number of affinization = 72.70121951219512
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 125
average number of affinization = 73.01818181818182
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 137
average number of affinization = 73.40361445783132
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 124
average number of affinization = 73.7065868263473
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 115
average number of affinization = 73.95238095238095
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 324      |
| Iteration     | 26       |
| MaximumReturn | 326      |
| MinimumReturn | 320      |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015168893150985241
Validation loss = 0.015109742991626263
Validation loss = 0.015246836468577385
Validation loss = 0.015528934076428413
Validation loss = 0.01538491528481245
Validation loss = 0.015470460057258606
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015011770650744438
Validation loss = 0.01516584400087595
Validation loss = 0.014955609105527401
Validation loss = 0.015116461552679539
Validation loss = 0.014941907487809658
Validation loss = 0.015211565420031548
Validation loss = 0.015515729784965515
Validation loss = 0.015044807456433773
Validation loss = 0.014910037629306316
Validation loss = 0.01537961233407259
Validation loss = 0.015029468573629856
Validation loss = 0.015089849941432476
Validation loss = 0.015094071626663208
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01579420268535614
Validation loss = 0.015045827254652977
Validation loss = 0.014954441227018833
Validation loss = 0.015229972079396248
Validation loss = 0.015007623471319675
Validation loss = 0.015223701484501362
Validation loss = 0.015162757597863674
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015052116475999355
Validation loss = 0.015154398046433926
Validation loss = 0.016121599823236465
Validation loss = 0.015079803764820099
Validation loss = 0.015624336898326874
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015277551487088203
Validation loss = 0.01484060101211071
Validation loss = 0.015423224307596684
Validation loss = 0.015271917916834354
Validation loss = 0.014842664822936058
Validation loss = 0.015248745679855347
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 126
average number of affinization = 74.2603550295858
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 128
average number of affinization = 74.5764705882353
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 122
average number of affinization = 74.85380116959064
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 132
average number of affinization = 75.18604651162791
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 116
average number of affinization = 75.42196531791907
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 120
average number of affinization = 75.67816091954023
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 321      |
| Iteration     | 27       |
| MaximumReturn | 325      |
| MinimumReturn | 319      |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015121778473258018
Validation loss = 0.015139544382691383
Validation loss = 0.015319556929171085
Validation loss = 0.015009790658950806
Validation loss = 0.014753056690096855
Validation loss = 0.01514488086104393
Validation loss = 0.014886688441038132
Validation loss = 0.015131926164031029
Validation loss = 0.016232363879680634
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01503697782754898
Validation loss = 0.01526740100234747
Validation loss = 0.01524857897311449
Validation loss = 0.015377532690763474
Validation loss = 0.015079505741596222
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014989541843533516
Validation loss = 0.015168789774179459
Validation loss = 0.015265468508005142
Validation loss = 0.015040485188364983
Validation loss = 0.014754693023860455
Validation loss = 0.015095380134880543
Validation loss = 0.014692134223878384
Validation loss = 0.015385741367936134
Validation loss = 0.0150068998336792
Validation loss = 0.015140049159526825
Validation loss = 0.015044746920466423
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015048607252538204
Validation loss = 0.014851400628685951
Validation loss = 0.015155735425651073
Validation loss = 0.01586618274450302
Validation loss = 0.015054513700306416
Validation loss = 0.015132718719542027
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01514885388314724
Validation loss = 0.014912907034158707
Validation loss = 0.015004708431661129
Validation loss = 0.015076338313519955
Validation loss = 0.01489679329097271
Validation loss = 0.015132959932088852
Validation loss = 0.015100426971912384
Validation loss = 0.015113781206309795
Validation loss = 0.015130273066461086
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 93
average number of affinization = 75.77714285714286
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 107
average number of affinization = 75.95454545454545
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 131
average number of affinization = 76.26553672316385
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 107
average number of affinization = 76.43820224719101
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 122
average number of affinization = 76.6927374301676
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 112
average number of affinization = 76.88888888888889
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 319      |
| Iteration     | 28       |
| MaximumReturn | 322      |
| MinimumReturn | 316      |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015137727372348309
Validation loss = 0.015781309455633163
Validation loss = 0.015106197446584702
Validation loss = 0.015271587297320366
Validation loss = 0.01512958575040102
Validation loss = 0.01520131342113018
Validation loss = 0.015429740771651268
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015269114635884762
Validation loss = 0.015412062406539917
Validation loss = 0.014944671653211117
Validation loss = 0.015424486249685287
Validation loss = 0.01506728958338499
Validation loss = 0.015209547244012356
Validation loss = 0.015128190629184246
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015275472775101662
Validation loss = 0.014880459755659103
Validation loss = 0.01518978737294674
Validation loss = 0.015399633906781673
Validation loss = 0.015153750777244568
Validation loss = 0.015123261138796806
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015534946694970131
Validation loss = 0.014836562797427177
Validation loss = 0.015540657564997673
Validation loss = 0.015307113528251648
Validation loss = 0.015075026080012321
Validation loss = 0.014998351223766804
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014721405692398548
Validation loss = 0.01516301091760397
Validation loss = 0.015524910762906075
Validation loss = 0.014931781217455864
Validation loss = 0.015138145536184311
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 105
average number of affinization = 77.04419889502762
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 135
average number of affinization = 77.36263736263736
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 139
average number of affinization = 77.69945355191257
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 93
average number of affinization = 77.78260869565217
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 114
average number of affinization = 77.97837837837838
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 137
average number of affinization = 78.29569892473118
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 316      |
| Iteration     | 29       |
| MaximumReturn | 320      |
| MinimumReturn | 312      |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01504491362720728
Validation loss = 0.014971915632486343
Validation loss = 0.015203858725726604
Validation loss = 0.01520600263029337
Validation loss = 0.015140967443585396
Validation loss = 0.015148628503084183
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015229967422783375
Validation loss = 0.015101506374776363
Validation loss = 0.015192761085927486
Validation loss = 0.015042252838611603
Validation loss = 0.015169208869338036
Validation loss = 0.01583591289818287
Validation loss = 0.01557270810008049
Validation loss = 0.015050863847136497
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015056359581649303
Validation loss = 0.015325665473937988
Validation loss = 0.015227098017930984
Validation loss = 0.0149846775457263
Validation loss = 0.01552235335111618
Validation loss = 0.015097632072865963
Validation loss = 0.01491630356758833
Validation loss = 0.01491954643279314
Validation loss = 0.015001271851360798
Validation loss = 0.015131243504583836
Validation loss = 0.015407213009893894
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015400666743516922
Validation loss = 0.015152026899158955
Validation loss = 0.015205275267362595
Validation loss = 0.015274511650204659
Validation loss = 0.015106662176549435
Validation loss = 0.014975513331592083
Validation loss = 0.015262448228895664
Validation loss = 0.01556794997304678
Validation loss = 0.015408463776111603
Validation loss = 0.015227423049509525
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014797976240515709
Validation loss = 0.015468121506273746
Validation loss = 0.01560976728796959
Validation loss = 0.0150178587064147
Validation loss = 0.014834255911409855
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 106
average number of affinization = 78.44385026737967
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 112
average number of affinization = 78.62234042553192
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 130
average number of affinization = 78.8941798941799
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 113
average number of affinization = 79.07368421052631
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 131
average number of affinization = 79.3455497382199
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 118
average number of affinization = 79.546875
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 314      |
| Iteration     | 30       |
| MaximumReturn | 318      |
| MinimumReturn | 310      |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015186449512839317
Validation loss = 0.014997337013483047
Validation loss = 0.015607049688696861
Validation loss = 0.015711836516857147
Validation loss = 0.014985330402851105
Validation loss = 0.01502994541078806
Validation loss = 0.015575544908642769
Validation loss = 0.015536341816186905
Validation loss = 0.015544760972261429
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015041747130453587
Validation loss = 0.015110496431589127
Validation loss = 0.015534868463873863
Validation loss = 0.014914500527083874
Validation loss = 0.015249033458530903
Validation loss = 0.015526162460446358
Validation loss = 0.015183052979409695
Validation loss = 0.01502316351979971
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014979210682213306
Validation loss = 0.015347728505730629
Validation loss = 0.015646416693925858
Validation loss = 0.015170929953455925
Validation loss = 0.015114331617951393
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015426067635416985
Validation loss = 0.015120155178010464
Validation loss = 0.014878172427415848
Validation loss = 0.015549750998616219
Validation loss = 0.015428096987307072
Validation loss = 0.015387488529086113
Validation loss = 0.016102220863103867
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015301595441997051
Validation loss = 0.014928552322089672
Validation loss = 0.014939723536372185
Validation loss = 0.015088383108377457
Validation loss = 0.015359068289399147
Validation loss = 0.015284104272723198
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 132
average number of affinization = 79.81865284974093
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 148
average number of affinization = 80.1701030927835
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 131
average number of affinization = 80.43076923076923
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 136
average number of affinization = 80.71428571428571
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 126
average number of affinization = 80.94416243654823
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 141
average number of affinization = 81.24747474747475
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 314      |
| Iteration     | 31       |
| MaximumReturn | 318      |
| MinimumReturn | 310      |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014981979504227638
Validation loss = 0.014987871050834656
Validation loss = 0.015026824548840523
Validation loss = 0.014980005100369453
Validation loss = 0.015131001360714436
Validation loss = 0.014969008043408394
Validation loss = 0.014971444383263588
Validation loss = 0.015166932716965675
Validation loss = 0.014930660836398602
Validation loss = 0.015549571253359318
Validation loss = 0.015025860629975796
Validation loss = 0.01530851423740387
Validation loss = 0.015108219347894192
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015173787251114845
Validation loss = 0.015441058203577995
Validation loss = 0.015161862596869469
Validation loss = 0.014979487285017967
Validation loss = 0.015082119032740593
Validation loss = 0.01563461869955063
Validation loss = 0.014981877990067005
Validation loss = 0.015123856253921986
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015538739040493965
Validation loss = 0.01526494137942791
Validation loss = 0.015397579409182072
Validation loss = 0.01518054585903883
Validation loss = 0.015154280699789524
Validation loss = 0.015398683026432991
Validation loss = 0.015415648929774761
Validation loss = 0.015057099051773548
Validation loss = 0.015504658222198486
Validation loss = 0.015717986971139908
Validation loss = 0.015388021245598793
Validation loss = 0.015195563435554504
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015369467437267303
Validation loss = 0.015182641334831715
Validation loss = 0.015273043885827065
Validation loss = 0.015155920758843422
Validation loss = 0.015543128363788128
Validation loss = 0.015411628410220146
Validation loss = 0.015552865341305733
Validation loss = 0.015130287036299706
Validation loss = 0.015429198741912842
Validation loss = 0.015229098498821259
Validation loss = 0.015269474126398563
Validation loss = 0.015303404070436954
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015088376589119434
Validation loss = 0.015193917788565159
Validation loss = 0.015405812300741673
Validation loss = 0.015107613056898117
Validation loss = 0.015501618385314941
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 133
average number of affinization = 81.50753768844221
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 129
average number of affinization = 81.745
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 123
average number of affinization = 81.95024875621891
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 133
average number of affinization = 82.20297029702971
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 133
average number of affinization = 82.45320197044335
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 127
average number of affinization = 82.67156862745098
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 322      |
| Iteration     | 32       |
| MaximumReturn | 325      |
| MinimumReturn | 318      |
| TotalSamples  | 136000   |
----------------------------
