Logging to experiments/half_cheetah/oct29/w350e3_seed3421
Print configuration .....
{'env_name': 'half_cheetah', 'random_seeds': [4321, 2314, 2341, 3421], 'save_variables': False, 'model_save_dir': '/tmp/half_cheetah_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'intrinsic_reward_only': False, 'external_reward_evaluation_interval': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 0
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 0
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6137779355049133
Validation loss = 0.14529868960380554
Validation loss = 0.09558163583278656
Validation loss = 0.08188080787658691
Validation loss = 0.0737491101026535
Validation loss = 0.07520605623722076
Validation loss = 0.07384832203388214
Validation loss = 0.06516578793525696
Validation loss = 0.06366938352584839
Validation loss = 0.0741058886051178
Validation loss = 0.07023419439792633
Validation loss = 0.07060940563678741
Validation loss = 0.06149135157465935
Validation loss = 0.06376864016056061
Validation loss = 0.06031527370214462
Validation loss = 0.06079650670289993
Validation loss = 0.06075434386730194
Validation loss = 0.06002427637577057
Validation loss = 0.07596205174922943
Validation loss = 0.05850956588983536
Validation loss = 0.05865629017353058
Validation loss = 0.05869406834244728
Validation loss = 0.0687367394566536
Validation loss = 0.05814284086227417
Validation loss = 0.06103847175836563
Validation loss = 0.05952570587396622
Validation loss = 0.06253433227539062
Validation loss = 0.055943019688129425
Validation loss = 0.05640629678964615
Validation loss = 0.05483449995517731
Validation loss = 0.05764203518629074
Validation loss = 0.055429812520742416
Validation loss = 0.05600576847791672
Validation loss = 0.0605156272649765
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6422182321548462
Validation loss = 0.14608831703662872
Validation loss = 0.09729986637830734
Validation loss = 0.08284987509250641
Validation loss = 0.08329892158508301
Validation loss = 0.07247604429721832
Validation loss = 0.06749026477336884
Validation loss = 0.06528612971305847
Validation loss = 0.06890447437763214
Validation loss = 0.06916989386081696
Validation loss = 0.0636378675699234
Validation loss = 0.0656590461730957
Validation loss = 0.06345520168542862
Validation loss = 0.06517119705677032
Validation loss = 0.08891920745372772
Validation loss = 0.0603853240609169
Validation loss = 0.06046910583972931
Validation loss = 0.06033557280898094
Validation loss = 0.05896428972482681
Validation loss = 0.07565327733755112
Validation loss = 0.05989454314112663
Validation loss = 0.0688556581735611
Validation loss = 0.057471126317977905
Validation loss = 0.061358384788036346
Validation loss = 0.058548420667648315
Validation loss = 0.05583854019641876
Validation loss = 0.059818338602781296
Validation loss = 0.05518462136387825
Validation loss = 0.05628344416618347
Validation loss = 0.05455552041530609
Validation loss = 0.06277266144752502
Validation loss = 0.05485276132822037
Validation loss = 0.06128460913896561
Validation loss = 0.05501723289489746
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.37954723834991455
Validation loss = 0.16198831796646118
Validation loss = 0.10705077648162842
Validation loss = 0.08883774280548096
Validation loss = 0.07823672145605087
Validation loss = 0.07844429463148117
Validation loss = 0.07274431735277176
Validation loss = 0.06416449695825577
Validation loss = 0.07162056118249893
Validation loss = 0.06853051483631134
Validation loss = 0.062122419476509094
Validation loss = 0.06491798162460327
Validation loss = 0.06007975712418556
Validation loss = 0.06552589684724808
Validation loss = 0.06347927451133728
Validation loss = 0.062343448400497437
Validation loss = 0.06565290689468384
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3769746422767639
Validation loss = 0.14699020981788635
Validation loss = 0.10017921775579453
Validation loss = 0.08290281891822815
Validation loss = 0.09114909172058105
Validation loss = 0.07418760657310486
Validation loss = 0.06772682070732117
Validation loss = 0.07136210799217224
Validation loss = 0.06493330001831055
Validation loss = 0.06314697116613388
Validation loss = 0.061824675649404526
Validation loss = 0.06039447337388992
Validation loss = 0.06533370912075043
Validation loss = 0.06382707506418228
Validation loss = 0.059496860951185226
Validation loss = 0.06602464616298676
Validation loss = 0.06023941934108734
Validation loss = 0.059296540915966034
Validation loss = 0.058031488209962845
Validation loss = 0.07767435908317566
Validation loss = 0.06021357327699661
Validation loss = 0.05735299736261368
Validation loss = 0.05887523293495178
Validation loss = 0.06461767852306366
Validation loss = 0.05723568797111511
Validation loss = 0.06035799905657768
Validation loss = 0.05658825859427452
Validation loss = 0.05878240615129471
Validation loss = 0.056601643562316895
Validation loss = 0.057044148445129395
Validation loss = 0.059310127049684525
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6026105880737305
Validation loss = 0.14134758710861206
Validation loss = 0.09405016899108887
Validation loss = 0.07962384819984436
Validation loss = 0.07370045781135559
Validation loss = 0.10041798651218414
Validation loss = 0.0676022619009018
Validation loss = 0.06696107983589172
Validation loss = 0.06106878072023392
Validation loss = 0.06511647254228592
Validation loss = 0.07256082445383072
Validation loss = 0.06489230692386627
Validation loss = 0.0592317134141922
Validation loss = 0.06401080638170242
Validation loss = 0.07265771180391312
Validation loss = 0.058710381388664246
Validation loss = 0.06539052724838257
Validation loss = 0.059212058782577515
Validation loss = 0.07051651179790497
Validation loss = 0.05924955755472183
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 260
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 215
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 249
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 239
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 218
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 223
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -298     |
| Iteration     | 0        |
| MaximumReturn | -214     |
| MinimumReturn | -399     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08833881467580795
Validation loss = 0.06840694695711136
Validation loss = 0.06250372529029846
Validation loss = 0.06037873402237892
Validation loss = 0.05823422595858574
Validation loss = 0.057291679084300995
Validation loss = 0.060798779129981995
Validation loss = 0.05883722007274628
Validation loss = 0.06932957470417023
Validation loss = 0.055205993354320526
Validation loss = 0.06080867350101471
Validation loss = 0.0586932972073555
Validation loss = 0.05517058074474335
Validation loss = 0.05875251442193985
Validation loss = 0.05700794607400894
Validation loss = 0.055671755224466324
Validation loss = 0.05451379716396332
Validation loss = 0.05417171120643616
Validation loss = 0.05391036719083786
Validation loss = 0.0552583783864975
Validation loss = 0.060241058468818665
Validation loss = 0.0596015490591526
Validation loss = 0.054519087076187134
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08131889998912811
Validation loss = 0.06461885571479797
Validation loss = 0.06934107840061188
Validation loss = 0.06066696345806122
Validation loss = 0.06006310135126114
Validation loss = 0.060897864401340485
Validation loss = 0.06221605837345123
Validation loss = 0.05803324282169342
Validation loss = 0.06689411401748657
Validation loss = 0.06053134426474571
Validation loss = 0.05544400215148926
Validation loss = 0.07678572833538055
Validation loss = 0.056102052330970764
Validation loss = 0.05482327193021774
Validation loss = 0.055728454142808914
Validation loss = 0.054978661239147186
Validation loss = 0.06035102158784866
Validation loss = 0.05366075783967972
Validation loss = 0.05436126887798309
Validation loss = 0.054587725549936295
Validation loss = 0.054325222969055176
Validation loss = 0.0553891696035862
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08989280462265015
Validation loss = 0.06851678341627121
Validation loss = 0.06473173946142197
Validation loss = 0.06514500081539154
Validation loss = 0.0622495636343956
Validation loss = 0.062469206750392914
Validation loss = 0.06097288802266121
Validation loss = 0.0602446049451828
Validation loss = 0.05882806330919266
Validation loss = 0.059553634375333786
Validation loss = 0.06014213711023331
Validation loss = 0.05851331353187561
Validation loss = 0.05783285200595856
Validation loss = 0.05948226898908615
Validation loss = 0.06593073159456253
Validation loss = 0.05732884258031845
Validation loss = 0.05524752289056778
Validation loss = 0.05918540060520172
Validation loss = 0.05604834854602814
Validation loss = 0.05745489150285721
Validation loss = 0.05510685592889786
Validation loss = 0.057290807366371155
Validation loss = 0.05562774837017059
Validation loss = 0.057058870792388916
Validation loss = 0.055031925439834595
Validation loss = 0.05753616988658905
Validation loss = 0.05441455543041229
Validation loss = 0.05384798347949982
Validation loss = 0.054303281009197235
Validation loss = 0.057960595935583115
Validation loss = 0.05504440516233444
Validation loss = 0.053577274084091187
Validation loss = 0.05398958921432495
Validation loss = 0.05357200652360916
Validation loss = 0.05489451438188553
Validation loss = 0.05774008482694626
Validation loss = 0.054636985063552856
Validation loss = 0.05383139103651047
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08904905617237091
Validation loss = 0.06400393694639206
Validation loss = 0.061301521956920624
Validation loss = 0.06313927471637726
Validation loss = 0.06161712110042572
Validation loss = 0.060653991997241974
Validation loss = 0.0575888566672802
Validation loss = 0.058172207325696945
Validation loss = 0.05822521448135376
Validation loss = 0.05657191574573517
Validation loss = 0.05583237484097481
Validation loss = 0.056937042623758316
Validation loss = 0.05885042995214462
Validation loss = 0.056300267577171326
Validation loss = 0.05506581813097
Validation loss = 0.05707235634326935
Validation loss = 0.05737835913896561
Validation loss = 0.05635392665863037
Validation loss = 0.05495680868625641
Validation loss = 0.05607262998819351
Validation loss = 0.055555351078510284
Validation loss = 0.05533889681100845
Validation loss = 0.05454018712043762
Validation loss = 0.056232474744319916
Validation loss = 0.05641123652458191
Validation loss = 0.0582682266831398
Validation loss = 0.059631332755088806
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08384481072425842
Validation loss = 0.06538958847522736
Validation loss = 0.06316505372524261
Validation loss = 0.06163347512483597
Validation loss = 0.06251423060894012
Validation loss = 0.06305645406246185
Validation loss = 0.064641073346138
Validation loss = 0.06134462356567383
Validation loss = 0.06094234809279442
Validation loss = 0.059090323746204376
Validation loss = 0.06934332102537155
Validation loss = 0.05896671116352081
Validation loss = 0.06921708583831787
Validation loss = 0.05894041806459427
Validation loss = 0.05651399493217468
Validation loss = 0.07923441380262375
Validation loss = 0.06652985513210297
Validation loss = 0.05909284949302673
Validation loss = 0.05786949396133423
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 365
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 363
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 185
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 402
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 372
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 392
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -270     |
| Iteration     | 1        |
| MaximumReturn | -111     |
| MinimumReturn | -352     |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09390300512313843
Validation loss = 0.06509841978549957
Validation loss = 0.0625757947564125
Validation loss = 0.06164415553212166
Validation loss = 0.06213578209280968
Validation loss = 0.06350747495889664
Validation loss = 0.06264155358076096
Validation loss = 0.059602562338113785
Validation loss = 0.062408775091171265
Validation loss = 0.060062870383262634
Validation loss = 0.06092602014541626
Validation loss = 0.06006364896893501
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10356283187866211
Validation loss = 0.06321380287408829
Validation loss = 0.06407848745584488
Validation loss = 0.06115444377064705
Validation loss = 0.0657622441649437
Validation loss = 0.062478240579366684
Validation loss = 0.06145856902003288
Validation loss = 0.05915963277220726
Validation loss = 0.06148943305015564
Validation loss = 0.05953061953186989
Validation loss = 0.05806918814778328
Validation loss = 0.05989359691739082
Validation loss = 0.0616166777908802
Validation loss = 0.0635579526424408
Validation loss = 0.061392661184072495
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0937674269080162
Validation loss = 0.06232702359557152
Validation loss = 0.06184704601764679
Validation loss = 0.05907125771045685
Validation loss = 0.06188591942191124
Validation loss = 0.05822661146521568
Validation loss = 0.058597762137651443
Validation loss = 0.057185277342796326
Validation loss = 0.06572764366865158
Validation loss = 0.05914755165576935
Validation loss = 0.05779505893588066
Validation loss = 0.059831395745277405
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09148990362882614
Validation loss = 0.06407501548528671
Validation loss = 0.06883297115564346
Validation loss = 0.061294447630643845
Validation loss = 0.0695437416434288
Validation loss = 0.06384509801864624
Validation loss = 0.062372755259275436
Validation loss = 0.061461109668016434
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.09589757770299911
Validation loss = 0.06352691352367401
Validation loss = 0.06358533352613449
Validation loss = 0.06362317502498627
Validation loss = 0.06058799847960472
Validation loss = 0.06353369355201721
Validation loss = 0.06160353496670723
Validation loss = 0.06480994820594788
Validation loss = 0.06148630753159523
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 417
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 441
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 453
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 447
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 391
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 437
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -101     |
| Iteration     | 2        |
| MaximumReturn | 228      |
| MinimumReturn | -272     |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07972802966833115
Validation loss = 0.06673429906368256
Validation loss = 0.07450873404741287
Validation loss = 0.0673738420009613
Validation loss = 0.06928157806396484
Validation loss = 0.07180154323577881
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0740441083908081
Validation loss = 0.06596791744232178
Validation loss = 0.06456775963306427
Validation loss = 0.06492112576961517
Validation loss = 0.06969009339809418
Validation loss = 0.06524315476417542
Validation loss = 0.06524930894374847
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07216307520866394
Validation loss = 0.065029576420784
Validation loss = 0.06581990420818329
Validation loss = 0.06650197505950928
Validation loss = 0.06769448518753052
Validation loss = 0.06487303972244263
Validation loss = 0.06567635387182236
Validation loss = 0.06374648213386536
Validation loss = 0.06419515609741211
Validation loss = 0.06742370128631592
Validation loss = 0.06471391767263412
Validation loss = 0.06507387012243271
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07620960474014282
Validation loss = 0.06636747717857361
Validation loss = 0.06966159492731094
Validation loss = 0.06662727147340775
Validation loss = 0.06619636714458466
Validation loss = 0.06604087352752686
Validation loss = 0.06498932838439941
Validation loss = 0.06571179628372192
Validation loss = 0.06753203272819519
Validation loss = 0.06582166999578476
Validation loss = 0.06620127707719803
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07523840665817261
Validation loss = 0.0685235857963562
Validation loss = 0.06673862040042877
Validation loss = 0.06598759442567825
Validation loss = 0.06940948963165283
Validation loss = 0.06687411665916443
Validation loss = 0.06708786636590958
Validation loss = 0.06756967306137085
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 530
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 552
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 546
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 534
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 532
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 358
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 493      |
| Iteration     | 3        |
| MaximumReturn | 1e+03    |
| MinimumReturn | 75.4     |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06958848237991333
Validation loss = 0.06838051229715347
Validation loss = 0.06690796464681625
Validation loss = 0.06565938144922256
Validation loss = 0.06460674107074738
Validation loss = 0.0665191188454628
Validation loss = 0.0664040669798851
Validation loss = 0.06362491846084595
Validation loss = 0.06462915986776352
Validation loss = 0.06681257486343384
Validation loss = 0.06333182752132416
Validation loss = 0.06762169301509857
Validation loss = 0.0642332062125206
Validation loss = 0.06497487425804138
Validation loss = 0.06418558955192566
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06767421215772629
Validation loss = 0.06423051655292511
Validation loss = 0.06434822082519531
Validation loss = 0.06611520797014236
Validation loss = 0.06482170522212982
Validation loss = 0.0645563155412674
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07028760015964508
Validation loss = 0.06327341496944427
Validation loss = 0.06463958323001862
Validation loss = 0.06361672282218933
Validation loss = 0.0619363896548748
Validation loss = 0.0637650266289711
Validation loss = 0.0640915185213089
Validation loss = 0.06338118016719818
Validation loss = 0.06255676597356796
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06917766481637955
Validation loss = 0.06518591940402985
Validation loss = 0.06521615386009216
Validation loss = 0.06353860348463058
Validation loss = 0.06609942764043808
Validation loss = 0.06359522044658661
Validation loss = 0.06262288242578506
Validation loss = 0.06333132833242416
Validation loss = 0.0705370381474495
Validation loss = 0.06386168301105499
Validation loss = 0.06302326917648315
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06865868717432022
Validation loss = 0.06737001240253448
Validation loss = 0.06613057106733322
Validation loss = 0.06623369455337524
Validation loss = 0.06727778166532516
Validation loss = 0.06535360217094421
Validation loss = 0.06914404779672623
Validation loss = 0.06366104632616043
Validation loss = 0.06316745281219482
Validation loss = 0.06304209679365158
Validation loss = 0.06871923059225082
Validation loss = 0.06467610597610474
Validation loss = 0.06451727449893951
Validation loss = 0.06289069354534149
Validation loss = 0.06416342407464981
Validation loss = 0.06507058441638947
Validation loss = 0.0645180195569992
Validation loss = 0.06514759361743927
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 575
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 550
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 607
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 578
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 608
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 575
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 76.8     |
| Iteration     | 4        |
| MaximumReturn | 280      |
| MinimumReturn | -53.2    |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06432787328958511
Validation loss = 0.060492828488349915
Validation loss = 0.059047818183898926
Validation loss = 0.059593066573143005
Validation loss = 0.058921683579683304
Validation loss = 0.05733053386211395
Validation loss = 0.057691704481840134
Validation loss = 0.058661576360464096
Validation loss = 0.05697951838374138
Validation loss = 0.056500960141420364
Validation loss = 0.05834944173693657
Validation loss = 0.05998525023460388
Validation loss = 0.057003140449523926
Validation loss = 0.059310853481292725
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06192963197827339
Validation loss = 0.061843108385801315
Validation loss = 0.05978737398982048
Validation loss = 0.059501126408576965
Validation loss = 0.0620434544980526
Validation loss = 0.06022531911730766
Validation loss = 0.060997817665338516
Validation loss = 0.05948817357420921
Validation loss = 0.05881254002451897
Validation loss = 0.06082497537136078
Validation loss = 0.05732190981507301
Validation loss = 0.05771985277533531
Validation loss = 0.05763304606080055
Validation loss = 0.06024874374270439
Validation loss = 0.06091896817088127
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06329601258039474
Validation loss = 0.05776150897145271
Validation loss = 0.05720803514122963
Validation loss = 0.0634397566318512
Validation loss = 0.05751223489642143
Validation loss = 0.05928864702582359
Validation loss = 0.0562572181224823
Validation loss = 0.0570782870054245
Validation loss = 0.05908703804016113
Validation loss = 0.05693526566028595
Validation loss = 0.05595825985074043
Validation loss = 0.06055011972784996
Validation loss = 0.058601755648851395
Validation loss = 0.06003822758793831
Validation loss = 0.05704553797841072
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06516949087381363
Validation loss = 0.06128605082631111
Validation loss = 0.06013919413089752
Validation loss = 0.0580226331949234
Validation loss = 0.06509850174188614
Validation loss = 0.05805765092372894
Validation loss = 0.06125299260020256
Validation loss = 0.06370794028043747
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06495752930641174
Validation loss = 0.059953004121780396
Validation loss = 0.0602402538061142
Validation loss = 0.05848630890250206
Validation loss = 0.060468319803476334
Validation loss = 0.062038373202085495
Validation loss = 0.06063840165734291
Validation loss = 0.058099474757909775
Validation loss = 0.05677482485771179
Validation loss = 0.058211904019117355
Validation loss = 0.057964060455560684
Validation loss = 0.06159186363220215
Validation loss = 0.056662991642951965
Validation loss = 0.06079019233584404
Validation loss = 0.059402596205472946
Validation loss = 0.05672856792807579
Validation loss = 0.05694672465324402
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 641
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 597
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 589
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 655
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 614
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 641
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 276      |
| Iteration     | 5        |
| MaximumReturn | 989      |
| MinimumReturn | -382     |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.058366380631923676
Validation loss = 0.05225769802927971
Validation loss = 0.05310432240366936
Validation loss = 0.05356857180595398
Validation loss = 0.05127573013305664
Validation loss = 0.052807968109846115
Validation loss = 0.053127262741327286
Validation loss = 0.051369477063417435
Validation loss = 0.05380120873451233
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06184738874435425
Validation loss = 0.053517092019319534
Validation loss = 0.0559285506606102
Validation loss = 0.05370527133345604
Validation loss = 0.05628996342420578
Validation loss = 0.05438722297549248
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05876907706260681
Validation loss = 0.05413133651018143
Validation loss = 0.05303746089339256
Validation loss = 0.051937513053417206
Validation loss = 0.05393020436167717
Validation loss = 0.05175630375742912
Validation loss = 0.05393756553530693
Validation loss = 0.05138363316655159
Validation loss = 0.05247341841459274
Validation loss = 0.05332320183515549
Validation loss = 0.05027283355593681
Validation loss = 0.05402800440788269
Validation loss = 0.05012712627649307
Validation loss = 0.05121086910367012
Validation loss = 0.050301264971494675
Validation loss = 0.051301587373018265
Validation loss = 0.05009031668305397
Validation loss = 0.054143328219652176
Validation loss = 0.04954330995678902
Validation loss = 0.050173480063676834
Validation loss = 0.0503653809428215
Validation loss = 0.0522540919482708
Validation loss = 0.051482658833265305
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06156206876039505
Validation loss = 0.05318818241357803
Validation loss = 0.05268191546201706
Validation loss = 0.0547451376914978
Validation loss = 0.0533415712416172
Validation loss = 0.052669744938611984
Validation loss = 0.054075296968221664
Validation loss = 0.05232575908303261
Validation loss = 0.052832454442977905
Validation loss = 0.05415106192231178
Validation loss = 0.05557152256369591
Validation loss = 0.057755231857299805
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.058299388736486435
Validation loss = 0.053987737745046616
Validation loss = 0.052497465163469315
Validation loss = 0.05216256156563759
Validation loss = 0.052600957453250885
Validation loss = 0.05289459973573685
Validation loss = 0.05165984481573105
Validation loss = 0.05270653963088989
Validation loss = 0.05036896467208862
Validation loss = 0.05201639607548714
Validation loss = 0.050958577543497086
Validation loss = 0.05022954195737839
Validation loss = 0.05082301050424576
Validation loss = 0.05377425625920296
Validation loss = 0.05115555599331856
Validation loss = 0.052023570984601974
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 682
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 640
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 670
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 667
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 658
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 620
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 544      |
| Iteration     | 6        |
| MaximumReturn | 2.01e+03 |
| MinimumReturn | -184     |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.055725615471601486
Validation loss = 0.052862197160720825
Validation loss = 0.05272279679775238
Validation loss = 0.05225758254528046
Validation loss = 0.05279236286878586
Validation loss = 0.05078534036874771
Validation loss = 0.05208425968885422
Validation loss = 0.05219387263059616
Validation loss = 0.05263710021972656
Validation loss = 0.049868952482938766
Validation loss = 0.04961181432008743
Validation loss = 0.05178876221179962
Validation loss = 0.050882987678050995
Validation loss = 0.0537840910255909
Validation loss = 0.04968170076608658
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05954579636454582
Validation loss = 0.05617508292198181
Validation loss = 0.05429764837026596
Validation loss = 0.056057922542095184
Validation loss = 0.05487513169646263
Validation loss = 0.05330229178071022
Validation loss = 0.053582385182380676
Validation loss = 0.05129074677824974
Validation loss = 0.055258773267269135
Validation loss = 0.05575159192085266
Validation loss = 0.05287470668554306
Validation loss = 0.052845247089862823
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05673518031835556
Validation loss = 0.052064359188079834
Validation loss = 0.052198681980371475
Validation loss = 0.05284043401479721
Validation loss = 0.05404948815703392
Validation loss = 0.05205594003200531
Validation loss = 0.05163149535655975
Validation loss = 0.052228547632694244
Validation loss = 0.05097027122974396
Validation loss = 0.05138503760099411
Validation loss = 0.05307256430387497
Validation loss = 0.05123261734843254
Validation loss = 0.051881417632102966
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.058924414217472076
Validation loss = 0.05426605045795441
Validation loss = 0.05512697249650955
Validation loss = 0.05260651558637619
Validation loss = 0.057145580649375916
Validation loss = 0.05284187197685242
Validation loss = 0.052697692066431046
Validation loss = 0.05230569839477539
Validation loss = 0.052638210356235504
Validation loss = 0.053418513387441635
Validation loss = 0.053662363439798355
Validation loss = 0.05237087979912758
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05939202755689621
Validation loss = 0.05120497941970825
Validation loss = 0.05157265067100525
Validation loss = 0.05142092704772949
Validation loss = 0.05000597983598709
Validation loss = 0.051231808960437775
Validation loss = 0.051460012793540955
Validation loss = 0.05276550352573395
Validation loss = 0.050900109112262726
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 733
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 670
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 712
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 703
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 700
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 688
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 200      |
| Iteration     | 7        |
| MaximumReturn | 828      |
| MinimumReturn | -367     |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05291677638888359
Validation loss = 0.04579288139939308
Validation loss = 0.04387183487415314
Validation loss = 0.04434511438012123
Validation loss = 0.044001009315252304
Validation loss = 0.04387912154197693
Validation loss = 0.042066846042871475
Validation loss = 0.04755588248372078
Validation loss = 0.04349726065993309
Validation loss = 0.04513847082853317
Validation loss = 0.04304119199514389
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0527457632124424
Validation loss = 0.04790623113512993
Validation loss = 0.04682888090610504
Validation loss = 0.047643594443798065
Validation loss = 0.04600978270173073
Validation loss = 0.048000793904066086
Validation loss = 0.04591469466686249
Validation loss = 0.04712311923503876
Validation loss = 0.04621468484401703
Validation loss = 0.04633795842528343
Validation loss = 0.04417864978313446
Validation loss = 0.04503876343369484
Validation loss = 0.04602895677089691
Validation loss = 0.043779850006103516
Validation loss = 0.04377802461385727
Validation loss = 0.04516643285751343
Validation loss = 0.0436440147459507
Validation loss = 0.04569840058684349
Validation loss = 0.04339732229709625
Validation loss = 0.04549012705683708
Validation loss = 0.043676648288965225
Validation loss = 0.045946743339300156
Validation loss = 0.04307975247502327
Validation loss = 0.042944855988025665
Validation loss = 0.04502132534980774
Validation loss = 0.04280270263552666
Validation loss = 0.044644031673669815
Validation loss = 0.041972000151872635
Validation loss = 0.04352779686450958
Validation loss = 0.041991256177425385
Validation loss = 0.04335778206586838
Validation loss = 0.04180052876472473
Validation loss = 0.0454547218978405
Validation loss = 0.0410432368516922
Validation loss = 0.04063138738274574
Validation loss = 0.042219385504722595
Validation loss = 0.040755610913038254
Validation loss = 0.041351985186338425
Validation loss = 0.041055575013160706
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05234157666563988
Validation loss = 0.044483527541160583
Validation loss = 0.04334374517202377
Validation loss = 0.04512626305222511
Validation loss = 0.044028379023075104
Validation loss = 0.04317536577582359
Validation loss = 0.05132618173956871
Validation loss = 0.04217034950852394
Validation loss = 0.04263150319457054
Validation loss = 0.045844174921512604
Validation loss = 0.04614020884037018
Validation loss = 0.04242870584130287
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05405419319868088
Validation loss = 0.04859089478850365
Validation loss = 0.04765688255429268
Validation loss = 0.04734739288687706
Validation loss = 0.04651588201522827
Validation loss = 0.04827520251274109
Validation loss = 0.04667393118143082
Validation loss = 0.04872577637434006
Validation loss = 0.04521091282367706
Validation loss = 0.04710165411233902
Validation loss = 0.0474427305161953
Validation loss = 0.04628640413284302
Validation loss = 0.044359758496284485
Validation loss = 0.04594899341464043
Validation loss = 0.04519873112440109
Validation loss = 0.04604510962963104
Validation loss = 0.04746478796005249
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.052238304167985916
Validation loss = 0.044880758970975876
Validation loss = 0.04778648912906647
Validation loss = 0.045714519917964935
Validation loss = 0.045243941247463226
Validation loss = 0.044597964733839035
Validation loss = 0.04708554223179817
Validation loss = 0.04610087350010872
Validation loss = 0.04471477121114731
Validation loss = 0.045205119997262955
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 735
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 709
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 743
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 745
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 757
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 701
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 384      |
| Iteration     | 8        |
| MaximumReturn | 1.89e+03 |
| MinimumReturn | -526     |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04852844029664993
Validation loss = 0.04228205233812332
Validation loss = 0.039838775992393494
Validation loss = 0.039376236498355865
Validation loss = 0.039374180138111115
Validation loss = 0.03984326869249344
Validation loss = 0.038306038826704025
Validation loss = 0.037968337535858154
Validation loss = 0.03854704648256302
Validation loss = 0.03852277249097824
Validation loss = 0.04112302139401436
Validation loss = 0.03694256395101547
Validation loss = 0.03801768273115158
Validation loss = 0.036810796707868576
Validation loss = 0.037930432707071304
Validation loss = 0.0384870320558548
Validation loss = 0.044061265885829926
Validation loss = 0.03579552471637726
Validation loss = 0.03708659112453461
Validation loss = 0.03666790947318077
Validation loss = 0.03594418615102768
Validation loss = 0.03866558521986008
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0485515296459198
Validation loss = 0.03830486536026001
Validation loss = 0.037638742476701736
Validation loss = 0.03751801699399948
Validation loss = 0.03707193583250046
Validation loss = 0.03678596019744873
Validation loss = 0.037221841514110565
Validation loss = 0.03648248314857483
Validation loss = 0.037582118064165115
Validation loss = 0.0371323898434639
Validation loss = 0.03493574634194374
Validation loss = 0.03860024735331535
Validation loss = 0.03616938367486
Validation loss = 0.036540258675813675
Validation loss = 0.036079373210668564
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.049422405660152435
Validation loss = 0.04089266061782837
Validation loss = 0.039350491017103195
Validation loss = 0.03874072805047035
Validation loss = 0.038114987313747406
Validation loss = 0.038849301636219025
Validation loss = 0.03770998865365982
Validation loss = 0.041166964918375015
Validation loss = 0.037861086428165436
Validation loss = 0.038988955318927765
Validation loss = 0.0391489639878273
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.049842048436403275
Validation loss = 0.04079773277044296
Validation loss = 0.04011638090014458
Validation loss = 0.04068075865507126
Validation loss = 0.04118566960096359
Validation loss = 0.040760379284620285
Validation loss = 0.03987588360905647
Validation loss = 0.03906708210706711
Validation loss = 0.040415190160274506
Validation loss = 0.039754051715135574
Validation loss = 0.03964865580201149
Validation loss = 0.04021143168210983
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04743804410099983
Validation loss = 0.04028437286615372
Validation loss = 0.039582230150699615
Validation loss = 0.039958663284778595
Validation loss = 0.04164455831050873
Validation loss = 0.04060422629117966
Validation loss = 0.040027447044849396
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 748
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 768
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 751
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 775
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 747
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 734
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 265      |
| Iteration     | 9        |
| MaximumReturn | 1.73e+03 |
| MinimumReturn | -348     |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04372062906622887
Validation loss = 0.036094799637794495
Validation loss = 0.03562413901090622
Validation loss = 0.03564590588212013
Validation loss = 0.03620561584830284
Validation loss = 0.036007773131132126
Validation loss = 0.03403109312057495
Validation loss = 0.034730613231658936
Validation loss = 0.03684823587536812
Validation loss = 0.03349684178829193
Validation loss = 0.03492271900177002
Validation loss = 0.03380964696407318
Validation loss = 0.03491521254181862
Validation loss = 0.03419826924800873
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.040969714522361755
Validation loss = 0.03476623073220253
Validation loss = 0.03420557826757431
Validation loss = 0.03422636166214943
Validation loss = 0.03583672642707825
Validation loss = 0.033604975789785385
Validation loss = 0.034639254212379456
Validation loss = 0.03356513753533363
Validation loss = 0.034915562719106674
Validation loss = 0.03925049677491188
Validation loss = 0.031581275165081024
Validation loss = 0.033124133944511414
Validation loss = 0.03395441547036171
Validation loss = 0.03793906792998314
Validation loss = 0.03193647041916847
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04713878780603409
Validation loss = 0.036844540387392044
Validation loss = 0.03696742281317711
Validation loss = 0.03747054189443588
Validation loss = 0.037309251725673676
Validation loss = 0.03838634490966797
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.045868098735809326
Validation loss = 0.038011495023965836
Validation loss = 0.03714951500296593
Validation loss = 0.0373673290014267
Validation loss = 0.03805411234498024
Validation loss = 0.03735310584306717
Validation loss = 0.03783348202705383
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04600728675723076
Validation loss = 0.0393446609377861
Validation loss = 0.03992787003517151
Validation loss = 0.038772109895944595
Validation loss = 0.03882117569446564
Validation loss = 0.039506297558546066
Validation loss = 0.03697240352630615
Validation loss = 0.039795007556676865
Validation loss = 0.036944180727005005
Validation loss = 0.0389140248298645
Validation loss = 0.036690473556518555
Validation loss = 0.03625098615884781
Validation loss = 0.040134940296411514
Validation loss = 0.03557746112346649
Validation loss = 0.04002617299556732
Validation loss = 0.03680375590920448
Validation loss = 0.03690359741449356
Validation loss = 0.038291990756988525
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 767
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 738
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 758
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 774
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 766
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 765
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 301      |
| Iteration     | 10       |
| MaximumReturn | 1.13e+03 |
| MinimumReturn | -371     |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.042028915137052536
Validation loss = 0.03153689578175545
Validation loss = 0.031397104263305664
Validation loss = 0.03175057843327522
Validation loss = 0.031414058059453964
Validation loss = 0.0318310372531414
Validation loss = 0.03198825940489769
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.037352655082941055
Validation loss = 0.029578424990177155
Validation loss = 0.030389629304409027
Validation loss = 0.031421270221471786
Validation loss = 0.030773663893342018
Validation loss = 0.030642405152320862
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.043156642466783524
Validation loss = 0.03303685411810875
Validation loss = 0.033122364431619644
Validation loss = 0.0333145447075367
Validation loss = 0.034553006291389465
Validation loss = 0.032958243042230606
Validation loss = 0.03164839372038841
Validation loss = 0.03233667463064194
Validation loss = 0.03305094316601753
Validation loss = 0.031692828983068466
Validation loss = 0.03144140541553497
Validation loss = 0.033207181841135025
Validation loss = 0.03020208328962326
Validation loss = 0.03203515335917473
Validation loss = 0.029844949021935463
Validation loss = 0.031343936920166016
Validation loss = 0.031083762645721436
Validation loss = 0.02943672239780426
Validation loss = 0.03183482587337494
Validation loss = 0.030313590541481972
Validation loss = 0.028281867504119873
Validation loss = 0.029150383546948433
Validation loss = 0.0337277390062809
Validation loss = 0.027555113658308983
Validation loss = 0.02844393067061901
Validation loss = 0.028907159343361855
Validation loss = 0.027802152559161186
Validation loss = 0.029517116025090218
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04297086223959923
Validation loss = 0.03360092267394066
Validation loss = 0.03504428640007973
Validation loss = 0.03423599526286125
Validation loss = 0.035034846514463425
Validation loss = 0.03379267081618309
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04319010674953461
Validation loss = 0.03327612578868866
Validation loss = 0.03266143426299095
Validation loss = 0.033814769238233566
Validation loss = 0.0331186018884182
Validation loss = 0.03217284753918648
Validation loss = 0.03671221062541008
Validation loss = 0.03197595849633217
Validation loss = 0.03281615674495697
Validation loss = 0.032259780913591385
Validation loss = 0.03155370429158211
Validation loss = 0.03048926591873169
Validation loss = 0.03138627111911774
Validation loss = 0.03113378770649433
Validation loss = 0.030760711058974266
Validation loss = 0.030685165897011757
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 777
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 754
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 762
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 752
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 755
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 705
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 214      |
| Iteration     | 11       |
| MaximumReturn | 583      |
| MinimumReturn | -64.9    |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03841926157474518
Validation loss = 0.030728142708539963
Validation loss = 0.029606353491544724
Validation loss = 0.031141074374318123
Validation loss = 0.033822692930698395
Validation loss = 0.028997359797358513
Validation loss = 0.02984634041786194
Validation loss = 0.03267022594809532
Validation loss = 0.028503205627202988
Validation loss = 0.031584639102220535
Validation loss = 0.02812337689101696
Validation loss = 0.028921248391270638
Validation loss = 0.029732538387179375
Validation loss = 0.029746970161795616
Validation loss = 0.02753066085278988
Validation loss = 0.028403420001268387
Validation loss = 0.028431635349988937
Validation loss = 0.029381288215517998
Validation loss = 0.028757760301232338
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03937598690390587
Validation loss = 0.02903047390282154
Validation loss = 0.02927413582801819
Validation loss = 0.029104599729180336
Validation loss = 0.029618721455335617
Validation loss = 0.029529549181461334
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03619246929883957
Validation loss = 0.02772841975092888
Validation loss = 0.027875950559973717
Validation loss = 0.02800065092742443
Validation loss = 0.026724739000201225
Validation loss = 0.03077886626124382
Validation loss = 0.025803063064813614
Validation loss = 0.026401109993457794
Validation loss = 0.02679416909813881
Validation loss = 0.02772403135895729
Validation loss = 0.02597969025373459
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04446615278720856
Validation loss = 0.03221863508224487
Validation loss = 0.03316022455692291
Validation loss = 0.03167201951146126
Validation loss = 0.03302038460969925
Validation loss = 0.03212019428610802
Validation loss = 0.031397394835948944
Validation loss = 0.03544050455093384
Validation loss = 0.031652871519327164
Validation loss = 0.032277725636959076
Validation loss = 0.03158087655901909
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03953934088349342
Validation loss = 0.028833113610744476
Validation loss = 0.02865029312670231
Validation loss = 0.029900532215833664
Validation loss = 0.028973998501896858
Validation loss = 0.02865462563931942
Validation loss = 0.03305749222636223
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 736
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 754
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 728
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 760
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 725
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 744
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.46e+03 |
| Iteration     | 12       |
| MaximumReturn | 2.43e+03 |
| MinimumReturn | 648      |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.034592464566230774
Validation loss = 0.02629498578608036
Validation loss = 0.026748178526759148
Validation loss = 0.028131496161222458
Validation loss = 0.027458032593131065
Validation loss = 0.027762150391936302
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03245475888252258
Validation loss = 0.02638263814151287
Validation loss = 0.025941599160432816
Validation loss = 0.0269908644258976
Validation loss = 0.027176296338438988
Validation loss = 0.02905726619064808
Validation loss = 0.02537025697529316
Validation loss = 0.02561187744140625
Validation loss = 0.027059374377131462
Validation loss = 0.02522449381649494
Validation loss = 0.029395626857876778
Validation loss = 0.025044279173016548
Validation loss = 0.02436545118689537
Validation loss = 0.028540566563606262
Validation loss = 0.023966345936059952
Validation loss = 0.02563771791756153
Validation loss = 0.02459338679909706
Validation loss = 0.024345388635993004
Validation loss = 0.029307860881090164
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02887876331806183
Validation loss = 0.025208037346601486
Validation loss = 0.025970475748181343
Validation loss = 0.025307834148406982
Validation loss = 0.024674782529473305
Validation loss = 0.025794530287384987
Validation loss = 0.0261472649872303
Validation loss = 0.023378416895866394
Validation loss = 0.023930011317133904
Validation loss = 0.025139713659882545
Validation loss = 0.023464113473892212
Validation loss = 0.025174299255013466
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03578072786331177
Validation loss = 0.02809835411608219
Validation loss = 0.027747800573706627
Validation loss = 0.029451249167323112
Validation loss = 0.027825918048620224
Validation loss = 0.03039487637579441
Validation loss = 0.027538007125258446
Validation loss = 0.028235143050551414
Validation loss = 0.027518227696418762
Validation loss = 0.026595067232847214
Validation loss = 0.0273787509649992
Validation loss = 0.025855109095573425
Validation loss = 0.02655886672437191
Validation loss = 0.02841067500412464
Validation loss = 0.025562411174178123
Validation loss = 0.028343144804239273
Validation loss = 0.02613617479801178
Validation loss = 0.02601439878344536
Validation loss = 0.027390729635953903
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.032228849828243256
Validation loss = 0.026340726763010025
Validation loss = 0.02634589932858944
Validation loss = 0.02630169503390789
Validation loss = 0.02648797817528248
Validation loss = 0.025529293343424797
Validation loss = 0.026048865169286728
Validation loss = 0.02590416744351387
Validation loss = 0.026278942823410034
Validation loss = 0.024039587005972862
Validation loss = 0.02633783593773842
Validation loss = 0.02592523582279682
Validation loss = 0.024194765836000443
Validation loss = 0.024970179423689842
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 751
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 753
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 768
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 745
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 738
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 752
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 572      |
| Iteration     | 13       |
| MaximumReturn | 2.54e+03 |
| MinimumReturn | -461     |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.033923402428627014
Validation loss = 0.02562306448817253
Validation loss = 0.02516891248524189
Validation loss = 0.025954702869057655
Validation loss = 0.025257432833313942
Validation loss = 0.026232672855257988
Validation loss = 0.025171268731355667
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02977779321372509
Validation loss = 0.023346353322267532
Validation loss = 0.023493647575378418
Validation loss = 0.02593850903213024
Validation loss = 0.023272503167390823
Validation loss = 0.02366766519844532
Validation loss = 0.0237271785736084
Validation loss = 0.023070063441991806
Validation loss = 0.02344825118780136
Validation loss = 0.028415678068995476
Validation loss = 0.022257842123508453
Validation loss = 0.022993119433522224
Validation loss = 0.024022819474339485
Validation loss = 0.02162594348192215
Validation loss = 0.02544422261416912
Validation loss = 0.0215850081294775
Validation loss = 0.022350100800395012
Validation loss = 0.022930152714252472
Validation loss = 0.021182743832468987
Validation loss = 0.021798888221383095
Validation loss = 0.022435076534748077
Validation loss = 0.02240169420838356
Validation loss = 0.02181079238653183
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.031187858432531357
Validation loss = 0.022752344608306885
Validation loss = 0.022938568145036697
Validation loss = 0.02260945364832878
Validation loss = 0.02594909444451332
Validation loss = 0.022013496607542038
Validation loss = 0.023793630301952362
Validation loss = 0.0235245693475008
Validation loss = 0.02185911498963833
Validation loss = 0.023499753326177597
Validation loss = 0.022031068801879883
Validation loss = 0.02200118452310562
Validation loss = 0.02266821637749672
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03240879252552986
Validation loss = 0.02492530085146427
Validation loss = 0.025177408009767532
Validation loss = 0.026418061926960945
Validation loss = 0.02580670826137066
Validation loss = 0.024837689474225044
Validation loss = 0.02861468680202961
Validation loss = 0.024194134399294853
Validation loss = 0.02449874021112919
Validation loss = 0.024740470573306084
Validation loss = 0.024017581716179848
Validation loss = 0.024778001010417938
Validation loss = 0.023096075281500816
Validation loss = 0.025424698367714882
Validation loss = 0.023557627573609352
Validation loss = 0.02541166916489601
Validation loss = 0.022603780031204224
Validation loss = 0.02271433360874653
Validation loss = 0.02482469752430916
Validation loss = 0.02266322821378708
Validation loss = 0.02297130413353443
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02937249280512333
Validation loss = 0.023347461596131325
Validation loss = 0.02371724508702755
Validation loss = 0.02349662035703659
Validation loss = 0.024000078439712524
Validation loss = 0.02321046032011509
Validation loss = 0.02577717788517475
Validation loss = 0.02262469008564949
Validation loss = 0.02470451034605503
Validation loss = 0.02221277728676796
Validation loss = 0.021826904267072678
Validation loss = 0.023291924968361855
Validation loss = 0.02317500300705433
Validation loss = 0.022008009254932404
Validation loss = 0.024242499843239784
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 734
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 740
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 739
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 744
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 747
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 763
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 196      |
| Iteration     | 14       |
| MaximumReturn | 829      |
| MinimumReturn | -315     |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.029435452073812485
Validation loss = 0.023666242137551308
Validation loss = 0.02506156638264656
Validation loss = 0.025912057608366013
Validation loss = 0.023641439154744148
Validation loss = 0.02763703092932701
Validation loss = 0.02316966839134693
Validation loss = 0.023257330060005188
Validation loss = 0.02662813663482666
Validation loss = 0.023106474429368973
Validation loss = 0.02281562238931656
Validation loss = 0.023368660360574722
Validation loss = 0.02245941013097763
Validation loss = 0.02354581467807293
Validation loss = 0.022967452183365822
Validation loss = 0.02303379960358143
Validation loss = 0.025593582540750504
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.026942770928144455
Validation loss = 0.02113453857600689
Validation loss = 0.02048950269818306
Validation loss = 0.02115439623594284
Validation loss = 0.020780319347977638
Validation loss = 0.024224258959293365
Validation loss = 0.020757630467414856
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02691754698753357
Validation loss = 0.02124651148915291
Validation loss = 0.021706072613596916
Validation loss = 0.022274188697338104
Validation loss = 0.021610204130411148
Validation loss = 0.0218515545129776
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.027195578441023827
Validation loss = 0.021845625713467598
Validation loss = 0.022277118638157845
Validation loss = 0.022141221910715103
Validation loss = 0.022237908095121384
Validation loss = 0.02171030454337597
Validation loss = 0.022028516978025436
Validation loss = 0.02060522884130478
Validation loss = 0.02318083494901657
Validation loss = 0.022128019481897354
Validation loss = 0.02086496911942959
Validation loss = 0.021093860268592834
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02422071062028408
Validation loss = 0.020828790962696075
Validation loss = 0.021297190338373184
Validation loss = 0.023318596184253693
Validation loss = 0.02178839035332203
Validation loss = 0.022296331822872162
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 735
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 741
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 730
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 739
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 735
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 742
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.06e+03 |
| Iteration     | 15       |
| MaximumReturn | 2.95e+03 |
| MinimumReturn | 673      |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025464260950684547
Validation loss = 0.020884256809949875
Validation loss = 0.021586377173662186
Validation loss = 0.025105159729719162
Validation loss = 0.020786909386515617
Validation loss = 0.02135050855576992
Validation loss = 0.022315150126814842
Validation loss = 0.02085137367248535
Validation loss = 0.023547377437353134
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02247179113328457
Validation loss = 0.019778961315751076
Validation loss = 0.019969306886196136
Validation loss = 0.020707866176962852
Validation loss = 0.019383806735277176
Validation loss = 0.023466194048523903
Validation loss = 0.01837102882564068
Validation loss = 0.018855642527341843
Validation loss = 0.020870644599199295
Validation loss = 0.018881339579820633
Validation loss = 0.01939811371266842
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023635953664779663
Validation loss = 0.021017979830503464
Validation loss = 0.0202659722417593
Validation loss = 0.020121701061725616
Validation loss = 0.019805172458291054
Validation loss = 0.02192932926118374
Validation loss = 0.019029907882213593
Validation loss = 0.020933853462338448
Validation loss = 0.019052080810070038
Validation loss = 0.01943453587591648
Validation loss = 0.02104571834206581
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02494475618004799
Validation loss = 0.020238766446709633
Validation loss = 0.020282549783587456
Validation loss = 0.01989944837987423
Validation loss = 0.019908981397747993
Validation loss = 0.02062218450009823
Validation loss = 0.019595351070165634
Validation loss = 0.023008454591035843
Validation loss = 0.019176403060555458
Validation loss = 0.018790066242218018
Validation loss = 0.021347125992178917
Validation loss = 0.01888388767838478
Validation loss = 0.019923359155654907
Validation loss = 0.01978491060435772
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024111399427056313
Validation loss = 0.02003534510731697
Validation loss = 0.020946111530065536
Validation loss = 0.019988378509879112
Validation loss = 0.02242795191705227
Validation loss = 0.019757578149437904
Validation loss = 0.01986740343272686
Validation loss = 0.02021479792892933
Validation loss = 0.019458215683698654
Validation loss = 0.021807899698615074
Validation loss = 0.01842399686574936
Validation loss = 0.020072711631655693
Validation loss = 0.020224597305059433
Validation loss = 0.018467502668499947
Validation loss = 0.019406873732805252
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 719
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 754
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 706
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 738
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 756
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 728
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.99e+03 |
| Iteration     | 16       |
| MaximumReturn | 3.1e+03  |
| MinimumReturn | 859      |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02501862496137619
Validation loss = 0.020748456940054893
Validation loss = 0.021456848829984665
Validation loss = 0.021760283038020134
Validation loss = 0.01966186799108982
Validation loss = 0.02305472269654274
Validation loss = 0.019288785755634308
Validation loss = 0.02248753234744072
Validation loss = 0.019160263240337372
Validation loss = 0.023819847032427788
Validation loss = 0.01908944360911846
Validation loss = 0.019564442336559296
Validation loss = 0.021596962586045265
Validation loss = 0.019087327644228935
Validation loss = 0.0188701543956995
Validation loss = 0.02078382298350334
Validation loss = 0.01879666931927204
Validation loss = 0.021797828376293182
Validation loss = 0.020419476553797722
Validation loss = 0.01867232844233513
Validation loss = 0.019643263891339302
Validation loss = 0.017981745302677155
Validation loss = 0.019856087863445282
Validation loss = 0.018007151782512665
Validation loss = 0.019615648314356804
Validation loss = 0.0175167266279459
Validation loss = 0.018596820533275604
Validation loss = 0.017628787085413933
Validation loss = 0.018378280103206635
Validation loss = 0.020478205755352974
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022628720849752426
Validation loss = 0.018114876002073288
Validation loss = 0.018910182639956474
Validation loss = 0.02137202024459839
Validation loss = 0.01782720722258091
Validation loss = 0.01856216974556446
Validation loss = 0.020202506333589554
Validation loss = 0.01742910034954548
Validation loss = 0.019107908010482788
Validation loss = 0.018531037494540215
Validation loss = 0.01727418042719364
Validation loss = 0.020522799342870712
Validation loss = 0.018243510276079178
Validation loss = 0.01682218350470066
Validation loss = 0.01877046562731266
Validation loss = 0.01677524857223034
Validation loss = 0.01802963763475418
Validation loss = 0.016949158161878586
Validation loss = 0.017320822924375534
Validation loss = 0.018475186079740524
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021789386868476868
Validation loss = 0.018812891095876694
Validation loss = 0.02101415954530239
Validation loss = 0.01869785040616989
Validation loss = 0.01991519331932068
Validation loss = 0.018838442862033844
Validation loss = 0.017901524901390076
Validation loss = 0.01799326390028
Validation loss = 0.01945662312209606
Validation loss = 0.017508642747998238
Validation loss = 0.01852007955312729
Validation loss = 0.017783736810088158
Validation loss = 0.017592884600162506
Validation loss = 0.017986172810196877
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024377262219786644
Validation loss = 0.018792588263750076
Validation loss = 0.01826709508895874
Validation loss = 0.020481199026107788
Validation loss = 0.017337217926979065
Validation loss = 0.018799133598804474
Validation loss = 0.017637886106967926
Validation loss = 0.018020024523139
Validation loss = 0.019590919837355614
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022344809025526047
Validation loss = 0.017714332789182663
Validation loss = 0.01879989542067051
Validation loss = 0.019224034622311592
Validation loss = 0.020221546292304993
Validation loss = 0.017376720905303955
Validation loss = 0.020847531035542488
Validation loss = 0.017849542200565338
Validation loss = 0.019074074923992157
Validation loss = 0.018619053065776825
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 692
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 731
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 719
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 735
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 756
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 734
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.02e+03 |
| Iteration     | 17       |
| MaximumReturn | 2.98e+03 |
| MinimumReturn | 628      |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02148524485528469
Validation loss = 0.016759807243943214
Validation loss = 0.01824767142534256
Validation loss = 0.018205946311354637
Validation loss = 0.01697748340666294
Validation loss = 0.01937650702893734
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018441015854477882
Validation loss = 0.01628901995718479
Validation loss = 0.016709739342331886
Validation loss = 0.01739627867937088
Validation loss = 0.016499044373631477
Validation loss = 0.017026422545313835
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01995168998837471
Validation loss = 0.01725121960043907
Validation loss = 0.0187706109136343
Validation loss = 0.01682720147073269
Validation loss = 0.016123462468385696
Validation loss = 0.01748228259384632
Validation loss = 0.01797550357878208
Validation loss = 0.01659758761525154
Validation loss = 0.018739691004157066
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021409496665000916
Validation loss = 0.01708955690264702
Validation loss = 0.018145503476262093
Validation loss = 0.018571699038147926
Validation loss = 0.016380803659558296
Validation loss = 0.017716336995363235
Validation loss = 0.016765862703323364
Validation loss = 0.01671708934009075
Validation loss = 0.01728007197380066
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020029233768582344
Validation loss = 0.017434289678931236
Validation loss = 0.017749080434441566
Validation loss = 0.016859104856848717
Validation loss = 0.017866119742393494
Validation loss = 0.01635913737118244
Validation loss = 0.01769006997346878
Validation loss = 0.01796283759176731
Validation loss = 0.0171735268086195
Validation loss = 0.01696135103702545
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 724
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 717
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 744
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 690
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 690
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 714
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.21e+03 |
| Iteration     | 18       |
| MaximumReturn | 2.91e+03 |
| MinimumReturn | -51.6    |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017958596348762512
Validation loss = 0.017772460356354713
Validation loss = 0.01690245233476162
Validation loss = 0.018163669854402542
Validation loss = 0.01604340970516205
Validation loss = 0.01672331988811493
Validation loss = 0.018066998571157455
Validation loss = 0.01669209823012352
Validation loss = 0.016896652057766914
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017937108874320984
Validation loss = 0.016293924301862717
Validation loss = 0.015741264447569847
Validation loss = 0.01635727845132351
Validation loss = 0.015924761071801186
Validation loss = 0.016227062791585922
Validation loss = 0.015363907441496849
Validation loss = 0.016450731083750725
Validation loss = 0.015591690316796303
Validation loss = 0.015657532960176468
Validation loss = 0.016255680471658707
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017336200922727585
Validation loss = 0.01720910333096981
Validation loss = 0.01642705872654915
Validation loss = 0.01596708968281746
Validation loss = 0.01635870337486267
Validation loss = 0.017918361350893974
Validation loss = 0.015600797720253468
Validation loss = 0.017258796840906143
Validation loss = 0.016913743689656258
Validation loss = 0.015621589496731758
Validation loss = 0.01783996820449829
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019598884508013725
Validation loss = 0.01604122295975685
Validation loss = 0.016171758994460106
Validation loss = 0.018413130193948746
Validation loss = 0.016148176044225693
Validation loss = 0.015876377001404762
Validation loss = 0.016490239650011063
Validation loss = 0.01672220788896084
Validation loss = 0.015760237351059914
Validation loss = 0.01632075570523739
Validation loss = 0.016146833077073097
Validation loss = 0.015539047308266163
Validation loss = 0.018339570611715317
Validation loss = 0.015405574813485146
Validation loss = 0.01620771363377571
Validation loss = 0.016095895320177078
Validation loss = 0.014785222709178925
Validation loss = 0.01655060425400734
Validation loss = 0.016084114089608192
Validation loss = 0.015150991268455982
Validation loss = 0.016536489129066467
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018022824078798294
Validation loss = 0.015920093283057213
Validation loss = 0.01641255058348179
Validation loss = 0.016979729756712914
Validation loss = 0.01568879559636116
Validation loss = 0.01706649363040924
Validation loss = 0.015566813759505749
Validation loss = 0.01685621775686741
Validation loss = 0.015736352652311325
Validation loss = 0.016325771808624268
Validation loss = 0.015360862016677856
Validation loss = 0.015719445422291756
Validation loss = 0.01638510823249817
Validation loss = 0.015890972688794136
Validation loss = 0.01526443101465702
Validation loss = 0.01804679073393345
Validation loss = 0.01486311387270689
Validation loss = 0.015681331977248192
Validation loss = 0.014498382806777954
Validation loss = 0.01502795796841383
Validation loss = 0.015293615870177746
Validation loss = 0.015261461026966572
Validation loss = 0.017986256629228592
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 724
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 707
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 692
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 692
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 687
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 714
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.76e+03 |
| Iteration     | 19       |
| MaximumReturn | 3.51e+03 |
| MinimumReturn | 761      |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01826353743672371
Validation loss = 0.015452122315764427
Validation loss = 0.01632840931415558
Validation loss = 0.01564251072704792
Validation loss = 0.01708787865936756
Validation loss = 0.01571647636592388
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01624108850955963
Validation loss = 0.014441145583987236
Validation loss = 0.01597151905298233
Validation loss = 0.015128301456570625
Validation loss = 0.015199639834463596
Validation loss = 0.015062578022480011
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018424328416585922
Validation loss = 0.0145505890250206
Validation loss = 0.01595279574394226
Validation loss = 0.01644000969827175
Validation loss = 0.014597112312912941
Validation loss = 0.017015445977449417
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015810612589120865
Validation loss = 0.014207927510142326
Validation loss = 0.015577808953821659
Validation loss = 0.015401563607156277
Validation loss = 0.014869174920022488
Validation loss = 0.01432888675481081
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018895545974373817
Validation loss = 0.014296084642410278
Validation loss = 0.014422514475882053
Validation loss = 0.01415583211928606
Validation loss = 0.0157912764698267
Validation loss = 0.014535929076373577
Validation loss = 0.014526273123919964
Validation loss = 0.014046955853700638
Validation loss = 0.014413682743906975
Validation loss = 0.015535436570644379
Validation loss = 0.013759722001850605
Validation loss = 0.014097599312663078
Validation loss = 0.014568331651389599
Validation loss = 0.013683070428669453
Validation loss = 0.014444893226027489
Validation loss = 0.014595991000533104
Validation loss = 0.013294003903865814
Validation loss = 0.014008929021656513
Validation loss = 0.016065264120697975
Validation loss = 0.013290375471115112
Validation loss = 0.013796922750771046
Validation loss = 0.015049390494823456
Validation loss = 0.012962940149009228
Validation loss = 0.014234491623938084
Validation loss = 0.014017698355019093
Validation loss = 0.01568128727376461
Validation loss = 0.012937835417687893
Validation loss = 0.014065153896808624
Validation loss = 0.012773576192557812
Validation loss = 0.013830738142132759
Validation loss = 0.01297515444457531
Validation loss = 0.01326675619930029
Validation loss = 0.013005056418478489
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 719
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 670
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 691
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 707
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 689
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 644
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.05e+03 |
| Iteration     | 20       |
| MaximumReturn | 3.6e+03  |
| MinimumReturn | 265      |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016595764085650444
Validation loss = 0.015145138837397099
Validation loss = 0.01704690419137478
Validation loss = 0.014514953829348087
Validation loss = 0.017097843810915947
Validation loss = 0.01430352684110403
Validation loss = 0.016488337889313698
Validation loss = 0.014503362588584423
Validation loss = 0.014780495315790176
Validation loss = 0.014542938210070133
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017206087708473206
Validation loss = 0.01403429452329874
Validation loss = 0.016470924019813538
Validation loss = 0.013596967794001102
Validation loss = 0.01547914370894432
Validation loss = 0.014810211025178432
Validation loss = 0.014870685525238514
Validation loss = 0.013810769654810429
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017195887863636017
Validation loss = 0.014401871711015701
Validation loss = 0.01564846560359001
Validation loss = 0.014548728242516518
Validation loss = 0.014531780034303665
Validation loss = 0.015166872180998325
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015276068821549416
Validation loss = 0.014415547251701355
Validation loss = 0.014947659336030483
Validation loss = 0.014653990045189857
Validation loss = 0.016073964536190033
Validation loss = 0.013864506036043167
Validation loss = 0.013761311769485474
Validation loss = 0.013631954789161682
Validation loss = 0.014105686917901039
Validation loss = 0.014866226352751255
Validation loss = 0.013348923064768314
Validation loss = 0.016259491443634033
Validation loss = 0.013955282978713512
Validation loss = 0.01419151108711958
Validation loss = 0.013835039921104908
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014336567372083664
Validation loss = 0.013169724494218826
Validation loss = 0.013155827298760414
Validation loss = 0.01352128479629755
Validation loss = 0.01296148356050253
Validation loss = 0.013702288269996643
Validation loss = 0.012406057678163052
Validation loss = 0.01674199290573597
Validation loss = 0.012497914023697376
Validation loss = 0.013708824291825294
Validation loss = 0.013343524187803268
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 677
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 681
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 677
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 657
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 689
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 682
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.23e+03 |
| Iteration     | 21       |
| MaximumReturn | 3.64e+03 |
| MinimumReturn | 576      |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015863606706261635
Validation loss = 0.01435870211571455
Validation loss = 0.014211965724825859
Validation loss = 0.015384627506136894
Validation loss = 0.014025921002030373
Validation loss = 0.015712255612015724
Validation loss = 0.013639506883919239
Validation loss = 0.014339331537485123
Validation loss = 0.013483780436217785
Validation loss = 0.014659130945801735
Validation loss = 0.014003380201756954
Validation loss = 0.013253877870738506
Validation loss = 0.014999058097600937
Validation loss = 0.013100508600473404
Validation loss = 0.01338247675448656
Validation loss = 0.016193930059671402
Validation loss = 0.01325721200555563
Validation loss = 0.014993539080023766
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01539380382746458
Validation loss = 0.013691114261746407
Validation loss = 0.014502538368105888
Validation loss = 0.013172106817364693
Validation loss = 0.015064554288983345
Validation loss = 0.013549062423408031
Validation loss = 0.013554885052144527
Validation loss = 0.013412375934422016
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0162209440022707
Validation loss = 0.013384930789470673
Validation loss = 0.014399453066289425
Validation loss = 0.014789458364248276
Validation loss = 0.014319662004709244
Validation loss = 0.014287488535046577
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016636809334158897
Validation loss = 0.013216529041528702
Validation loss = 0.016318943351507187
Validation loss = 0.0132649140432477
Validation loss = 0.013566571287810802
Validation loss = 0.013866769149899483
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0135487737134099
Validation loss = 0.012308894656598568
Validation loss = 0.013662165962159634
Validation loss = 0.012555253691971302
Validation loss = 0.012999712489545345
Validation loss = 0.013085387647151947
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 669
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 660
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 661
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 601
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 661
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 687
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.97e+03 |
| Iteration     | 22       |
| MaximumReturn | 3.59e+03 |
| MinimumReturn | 2.07e+03 |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015209776349365711
Validation loss = 0.01353466883301735
Validation loss = 0.014127139933407307
Validation loss = 0.012922357767820358
Validation loss = 0.012573550455272198
Validation loss = 0.013658869080245495
Validation loss = 0.013430866412818432
Validation loss = 0.012647271156311035
Validation loss = 0.01358041912317276
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013506226241588593
Validation loss = 0.013099764473736286
Validation loss = 0.014030530117452145
Validation loss = 0.012720917351543903
Validation loss = 0.013041265308856964
Validation loss = 0.014300071634352207
Validation loss = 0.012500622309744358
Validation loss = 0.013926055282354355
Validation loss = 0.013750872574746609
Validation loss = 0.01207355409860611
Validation loss = 0.012938134372234344
Validation loss = 0.012642296962440014
Validation loss = 0.014132869429886341
Validation loss = 0.011927399784326553
Validation loss = 0.013169263489544392
Validation loss = 0.01216584350913763
Validation loss = 0.013237439095973969
Validation loss = 0.01271054893732071
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015650218352675438
Validation loss = 0.013790602795779705
Validation loss = 0.014989594928920269
Validation loss = 0.013417857699096203
Validation loss = 0.013242003507912159
Validation loss = 0.014249001629650593
Validation loss = 0.013036929070949554
Validation loss = 0.01631101965904236
Validation loss = 0.012763350270688534
Validation loss = 0.013262114487588406
Validation loss = 0.014452625997364521
Validation loss = 0.012875236570835114
Validation loss = 0.014229821972548962
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014293171465396881
Validation loss = 0.012863821350038052
Validation loss = 0.013037099502980709
Validation loss = 0.012526099570095539
Validation loss = 0.012738951481878757
Validation loss = 0.012691427953541279
Validation loss = 0.012593555264174938
Validation loss = 0.013503770343959332
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01472839992493391
Validation loss = 0.011630993336439133
Validation loss = 0.012262868694961071
Validation loss = 0.012485367245972157
Validation loss = 0.011653979308903217
Validation loss = 0.01308832410722971
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 633
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 635
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 658
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 653
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 687
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 675
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.55e+03 |
| Iteration     | 23       |
| MaximumReturn | 3.69e+03 |
| MinimumReturn | 1.03e+03 |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013911545276641846
Validation loss = 0.012778359465301037
Validation loss = 0.012474401853978634
Validation loss = 0.012407755479216576
Validation loss = 0.012123141437768936
Validation loss = 0.012888294644653797
Validation loss = 0.012052175588905811
Validation loss = 0.011511450633406639
Validation loss = 0.012567359954118729
Validation loss = 0.011500422842800617
Validation loss = 0.01302243210375309
Validation loss = 0.0121926823630929
Validation loss = 0.01221460197120905
Validation loss = 0.011881465092301369
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012451681308448315
Validation loss = 0.013020859099924564
Validation loss = 0.012874855659902096
Validation loss = 0.012067227624356747
Validation loss = 0.013786263763904572
Validation loss = 0.011402599513530731
Validation loss = 0.01322129089385271
Validation loss = 0.011455096304416656
Validation loss = 0.012357383035123348
Validation loss = 0.012786567211151123
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016363894566893578
Validation loss = 0.0123879574239254
Validation loss = 0.014084476977586746
Validation loss = 0.012354063801467419
Validation loss = 0.013533201068639755
Validation loss = 0.013295636512339115
Validation loss = 0.011794112622737885
Validation loss = 0.014517277479171753
Validation loss = 0.01209531631320715
Validation loss = 0.012629824690520763
Validation loss = 0.01168834324926138
Validation loss = 0.012814922258257866
Validation loss = 0.011937865056097507
Validation loss = 0.014311295002698898
Validation loss = 0.01170814037322998
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014192870818078518
Validation loss = 0.012194434180855751
Validation loss = 0.012114077806472778
Validation loss = 0.012596516869962215
Validation loss = 0.011981594376266003
Validation loss = 0.012039968743920326
Validation loss = 0.012042978778481483
Validation loss = 0.012185794301331043
Validation loss = 0.01503780111670494
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013227859511971474
Validation loss = 0.011490872129797935
Validation loss = 0.012417739257216454
Validation loss = 0.01179658155888319
Validation loss = 0.01253423560410738
Validation loss = 0.011326082982122898
Validation loss = 0.01192708220332861
Validation loss = 0.011680442839860916
Validation loss = 0.012231849133968353
Validation loss = 0.011167734861373901
Validation loss = 0.011235944926738739
Validation loss = 0.011960944160819054
Validation loss = 0.011853283271193504
Validation loss = 0.011356350965797901
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 681
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 672
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 666
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 674
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 649
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 664
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.07e+03 |
| Iteration     | 24       |
| MaximumReturn | 3.93e+03 |
| MinimumReturn | 1.33e+03 |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013651590794324875
Validation loss = 0.01124213170260191
Validation loss = 0.013386813923716545
Validation loss = 0.011656300164759159
Validation loss = 0.01378986332565546
Validation loss = 0.011578856967389584
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012425737455487251
Validation loss = 0.011668211780488491
Validation loss = 0.012399555183947086
Validation loss = 0.011497029103338718
Validation loss = 0.012050681747496128
Validation loss = 0.011657820083200932
Validation loss = 0.012308402918279171
Validation loss = 0.011561691761016846
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013259464874863625
Validation loss = 0.011573472991585732
Validation loss = 0.012172400020062923
Validation loss = 0.011206657625734806
Validation loss = 0.012129256501793861
Validation loss = 0.013255835510790348
Validation loss = 0.011504976078867912
Validation loss = 0.012094927951693535
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014800654724240303
Validation loss = 0.011533200740814209
Validation loss = 0.013116404414176941
Validation loss = 0.011500920169055462
Validation loss = 0.01294585969299078
Validation loss = 0.01169351302087307
Validation loss = 0.011358613148331642
Validation loss = 0.011388243176043034
Validation loss = 0.01205054298043251
Validation loss = 0.011549851857125759
Validation loss = 0.011988954618573189
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013639801181852818
Validation loss = 0.011249080300331116
Validation loss = 0.012285546399652958
Validation loss = 0.010836729779839516
Validation loss = 0.012147670611739159
Validation loss = 0.011003182269632816
Validation loss = 0.011526311747729778
Validation loss = 0.01051425002515316
Validation loss = 0.011838773265480995
Validation loss = 0.01110727246850729
Validation loss = 0.010963251814246178
Validation loss = 0.011732007376849651
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 675
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 681
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 681
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 648
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 681
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 665
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.43e+03 |
| Iteration     | 25       |
| MaximumReturn | 3.69e+03 |
| MinimumReturn | 640      |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011689267121255398
Validation loss = 0.011761127039790154
Validation loss = 0.011908270418643951
Validation loss = 0.011301649734377861
Validation loss = 0.012018157169222832
Validation loss = 0.011586779728531837
Validation loss = 0.011341090314090252
Validation loss = 0.011744861491024494
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012693981640040874
Validation loss = 0.010942875407636166
Validation loss = 0.011485897935926914
Validation loss = 0.011306428350508213
Validation loss = 0.011179059743881226
Validation loss = 0.012708299793303013
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01312203798443079
Validation loss = 0.01133506465703249
Validation loss = 0.011232305318117142
Validation loss = 0.011304386891424656
Validation loss = 0.011131332255899906
Validation loss = 0.011946814134716988
Validation loss = 0.011117013171315193
Validation loss = 0.012411700561642647
Validation loss = 0.011645019054412842
Validation loss = 0.01129899825900793
Validation loss = 0.011585193686187267
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012108957394957542
Validation loss = 0.011622406542301178
Validation loss = 0.01194196380674839
Validation loss = 0.011218859814107418
Validation loss = 0.011260515078902245
Validation loss = 0.011211018078029156
Validation loss = 0.011763776652514935
Validation loss = 0.010561523959040642
Validation loss = 0.011762558482587337
Validation loss = 0.011300706304609776
Validation loss = 0.01053638570010662
Validation loss = 0.01120423898100853
Validation loss = 0.011133174411952496
Validation loss = 0.011440383270382881
Validation loss = 0.011171383783221245
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012037917040288448
Validation loss = 0.010315037332475185
Validation loss = 0.010978961363434792
Validation loss = 0.010660636238753796
Validation loss = 0.011195859871804714
Validation loss = 0.010935770347714424
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 666
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 675
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 651
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 637
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 654
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 633
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.36e+03 |
| Iteration     | 26       |
| MaximumReturn | 4.02e+03 |
| MinimumReturn | 362      |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012033994309604168
Validation loss = 0.011342482641339302
Validation loss = 0.011465870775282383
Validation loss = 0.010717527940869331
Validation loss = 0.01169152744114399
Validation loss = 0.01085368450731039
Validation loss = 0.010941735468804836
Validation loss = 0.010710855014622211
Validation loss = 0.011363792233169079
Validation loss = 0.011017723940312862
Validation loss = 0.0115934694185853
Validation loss = 0.01111575961112976
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01281761471182108
Validation loss = 0.011145681142807007
Validation loss = 0.01062602736055851
Validation loss = 0.01101686991751194
Validation loss = 0.011566625908017159
Validation loss = 0.012126185931265354
Validation loss = 0.010719776153564453
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011695057153701782
Validation loss = 0.011927732266485691
Validation loss = 0.010840325616300106
Validation loss = 0.011338068172335625
Validation loss = 0.010478614829480648
Validation loss = 0.011500632390379906
Validation loss = 0.011059022508561611
Validation loss = 0.010364036075770855
Validation loss = 0.010946464724838734
Validation loss = 0.011092431843280792
Validation loss = 0.011331881396472454
Validation loss = 0.010171153582632542
Validation loss = 0.01168365590274334
Validation loss = 0.010705274529755116
Validation loss = 0.011232593096792698
Validation loss = 0.01039881445467472
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010899506509304047
Validation loss = 0.010784653946757317
Validation loss = 0.012163099832832813
Validation loss = 0.010344460606575012
Validation loss = 0.011357507668435574
Validation loss = 0.010782997123897076
Validation loss = 0.010713689960539341
Validation loss = 0.010894976556301117
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011381168849766254
Validation loss = 0.011234669014811516
Validation loss = 0.011614123359322548
Validation loss = 0.010337469168007374
Validation loss = 0.011529500596225262
Validation loss = 0.010565713047981262
Validation loss = 0.010364711284637451
Validation loss = 0.010616155341267586
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 664
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 683
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 687
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 672
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 668
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 682
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.36e+03 |
| Iteration     | 27       |
| MaximumReturn | 3.66e+03 |
| MinimumReturn | 721      |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011137458495795727
Validation loss = 0.010890243574976921
Validation loss = 0.010606805793941021
Validation loss = 0.01103238482028246
Validation loss = 0.010244360193610191
Validation loss = 0.011449119076132774
Validation loss = 0.009870369918644428
Validation loss = 0.010964391753077507
Validation loss = 0.010395491495728493
Validation loss = 0.011319179087877274
Validation loss = 0.011332642287015915
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011648685671389103
Validation loss = 0.011035996489226818
Validation loss = 0.011083893477916718
Validation loss = 0.011185205541551113
Validation loss = 0.010968363843858242
Validation loss = 0.010895133018493652
Validation loss = 0.010532855056226254
Validation loss = 0.01133495569229126
Validation loss = 0.010591110214591026
Validation loss = 0.010699475184082985
Validation loss = 0.01040344312787056
Validation loss = 0.010987132787704468
Validation loss = 0.010567920282483101
Validation loss = 0.011244055815041065
Validation loss = 0.009973960928618908
Validation loss = 0.0107681043446064
Validation loss = 0.010720889084041119
Validation loss = 0.010424662381410599
Validation loss = 0.010956388898193836
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01212916150689125
Validation loss = 0.011362899094820023
Validation loss = 0.010492635890841484
Validation loss = 0.010987204499542713
Validation loss = 0.010982993990182877
Validation loss = 0.01069934107363224
Validation loss = 0.01003985945135355
Validation loss = 0.01093524694442749
Validation loss = 0.010395890101790428
Validation loss = 0.011667891405522823
Validation loss = 0.010707746259868145
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011002692393958569
Validation loss = 0.011174377985298634
Validation loss = 0.011250174604356289
Validation loss = 0.010549642145633698
Validation loss = 0.010752309113740921
Validation loss = 0.01035041268914938
Validation loss = 0.010730954818427563
Validation loss = 0.010607268661260605
Validation loss = 0.010551270097494125
Validation loss = 0.01128864660859108
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010851687751710415
Validation loss = 0.010226259008049965
Validation loss = 0.011332339607179165
Validation loss = 0.010148309171199799
Validation loss = 0.010352663695812225
Validation loss = 0.011085070669651031
Validation loss = 0.010988318361341953
Validation loss = 0.01005483791232109
Validation loss = 0.0103422487154603
Validation loss = 0.010426723398268223
Validation loss = 0.009994503110647202
Validation loss = 0.01057569868862629
Validation loss = 0.010450860485434532
Validation loss = 0.011146296747028828
Validation loss = 0.01061368640512228
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 678
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 669
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 651
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 531
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 538
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 670
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.68e+03 |
| Iteration     | 28       |
| MaximumReturn | 3.97e+03 |
| MinimumReturn | 483      |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011441749520599842
Validation loss = 0.010118232108652592
Validation loss = 0.010317535139620304
Validation loss = 0.011048831976950169
Validation loss = 0.011042827740311623
Validation loss = 0.010477859526872635
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0111917769536376
Validation loss = 0.01029916387051344
Validation loss = 0.010238345712423325
Validation loss = 0.010609103366732597
Validation loss = 0.01022156048566103
Validation loss = 0.010263236239552498
Validation loss = 0.009934576228260994
Validation loss = 0.010674680583178997
Validation loss = 0.009575104340910912
Validation loss = 0.011570440605282784
Validation loss = 0.009942482225596905
Validation loss = 0.011412023566663265
Validation loss = 0.010280320420861244
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011211971752345562
Validation loss = 0.009856906719505787
Validation loss = 0.01094853226095438
Validation loss = 0.010705601423978806
Validation loss = 0.009707710705697536
Validation loss = 0.010578269138932228
Validation loss = 0.01047712005674839
Validation loss = 0.009951882995665073
Validation loss = 0.010606730356812477
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012078959494829178
Validation loss = 0.00965326838195324
Validation loss = 0.010596531443297863
Validation loss = 0.01082857046276331
Validation loss = 0.01002851314842701
Validation loss = 0.010258079506456852
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011680403724312782
Validation loss = 0.010062603279948235
Validation loss = 0.010271073319017887
Validation loss = 0.010919510386884212
Validation loss = 0.010349679738283157
Validation loss = 0.010247203521430492
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 662
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 679
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 674
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 640
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 668
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 669
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.75e+03 |
| Iteration     | 29       |
| MaximumReturn | 3.39e+03 |
| MinimumReturn | 187      |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012855091132223606
Validation loss = 0.010138236917555332
Validation loss = 0.011708180420100689
Validation loss = 0.009642127901315689
Validation loss = 0.010117524303495884
Validation loss = 0.009909307584166527
Validation loss = 0.010419469326734543
Validation loss = 0.010882746428251266
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01076093316078186
Validation loss = 0.009772449731826782
Validation loss = 0.011105380952358246
Validation loss = 0.009959060698747635
Validation loss = 0.00971924141049385
Validation loss = 0.010403607971966267
Validation loss = 0.00968960765749216
Validation loss = 0.010868874378502369
Validation loss = 0.009362047538161278
Validation loss = 0.010759166441857815
Validation loss = 0.009555277414619923
Validation loss = 0.009929333813488483
Validation loss = 0.009426760487258434
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011331669054925442
Validation loss = 0.009768236428499222
Validation loss = 0.010038637556135654
Validation loss = 0.009961373172700405
Validation loss = 0.009830962866544724
Validation loss = 0.01188688725233078
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010800749063491821
Validation loss = 0.010515114292502403
Validation loss = 0.010366255417466164
Validation loss = 0.009729800745844841
Validation loss = 0.011249987408518791
Validation loss = 0.010262765921652317
Validation loss = 0.00974118709564209
Validation loss = 0.01034330390393734
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010374577715992928
Validation loss = 0.010413317009806633
Validation loss = 0.010847869329154491
Validation loss = 0.009875530377030373
Validation loss = 0.011152874678373337
Validation loss = 0.009481408633291721
Validation loss = 0.010005357675254345
Validation loss = 0.009983252733945847
Validation loss = 0.00991552509367466
Validation loss = 0.009993469342589378
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 672
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 675
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 675
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 694
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 677
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 678
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.24e+03 |
| Iteration     | 30       |
| MaximumReturn | 3.9e+03  |
| MinimumReturn | 2.08e+03 |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010568194091320038
Validation loss = 0.009572084993124008
Validation loss = 0.010297724045813084
Validation loss = 0.009531514719128609
Validation loss = 0.00984262302517891
Validation loss = 0.009765450842678547
Validation loss = 0.01048910804092884
Validation loss = 0.009181873872876167
Validation loss = 0.010627410374581814
Validation loss = 0.009528644382953644
Validation loss = 0.009927976876497269
Validation loss = 0.009901318699121475
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01014798041433096
Validation loss = 0.009301820769906044
Validation loss = 0.009872611612081528
Validation loss = 0.010535291396081448
Validation loss = 0.010437030345201492
Validation loss = 0.009469080716371536
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010258969850838184
Validation loss = 0.009531421586871147
Validation loss = 0.010701419785618782
Validation loss = 0.009286065585911274
Validation loss = 0.009348267689347267
Validation loss = 0.009410519152879715
Validation loss = 0.009757205843925476
Validation loss = 0.009671784937381744
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010199027135968208
Validation loss = 0.009356790222227573
Validation loss = 0.00974019430577755
Validation loss = 0.010634508915245533
Validation loss = 0.010152788832783699
Validation loss = 0.010025604628026485
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010789720341563225
Validation loss = 0.00943734310567379
Validation loss = 0.010204828344285488
Validation loss = 0.00993033591657877
Validation loss = 0.010157112963497639
Validation loss = 0.010021049529314041
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 658
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 660
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 638
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 684
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 649
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 664
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.58e+03 |
| Iteration     | 31       |
| MaximumReturn | 3.7e+03  |
| MinimumReturn | 298      |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010386211797595024
Validation loss = 0.009759880602359772
Validation loss = 0.01024706196039915
Validation loss = 0.008950500749051571
Validation loss = 0.009422880597412586
Validation loss = 0.009994530119001865
Validation loss = 0.009736425243318081
Validation loss = 0.009940734133124352
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010484726168215275
Validation loss = 0.009742453694343567
Validation loss = 0.009657146409153938
Validation loss = 0.009883088991045952
Validation loss = 0.009440716356039047
Validation loss = 0.010100594721734524
Validation loss = 0.009632134810090065
Validation loss = 0.009349843487143517
Validation loss = 0.010446686297655106
Validation loss = 0.009567780420184135
Validation loss = 0.009622286073863506
Validation loss = 0.009747669100761414
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00958542712032795
Validation loss = 0.009619380347430706
Validation loss = 0.009955253452062607
Validation loss = 0.009722353890538216
Validation loss = 0.009863889776170254
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010877790860831738
Validation loss = 0.009591415524482727
Validation loss = 0.009894167073071003
Validation loss = 0.009635314345359802
Validation loss = 0.009523516520857811
Validation loss = 0.0097206337377429
Validation loss = 0.009330926463007927
Validation loss = 0.0098617784678936
Validation loss = 0.009264146909117699
Validation loss = 0.009827014990150928
Validation loss = 0.00960154365748167
Validation loss = 0.009859238751232624
Validation loss = 0.00990419089794159
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009601974859833717
Validation loss = 0.010003951378166676
Validation loss = 0.00988142378628254
Validation loss = 0.009529227390885353
Validation loss = 0.010293126106262207
Validation loss = 0.009514560922980309
Validation loss = 0.00978876929730177
Validation loss = 0.009903420694172382
Validation loss = 0.009463542141020298
Validation loss = 0.009510413743555546
Validation loss = 0.009129568003118038
Validation loss = 0.01017123181372881
Validation loss = 0.009975387714803219
Validation loss = 0.00984928198158741
Validation loss = 0.00927456934005022
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 674
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 664
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 667
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 553
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 670
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 671
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.43e+03 |
| Iteration     | 32       |
| MaximumReturn | 4.08e+03 |
| MinimumReturn | 1.17e+03 |
| TotalSamples  | 136000   |
----------------------------
