Logging to experiments/gym_fswimmer/nov4/Sw350e1_seed2631
Print configuration .....
{'env_name': 'gym_fswimmer', 'random_seeds': [2312, 1231, 2631, 5543], 'save_variables': False, 'model_save_dir': '/tmp/gym_fswimmer_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'intrinsic_reward_only': False, 'external_reward_evaluation_interval': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 200, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3906736373901367
Validation loss = 0.1681467592716217
Validation loss = 0.11734993010759354
Validation loss = 0.0947943925857544
Validation loss = 0.08934176713228226
Validation loss = 0.08753489702939987
Validation loss = 0.0829249918460846
Validation loss = 0.0787937194108963
Validation loss = 0.07779901474714279
Validation loss = 0.07380864024162292
Validation loss = 0.07512957602739334
Validation loss = 0.06923206150531769
Validation loss = 0.07855424284934998
Validation loss = 0.07182243466377258
Validation loss = 0.0725712776184082
Validation loss = 0.0737350732088089
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.341158002614975
Validation loss = 0.16965211927890778
Validation loss = 0.12223801016807556
Validation loss = 0.09514684975147247
Validation loss = 0.08535957336425781
Validation loss = 0.08463777601718903
Validation loss = 0.07842697203159332
Validation loss = 0.08666197210550308
Validation loss = 0.07888954877853394
Validation loss = 0.07748903334140778
Validation loss = 0.07419842481613159
Validation loss = 0.07835333794355392
Validation loss = 0.07370582967996597
Validation loss = 0.07303565740585327
Validation loss = 0.07075104862451553
Validation loss = 0.07555274665355682
Validation loss = 0.07041922211647034
Validation loss = 0.07545101642608643
Validation loss = 0.07608647644519806
Validation loss = 0.06891777366399765
Validation loss = 0.07108324766159058
Validation loss = 0.07098338007926941
Validation loss = 0.06865289807319641
Validation loss = 0.0714813843369484
Validation loss = 0.07241208106279373
Validation loss = 0.06710490584373474
Validation loss = 0.07161247730255127
Validation loss = 0.06653229892253876
Validation loss = 0.06493628770112991
Validation loss = 0.07095818221569061
Validation loss = 0.06619232892990112
Validation loss = 0.06789501756429672
Validation loss = 0.06994061917066574
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.36417803168296814
Validation loss = 0.18027374148368835
Validation loss = 0.12658938765525818
Validation loss = 0.10145643353462219
Validation loss = 0.0905749499797821
Validation loss = 0.09120765328407288
Validation loss = 0.0788349136710167
Validation loss = 0.08020550012588501
Validation loss = 0.07636506855487823
Validation loss = 0.07346248626708984
Validation loss = 0.07543277740478516
Validation loss = 0.0735018253326416
Validation loss = 0.08834592252969742
Validation loss = 0.07424771785736084
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3580286204814911
Validation loss = 0.15880267322063446
Validation loss = 0.10972440242767334
Validation loss = 0.08926352113485336
Validation loss = 0.08884736150503159
Validation loss = 0.08062785863876343
Validation loss = 0.08077774196863174
Validation loss = 0.07582645118236542
Validation loss = 0.08013045787811279
Validation loss = 0.07314058393239975
Validation loss = 0.08114967495203018
Validation loss = 0.07348337024450302
Validation loss = 0.08416008949279785
Validation loss = 0.07428068667650223
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4037889242172241
Validation loss = 0.17397256195545197
Validation loss = 0.12171436846256256
Validation loss = 0.09917449951171875
Validation loss = 0.08784715831279755
Validation loss = 0.08002650737762451
Validation loss = 0.07744718343019485
Validation loss = 0.07400494813919067
Validation loss = 0.07226426899433136
Validation loss = 0.0727466344833374
Validation loss = 0.07849788665771484
Validation loss = 0.07585816085338593
Validation loss = 0.069499671459198
Validation loss = 0.07223834097385406
Validation loss = 0.07141786813735962
Validation loss = 0.07365047931671143
Validation loss = 0.07466185837984085
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 1
average number of affinization = 0.14285714285714285
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 6
average number of affinization = 0.875
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 4
average number of affinization = 1.2222222222222223
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 7
average number of affinization = 1.8
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 6
average number of affinization = 2.1818181818181817
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 4
average number of affinization = 2.3333333333333335
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 69.3     |
| Iteration     | 0        |
| MaximumReturn | 107      |
| MinimumReturn | 51.3     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07116542756557465
Validation loss = 0.037545837461948395
Validation loss = 0.03494950756430626
Validation loss = 0.03935609012842178
Validation loss = 0.0389736145734787
Validation loss = 0.03346073627471924
Validation loss = 0.03349565714597702
Validation loss = 0.03397499769926071
Validation loss = 0.03234519809484482
Validation loss = 0.030637705698609352
Validation loss = 0.03222215548157692
Validation loss = 0.03946836665272713
Validation loss = 0.030652014538645744
Validation loss = 0.03015502355992794
Validation loss = 0.03434024006128311
Validation loss = 0.030728738754987717
Validation loss = 0.03218716382980347
Validation loss = 0.030093491077423096
Validation loss = 0.030786197632551193
Validation loss = 0.031231055036187172
Validation loss = 0.029290223494172096
Validation loss = 0.03259114921092987
Validation loss = 0.03133539482951164
Validation loss = 0.03040868230164051
Validation loss = 0.028757916763424873
Validation loss = 0.02999943122267723
Validation loss = 0.02984461933374405
Validation loss = 0.027514075860381126
Validation loss = 0.035225145518779755
Validation loss = 0.035589225590229034
Validation loss = 0.02853597328066826
Validation loss = 0.027146823704242706
Validation loss = 0.030652819201350212
Validation loss = 0.02705361321568489
Validation loss = 0.027862943708896637
Validation loss = 0.030250951647758484
Validation loss = 0.032068006694316864
Validation loss = 0.027271004393696785
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09698998183012009
Validation loss = 0.04098696634173393
Validation loss = 0.034498199820518494
Validation loss = 0.034102458506822586
Validation loss = 0.030899623408913612
Validation loss = 0.031574051827192307
Validation loss = 0.03196859732270241
Validation loss = 0.030055303126573563
Validation loss = 0.03166085109114647
Validation loss = 0.029017163440585136
Validation loss = 0.03269640728831291
Validation loss = 0.0314740315079689
Validation loss = 0.02868480235338211
Validation loss = 0.029212621971964836
Validation loss = 0.02985653094947338
Validation loss = 0.028878677636384964
Validation loss = 0.030851393938064575
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06851612031459808
Validation loss = 0.037884414196014404
Validation loss = 0.035363323986530304
Validation loss = 0.03493277728557587
Validation loss = 0.034290559589862823
Validation loss = 0.03359372541308403
Validation loss = 0.03518731892108917
Validation loss = 0.033413324505090714
Validation loss = 0.034504201263189316
Validation loss = 0.031257178634405136
Validation loss = 0.03321506828069687
Validation loss = 0.0306724663823843
Validation loss = 0.031131407245993614
Validation loss = 0.03245827183127403
Validation loss = 0.030966129153966904
Validation loss = 0.030517015606164932
Validation loss = 0.031279344111680984
Validation loss = 0.029055241495370865
Validation loss = 0.031288132071495056
Validation loss = 0.02907392755150795
Validation loss = 0.02858012355864048
Validation loss = 0.02989904396235943
Validation loss = 0.028605980798602104
Validation loss = 0.03111753985285759
Validation loss = 0.02840653620660305
Validation loss = 0.03132316470146179
Validation loss = 0.027651406824588776
Validation loss = 0.027733709663152695
Validation loss = 0.03007815219461918
Validation loss = 0.026910539716482162
Validation loss = 0.028754685074090958
Validation loss = 0.027407517656683922
Validation loss = 0.030965253710746765
Validation loss = 0.0297881830483675
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07116909325122833
Validation loss = 0.038119491189718246
Validation loss = 0.03772563859820366
Validation loss = 0.03575405851006508
Validation loss = 0.03477625921368599
Validation loss = 0.032401274889707565
Validation loss = 0.038428984582424164
Validation loss = 0.03096352331340313
Validation loss = 0.03409600257873535
Validation loss = 0.03272078558802605
Validation loss = 0.031309064477682114
Validation loss = 0.030976444482803345
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06953734159469604
Validation loss = 0.03707566484808922
Validation loss = 0.0343121737241745
Validation loss = 0.03514070808887482
Validation loss = 0.03900691121816635
Validation loss = 0.033769287168979645
Validation loss = 0.033930130302906036
Validation loss = 0.03268665820360184
Validation loss = 0.03275704383850098
Validation loss = 0.031171495094895363
Validation loss = 0.032568346709012985
Validation loss = 0.03257444128394127
Validation loss = 0.03207562863826752
Validation loss = 0.029452750459313393
Validation loss = 0.03121226280927658
Validation loss = 0.029286010190844536
Validation loss = 0.02926778793334961
Validation loss = 0.029315222054719925
Validation loss = 0.03132747486233711
Validation loss = 0.03427518159151077
Validation loss = 0.028506455942988396
Validation loss = 0.0436154305934906
Validation loss = 0.028284642845392227
Validation loss = 0.03272206336259842
Validation loss = 0.02778608724474907
Validation loss = 0.03126291558146477
Validation loss = 0.02877851016819477
Validation loss = 0.028607621788978577
Validation loss = 0.028308207169175148
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 1
average number of affinization = 2.230769230769231
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 0
average number of affinization = 2.0714285714285716
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 4
average number of affinization = 2.2
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 6
average number of affinization = 2.4375
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 3
average number of affinization = 2.4705882352941178
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 2
average number of affinization = 2.4444444444444446
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 321      |
| Iteration     | 1        |
| MaximumReturn | 337      |
| MinimumReturn | 308      |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06293506175279617
Validation loss = 0.025727279484272003
Validation loss = 0.021909167990088463
Validation loss = 0.02073783613741398
Validation loss = 0.020764997228980064
Validation loss = 0.02013517916202545
Validation loss = 0.02187287248671055
Validation loss = 0.019797122105956078
Validation loss = 0.02144758589565754
Validation loss = 0.021323682740330696
Validation loss = 0.01948506571352482
Validation loss = 0.020255165174603462
Validation loss = 0.0209623072296381
Validation loss = 0.01863267458975315
Validation loss = 0.021132612600922585
Validation loss = 0.01979239471256733
Validation loss = 0.020124582573771477
Validation loss = 0.02032211795449257
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04029109701514244
Validation loss = 0.02216142974793911
Validation loss = 0.021192358806729317
Validation loss = 0.021201981231570244
Validation loss = 0.02102457731962204
Validation loss = 0.020098472014069557
Validation loss = 0.021583087742328644
Validation loss = 0.021232174709439278
Validation loss = 0.020858094096183777
Validation loss = 0.020931506529450417
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05101234093308449
Validation loss = 0.022382641211152077
Validation loss = 0.020473551005125046
Validation loss = 0.019928503781557083
Validation loss = 0.020377015694975853
Validation loss = 0.020238524302840233
Validation loss = 0.019797980785369873
Validation loss = 0.019688349217176437
Validation loss = 0.019897550344467163
Validation loss = 0.021523067727684975
Validation loss = 0.01960592158138752
Validation loss = 0.019969673827290535
Validation loss = 0.01881721429526806
Validation loss = 0.018913796171545982
Validation loss = 0.01934164948761463
Validation loss = 0.019866934046149254
Validation loss = 0.0180036760866642
Validation loss = 0.019906604662537575
Validation loss = 0.018710816279053688
Validation loss = 0.01993848942220211
Validation loss = 0.018231038004159927
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04258221387863159
Validation loss = 0.02662159688770771
Validation loss = 0.02233433909714222
Validation loss = 0.022833069786429405
Validation loss = 0.021307574585080147
Validation loss = 0.021670645102858543
Validation loss = 0.021466240286827087
Validation loss = 0.02105765789747238
Validation loss = 0.020633460953831673
Validation loss = 0.029081301763653755
Validation loss = 0.021899057552218437
Validation loss = 0.02176162414252758
Validation loss = 0.020905544981360435
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04389322176575661
Validation loss = 0.023768730461597443
Validation loss = 0.020689530298113823
Validation loss = 0.020501168444752693
Validation loss = 0.02292371541261673
Validation loss = 0.023890605196356773
Validation loss = 0.022017115727066994
Validation loss = 0.020675549283623695
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 1
average number of affinization = 2.3684210526315788
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 3
average number of affinization = 2.4
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 1
average number of affinization = 2.3333333333333335
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 3
average number of affinization = 2.3636363636363638
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 6
average number of affinization = 2.5217391304347827
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 1
average number of affinization = 2.4583333333333335
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 323      |
| Iteration     | 2        |
| MaximumReturn | 336      |
| MinimumReturn | 310      |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02011396735906601
Validation loss = 0.017512014135718346
Validation loss = 0.01688564568758011
Validation loss = 0.0166022889316082
Validation loss = 0.016237715259194374
Validation loss = 0.01557234488427639
Validation loss = 0.015353599563241005
Validation loss = 0.01621442288160324
Validation loss = 0.015733866021037102
Validation loss = 0.015558573417365551
Validation loss = 0.01637480966746807
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021292582154273987
Validation loss = 0.016408849507570267
Validation loss = 0.01759576052427292
Validation loss = 0.015970610082149506
Validation loss = 0.01545709278434515
Validation loss = 0.017670582979917526
Validation loss = 0.017302008345723152
Validation loss = 0.01575455628335476
Validation loss = 0.015818841755390167
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022255323827266693
Validation loss = 0.017310496419668198
Validation loss = 0.015677545219659805
Validation loss = 0.01565994694828987
Validation loss = 0.0161590538918972
Validation loss = 0.014939509332180023
Validation loss = 0.015940267592668533
Validation loss = 0.01522899605333805
Validation loss = 0.0175292007625103
Validation loss = 0.01569499634206295
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019242649897933006
Validation loss = 0.01723005622625351
Validation loss = 0.01869487762451172
Validation loss = 0.01681743748486042
Validation loss = 0.016027307137846947
Validation loss = 0.01784094050526619
Validation loss = 0.017364880070090294
Validation loss = 0.01700337789952755
Validation loss = 0.02311447635293007
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02183808945119381
Validation loss = 0.016094891354441643
Validation loss = 0.02040586993098259
Validation loss = 0.015994686633348465
Validation loss = 0.016911491751670837
Validation loss = 0.015979226678609848
Validation loss = 0.01727629452943802
Validation loss = 0.017470411956310272
Validation loss = 0.015908261761069298
Validation loss = 0.01723635196685791
Validation loss = 0.015312370844185352
Validation loss = 0.016916794702410698
Validation loss = 0.016003845259547234
Validation loss = 0.016310647130012512
Validation loss = 0.018041931092739105
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 6
average number of affinization = 2.6
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 0
average number of affinization = 2.5
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 3
average number of affinization = 2.5185185185185186
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 7
average number of affinization = 2.6785714285714284
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 7
average number of affinization = 2.8275862068965516
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 0
average number of affinization = 2.7333333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 295      |
| Iteration     | 3        |
| MaximumReturn | 306      |
| MinimumReturn | 285      |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02065298706293106
Validation loss = 0.014084337279200554
Validation loss = 0.015492659993469715
Validation loss = 0.014297393150627613
Validation loss = 0.013415304012596607
Validation loss = 0.014901265501976013
Validation loss = 0.014595067128539085
Validation loss = 0.013458134606480598
Validation loss = 0.013073772192001343
Validation loss = 0.014089199714362621
Validation loss = 0.012604683637619019
Validation loss = 0.013997865840792656
Validation loss = 0.013407190330326557
Validation loss = 0.015783730894327164
Validation loss = 0.013599196448922157
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018284913152456284
Validation loss = 0.01404415350407362
Validation loss = 0.014018510468304157
Validation loss = 0.0141859520226717
Validation loss = 0.01335990708321333
Validation loss = 0.018922053277492523
Validation loss = 0.013985681347548962
Validation loss = 0.013977034017443657
Validation loss = 0.013598163612186909
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019258365035057068
Validation loss = 0.01273973099887371
Validation loss = 0.014334100298583508
Validation loss = 0.014633579179644585
Validation loss = 0.0136652123183012
Validation loss = 0.013910269364714622
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018076442182064056
Validation loss = 0.015095529146492481
Validation loss = 0.013836394064128399
Validation loss = 0.014336904510855675
Validation loss = 0.01581800915300846
Validation loss = 0.015127201564610004
Validation loss = 0.016077149659395218
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023533139377832413
Validation loss = 0.013735641725361347
Validation loss = 0.013988072052598
Validation loss = 0.01527878362685442
Validation loss = 0.01404117327183485
Validation loss = 0.015763577073812485
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 3
average number of affinization = 2.7419354838709675
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 12
average number of affinization = 3.03125
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 6
average number of affinization = 3.121212121212121
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 8
average number of affinization = 3.264705882352941
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 3
average number of affinization = 3.257142857142857
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 2
average number of affinization = 3.2222222222222223
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 303      |
| Iteration     | 4        |
| MaximumReturn | 309      |
| MinimumReturn | 295      |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013498620130121708
Validation loss = 0.013017326593399048
Validation loss = 0.01294178981333971
Validation loss = 0.011141362600028515
Validation loss = 0.011910107918083668
Validation loss = 0.012011985294520855
Validation loss = 0.01166902482509613
Validation loss = 0.012434267438948154
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012521989643573761
Validation loss = 0.012914818711578846
Validation loss = 0.013999599032104015
Validation loss = 0.01289620902389288
Validation loss = 0.012437508441507816
Validation loss = 0.012809108942747116
Validation loss = 0.012598272413015366
Validation loss = 0.012378034181892872
Validation loss = 0.012521792203187943
Validation loss = 0.013224349357187748
Validation loss = 0.0129704549908638
Validation loss = 0.013127257116138935
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01483609527349472
Validation loss = 0.0116463927552104
Validation loss = 0.011583644896745682
Validation loss = 0.013775154948234558
Validation loss = 0.01267136912792921
Validation loss = 0.012010161764919758
Validation loss = 0.011710093356668949
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014805818907916546
Validation loss = 0.011857785284519196
Validation loss = 0.013327586464583874
Validation loss = 0.012351254932582378
Validation loss = 0.012719918973743916
Validation loss = 0.012518680654466152
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014092985540628433
Validation loss = 0.0128890760242939
Validation loss = 0.011901803314685822
Validation loss = 0.011733599938452244
Validation loss = 0.011857771314680576
Validation loss = 0.011993668042123318
Validation loss = 0.013118474744260311
Validation loss = 0.01378809567540884
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 3
average number of affinization = 3.2162162162162162
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 10
average number of affinization = 3.3947368421052633
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 2
average number of affinization = 3.358974358974359
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 6
average number of affinization = 3.425
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 1
average number of affinization = 3.3658536585365852
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 13
average number of affinization = 3.5952380952380953
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 302      |
| Iteration     | 5        |
| MaximumReturn | 307      |
| MinimumReturn | 293      |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012299403548240662
Validation loss = 0.013181991875171661
Validation loss = 0.011088230647146702
Validation loss = 0.011668634600937366
Validation loss = 0.011791597120463848
Validation loss = 0.012453923001885414
Validation loss = 0.012094726786017418
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01181787345558405
Validation loss = 0.011921918950974941
Validation loss = 0.011121219024062157
Validation loss = 0.01360571663826704
Validation loss = 0.010962597094476223
Validation loss = 0.011045168153941631
Validation loss = 0.011472013778984547
Validation loss = 0.011184302158653736
Validation loss = 0.010412385687232018
Validation loss = 0.011329053901135921
Validation loss = 0.010722705163061619
Validation loss = 0.011836458928883076
Validation loss = 0.010453091003000736
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011993602849543095
Validation loss = 0.012954600155353546
Validation loss = 0.011051543056964874
Validation loss = 0.011189279146492481
Validation loss = 0.01362215168774128
Validation loss = 0.011357160285115242
Validation loss = 0.011570030823349953
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01272182073444128
Validation loss = 0.012094832956790924
Validation loss = 0.013431723229587078
Validation loss = 0.011671396903693676
Validation loss = 0.012063086032867432
Validation loss = 0.011121046729385853
Validation loss = 0.013200303539633751
Validation loss = 0.012788801454007626
Validation loss = 0.01092341635376215
Validation loss = 0.011650865897536278
Validation loss = 0.012450872920453548
Validation loss = 0.011454692110419273
Validation loss = 0.012247366830706596
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013109071180224419
Validation loss = 0.012872440740466118
Validation loss = 0.012213772162795067
Validation loss = 0.01299811340868473
Validation loss = 0.011180956847965717
Validation loss = 0.010468886233866215
Validation loss = 0.013112654909491539
Validation loss = 0.011275723576545715
Validation loss = 0.010617201216518879
Validation loss = 0.010833445005118847
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 19
average number of affinization = 3.953488372093023
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 2
average number of affinization = 3.909090909090909
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 28
average number of affinization = 4.444444444444445
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 17
average number of affinization = 4.717391304347826
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 28
average number of affinization = 5.212765957446808
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 25
average number of affinization = 5.625
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 297      |
| Iteration     | 6        |
| MaximumReturn | 309      |
| MinimumReturn | 292      |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011303046718239784
Validation loss = 0.010963302105665207
Validation loss = 0.010383402928709984
Validation loss = 0.010671721771359444
Validation loss = 0.01220187172293663
Validation loss = 0.010707365348935127
Validation loss = 0.010928384028375149
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01075931079685688
Validation loss = 0.010572100058197975
Validation loss = 0.011779684573411942
Validation loss = 0.010363467037677765
Validation loss = 0.010030783712863922
Validation loss = 0.011615226045250893
Validation loss = 0.01080986950546503
Validation loss = 0.011002634651958942
Validation loss = 0.010314777493476868
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011229187250137329
Validation loss = 0.011272577568888664
Validation loss = 0.011110908351838589
Validation loss = 0.011530643329024315
Validation loss = 0.011742794886231422
Validation loss = 0.011763140559196472
Validation loss = 0.011746011674404144
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011089508421719074
Validation loss = 0.010800741612911224
Validation loss = 0.011235072277486324
Validation loss = 0.011887995526194572
Validation loss = 0.011493545025587082
Validation loss = 0.012481436133384705
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01166167575865984
Validation loss = 0.0111730070784688
Validation loss = 0.011005551554262638
Validation loss = 0.010721815750002861
Validation loss = 0.011010460555553436
Validation loss = 0.011639936827123165
Validation loss = 0.011849302798509598
Validation loss = 0.011235116980969906
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 9
average number of affinization = 5.6938775510204085
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 5
average number of affinization = 5.68
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 26
average number of affinization = 6.078431372549019
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 35
average number of affinization = 6.634615384615385
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 21
average number of affinization = 6.90566037735849
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 11
average number of affinization = 6.981481481481482
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 296      |
| Iteration     | 7        |
| MaximumReturn | 300      |
| MinimumReturn | 292      |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009870150126516819
Validation loss = 0.012373607605695724
Validation loss = 0.009915510192513466
Validation loss = 0.009706062264740467
Validation loss = 0.00989388395100832
Validation loss = 0.009866618551313877
Validation loss = 0.010097917169332504
Validation loss = 0.011116810142993927
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011315660551190376
Validation loss = 0.011893290095031261
Validation loss = 0.01056637056171894
Validation loss = 0.01192698534578085
Validation loss = 0.009511969052255154
Validation loss = 0.01020289771258831
Validation loss = 0.009869243949651718
Validation loss = 0.009818670339882374
Validation loss = 0.010019849054515362
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011008599773049355
Validation loss = 0.010388007387518883
Validation loss = 0.009841429069638252
Validation loss = 0.01037958450615406
Validation loss = 0.010459961369633675
Validation loss = 0.009705260396003723
Validation loss = 0.011063829995691776
Validation loss = 0.010324199683964252
Validation loss = 0.009727307595312595
Validation loss = 0.010010571219027042
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011142972856760025
Validation loss = 0.009693644940853119
Validation loss = 0.01060792151838541
Validation loss = 0.010771078057587147
Validation loss = 0.010423856787383556
Validation loss = 0.010847080498933792
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012984734028577805
Validation loss = 0.010128701105713844
Validation loss = 0.009761957451701164
Validation loss = 0.011109642684459686
Validation loss = 0.01171765848994255
Validation loss = 0.00970362313091755
Validation loss = 0.010793526656925678
Validation loss = 0.009494730271399021
Validation loss = 0.010668192058801651
Validation loss = 0.010610421188175678
Validation loss = 0.011986767873167992
Validation loss = 0.010700574144721031
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 30
average number of affinization = 7.4
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 15
average number of affinization = 7.535714285714286
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 33
average number of affinization = 7.982456140350878
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 5
average number of affinization = 7.931034482758621
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 2
average number of affinization = 7.830508474576271
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 23
average number of affinization = 8.083333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 301      |
| Iteration     | 8        |
| MaximumReturn | 308      |
| MinimumReturn | 287      |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009535187855362892
Validation loss = 0.008925611153244972
Validation loss = 0.011052509769797325
Validation loss = 0.008447145111858845
Validation loss = 0.00931038148701191
Validation loss = 0.009402642026543617
Validation loss = 0.00901930034160614
Validation loss = 0.009124495089054108
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00929160974919796
Validation loss = 0.009058537892997265
Validation loss = 0.009488971903920174
Validation loss = 0.00910555012524128
Validation loss = 0.009788816794753075
Validation loss = 0.009410847909748554
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009309717454016209
Validation loss = 0.010295875370502472
Validation loss = 0.009993206709623337
Validation loss = 0.008757263422012329
Validation loss = 0.01140983123332262
Validation loss = 0.00921750720590353
Validation loss = 0.009236836805939674
Validation loss = 0.009489627555012703
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009156197309494019
Validation loss = 0.009737466461956501
Validation loss = 0.009541023522615433
Validation loss = 0.011542778462171555
Validation loss = 0.010016129352152348
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009796991012990475
Validation loss = 0.010432038456201553
Validation loss = 0.009441522881388664
Validation loss = 0.008930125273764133
Validation loss = 0.009362896904349327
Validation loss = 0.009172351099550724
Validation loss = 0.010416263714432716
Validation loss = 0.009152112528681755
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 51
average number of affinization = 8.78688524590164
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 51
average number of affinization = 9.46774193548387
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 3
average number of affinization = 9.365079365079366
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 21
average number of affinization = 9.546875
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 5
average number of affinization = 9.476923076923077
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 29
average number of affinization = 9.772727272727273
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 311      |
| Iteration     | 9        |
| MaximumReturn | 319      |
| MinimumReturn | 304      |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009169741533696651
Validation loss = 0.008981455117464066
Validation loss = 0.008574926294386387
Validation loss = 0.008566186763346195
Validation loss = 0.008561772294342518
Validation loss = 0.008928166702389717
Validation loss = 0.008612164296209812
Validation loss = 0.008326748386025429
Validation loss = 0.008736442774534225
Validation loss = 0.008411478251218796
Validation loss = 0.009253892116248608
Validation loss = 0.009149046614766121
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009718380868434906
Validation loss = 0.009041870944201946
Validation loss = 0.009080777876079082
Validation loss = 0.010159331373870373
Validation loss = 0.0089148860424757
Validation loss = 0.00821705348789692
Validation loss = 0.010227843187749386
Validation loss = 0.009062986820936203
Validation loss = 0.008322888985276222
Validation loss = 0.008891355246305466
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00932459905743599
Validation loss = 0.010453379712998867
Validation loss = 0.008580598048865795
Validation loss = 0.008299531415104866
Validation loss = 0.010367762297391891
Validation loss = 0.009371189400553703
Validation loss = 0.009637942537665367
Validation loss = 0.008698204532265663
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010259868577122688
Validation loss = 0.009792349301278591
Validation loss = 0.008630172349512577
Validation loss = 0.008681333623826504
Validation loss = 0.009208385832607746
Validation loss = 0.009552350267767906
Validation loss = 0.009524939581751823
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01015601959079504
Validation loss = 0.008642809465527534
Validation loss = 0.009518013335764408
Validation loss = 0.008538392372429371
Validation loss = 0.009174088947474957
Validation loss = 0.011442761868238449
Validation loss = 0.008887987583875656
Validation loss = 0.009524383582174778
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 9
average number of affinization = 9.761194029850746
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 44
average number of affinization = 10.264705882352942
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 9
average number of affinization = 10.246376811594203
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 43
average number of affinization = 10.714285714285714
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 11
average number of affinization = 10.71830985915493
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 12
average number of affinization = 10.73611111111111
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 312      |
| Iteration     | 10       |
| MaximumReturn | 326      |
| MinimumReturn | 304      |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008109643124043941
Validation loss = 0.008416838012635708
Validation loss = 0.0077350507490336895
Validation loss = 0.008681670762598515
Validation loss = 0.00817608181387186
Validation loss = 0.00858274009078741
Validation loss = 0.00791451521217823
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008143826387822628
Validation loss = 0.008361891843378544
Validation loss = 0.00777854211628437
Validation loss = 0.008468805812299252
Validation loss = 0.007576700299978256
Validation loss = 0.008022001944482327
Validation loss = 0.00844168197363615
Validation loss = 0.00787509884685278
Validation loss = 0.008709157817065716
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007997331209480762
Validation loss = 0.008599973283708096
Validation loss = 0.007856030017137527
Validation loss = 0.008689516223967075
Validation loss = 0.008997483178973198
Validation loss = 0.009341455064713955
Validation loss = 0.007491533178836107
Validation loss = 0.007994483225047588
Validation loss = 0.00865562167018652
Validation loss = 0.007716407999396324
Validation loss = 0.008837173692882061
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008062683045864105
Validation loss = 0.008321617729961872
Validation loss = 0.00791703537106514
Validation loss = 0.00889277458190918
Validation loss = 0.009307987056672573
Validation loss = 0.008447379805147648
Validation loss = 0.008967991918325424
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00918332114815712
Validation loss = 0.009750708006322384
Validation loss = 0.008300001733005047
Validation loss = 0.008511988446116447
Validation loss = 0.009377622045576572
Validation loss = 0.008582580834627151
Validation loss = 0.008180211298167706
Validation loss = 0.008699643425643444
Validation loss = 0.008935137651860714
Validation loss = 0.008171644061803818
Validation loss = 0.008610214106738567
Validation loss = 0.00974854826927185
Validation loss = 0.007670886814594269
Validation loss = 0.008642696775496006
Validation loss = 0.008766734972596169
Validation loss = 0.00841710064560175
Validation loss = 0.00901897344738245
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 33
average number of affinization = 11.04109589041096
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 49
average number of affinization = 11.554054054054054
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 48
average number of affinization = 12.04
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 19
average number of affinization = 12.131578947368421
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 29
average number of affinization = 12.35064935064935
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 77
average number of affinization = 13.179487179487179
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 310      |
| Iteration     | 11       |
| MaximumReturn | 314      |
| MinimumReturn | 307      |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007533062715083361
Validation loss = 0.008340702392160892
Validation loss = 0.008364895358681679
Validation loss = 0.007921763695776463
Validation loss = 0.008207859471440315
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008163979277014732
Validation loss = 0.008952251635491848
Validation loss = 0.007493006996810436
Validation loss = 0.008340241387486458
Validation loss = 0.0074069928377866745
Validation loss = 0.00794134009629488
Validation loss = 0.0076408060267567635
Validation loss = 0.008356872946023941
Validation loss = 0.008075522258877754
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007919084280729294
Validation loss = 0.008913048543035984
Validation loss = 0.007391185034066439
Validation loss = 0.007461564615368843
Validation loss = 0.008163150399923325
Validation loss = 0.007710176985710859
Validation loss = 0.0074867676012218
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008019872941076756
Validation loss = 0.008284483104944229
Validation loss = 0.008249160833656788
Validation loss = 0.008820876479148865
Validation loss = 0.007963459938764572
Validation loss = 0.009143086150288582
Validation loss = 0.008043230511248112
Validation loss = 0.008220404386520386
Validation loss = 0.00871635414659977
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007808849681168795
Validation loss = 0.00835704430937767
Validation loss = 0.008765055797994137
Validation loss = 0.008619149215519428
Validation loss = 0.008087782189249992
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 75
average number of affinization = 13.962025316455696
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 80
average number of affinization = 14.7875
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 57
average number of affinization = 15.308641975308642
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 64
average number of affinization = 15.902439024390244
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 30
average number of affinization = 16.072289156626507
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 12
average number of affinization = 16.023809523809526
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 320      |
| Iteration     | 12       |
| MaximumReturn | 333      |
| MinimumReturn | 308      |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008080415427684784
Validation loss = 0.008054648526012897
Validation loss = 0.008652698248624802
Validation loss = 0.008425646461546421
Validation loss = 0.008084110915660858
Validation loss = 0.008027900941669941
Validation loss = 0.00808883085846901
Validation loss = 0.008138732053339481
Validation loss = 0.008145281113684177
Validation loss = 0.007608743850141764
Validation loss = 0.007492815610021353
Validation loss = 0.007586084771901369
Validation loss = 0.007901985198259354
Validation loss = 0.007439236156642437
Validation loss = 0.0076365540735423565
Validation loss = 0.007303385529667139
Validation loss = 0.00761067820712924
Validation loss = 0.008181121200323105
Validation loss = 0.008357347920536995
Validation loss = 0.008022801950573921
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008598350919783115
Validation loss = 0.007324838545173407
Validation loss = 0.008158869110047817
Validation loss = 0.008233009837567806
Validation loss = 0.007625380996614695
Validation loss = 0.008015923202037811
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00812817644327879
Validation loss = 0.008405020460486412
Validation loss = 0.008100445382297039
Validation loss = 0.0075248414650559425
Validation loss = 0.007503730710595846
Validation loss = 0.008371083997189999
Validation loss = 0.009046666324138641
Validation loss = 0.007321611978113651
Validation loss = 0.0074178436771035194
Validation loss = 0.007870763540267944
Validation loss = 0.007573433220386505
Validation loss = 0.0074233911000192165
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007821200415492058
Validation loss = 0.007899745367467403
Validation loss = 0.009362728334963322
Validation loss = 0.007663525175303221
Validation loss = 0.00819616112858057
Validation loss = 0.00822822842746973
Validation loss = 0.00852732639759779
Validation loss = 0.00823535118252039
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009616309776902199
Validation loss = 0.007902024313807487
Validation loss = 0.00832060445100069
Validation loss = 0.009394529275596142
Validation loss = 0.008616085164248943
Validation loss = 0.008140449412167072
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 90
average number of affinization = 16.894117647058824
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 75
average number of affinization = 17.569767441860463
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 88
average number of affinization = 18.379310344827587
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 107
average number of affinization = 19.386363636363637
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 49
average number of affinization = 19.719101123595507
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 63
average number of affinization = 20.2
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 324      |
| Iteration     | 13       |
| MaximumReturn | 328      |
| MinimumReturn | 319      |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007533751428127289
Validation loss = 0.007577447686344385
Validation loss = 0.007305339444428682
Validation loss = 0.007299608085304499
Validation loss = 0.007245803251862526
Validation loss = 0.007656491827219725
Validation loss = 0.00779563095420599
Validation loss = 0.007670693099498749
Validation loss = 0.007993333972990513
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007139908149838448
Validation loss = 0.006934111472219229
Validation loss = 0.006973392330110073
Validation loss = 0.007946861907839775
Validation loss = 0.007172095123678446
Validation loss = 0.007618033327162266
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007972794584929943
Validation loss = 0.007954206317663193
Validation loss = 0.007015349343419075
Validation loss = 0.008320653811097145
Validation loss = 0.0070760915987193584
Validation loss = 0.007331832777708769
Validation loss = 0.007842697203159332
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007516541983932257
Validation loss = 0.008026144467294216
Validation loss = 0.007986358366906643
Validation loss = 0.007622288074344397
Validation loss = 0.007636557798832655
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00822952389717102
Validation loss = 0.007336492650210857
Validation loss = 0.007933494634926319
Validation loss = 0.008230826817452908
Validation loss = 0.0075205350294709206
Validation loss = 0.008034557104110718
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 8
average number of affinization = 20.065934065934066
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 26
average number of affinization = 20.130434782608695
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 51
average number of affinization = 20.462365591397848
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 80
average number of affinization = 21.095744680851062
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 57
average number of affinization = 21.473684210526315
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 31
average number of affinization = 21.572916666666668
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 317      |
| Iteration     | 14       |
| MaximumReturn | 323      |
| MinimumReturn | 312      |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007866373285651207
Validation loss = 0.006849743891507387
Validation loss = 0.006986888591200113
Validation loss = 0.006967632099986076
Validation loss = 0.007474998477846384
Validation loss = 0.007399836555123329
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007126105949282646
Validation loss = 0.007789676543325186
Validation loss = 0.006939897313714027
Validation loss = 0.007423881907016039
Validation loss = 0.00745999813079834
Validation loss = 0.007663405500352383
Validation loss = 0.007159729488193989
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0068242913112044334
Validation loss = 0.007425186224281788
Validation loss = 0.0072693717665970325
Validation loss = 0.006806668359786272
Validation loss = 0.007670781575143337
Validation loss = 0.007153946440666914
Validation loss = 0.007324786391109228
Validation loss = 0.007119768299162388
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007048758678138256
Validation loss = 0.007523027248680592
Validation loss = 0.007716786582022905
Validation loss = 0.008627153933048248
Validation loss = 0.007747583091259003
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007432001642882824
Validation loss = 0.007025560364127159
Validation loss = 0.007217715494334698
Validation loss = 0.0070766881108284
Validation loss = 0.007423505187034607
Validation loss = 0.007457630708813667
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 49
average number of affinization = 21.855670103092784
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 80
average number of affinization = 22.448979591836736
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 47
average number of affinization = 22.696969696969695
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 81
average number of affinization = 23.28
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 41
average number of affinization = 23.455445544554454
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 33
average number of affinization = 23.54901960784314
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 312      |
| Iteration     | 15       |
| MaximumReturn | 315      |
| MinimumReturn | 309      |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007475774269551039
Validation loss = 0.006901290267705917
Validation loss = 0.006639060098677874
Validation loss = 0.006628529634326696
Validation loss = 0.007084702141582966
Validation loss = 0.006670278497040272
Validation loss = 0.0078601548448205
Validation loss = 0.006573891267180443
Validation loss = 0.006807040423154831
Validation loss = 0.0064234063029289246
Validation loss = 0.007285380735993385
Validation loss = 0.006338916253298521
Validation loss = 0.006638048216700554
Validation loss = 0.006874671671539545
Validation loss = 0.006703031249344349
Validation loss = 0.007262973114848137
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006874161306768656
Validation loss = 0.007547423243522644
Validation loss = 0.006538912653923035
Validation loss = 0.006897416897118092
Validation loss = 0.006730889901518822
Validation loss = 0.006895333528518677
Validation loss = 0.0074642677791416645
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007330854423344135
Validation loss = 0.006896008271723986
Validation loss = 0.006839296780526638
Validation loss = 0.006507127080112696
Validation loss = 0.006480358075350523
Validation loss = 0.007045952137559652
Validation loss = 0.0071899425238370895
Validation loss = 0.007051724940538406
Validation loss = 0.006605898030102253
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0075636147521436214
Validation loss = 0.0077372463420033455
Validation loss = 0.007909782230854034
Validation loss = 0.007167895324528217
Validation loss = 0.007269976194947958
Validation loss = 0.006997662130743265
Validation loss = 0.007106993813067675
Validation loss = 0.007415030617266893
Validation loss = 0.007000078447163105
Validation loss = 0.0071350764483213425
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007502834778279066
Validation loss = 0.0069462512619793415
Validation loss = 0.006962387822568417
Validation loss = 0.007111068814992905
Validation loss = 0.00709507567808032
Validation loss = 0.007597531657665968
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 25
average number of affinization = 23.563106796116504
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 51
average number of affinization = 23.826923076923077
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 73
average number of affinization = 24.295238095238094
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 57
average number of affinization = 24.60377358490566
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 81
average number of affinization = 25.130841121495326
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 50
average number of affinization = 25.36111111111111
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 312      |
| Iteration     | 16       |
| MaximumReturn | 320      |
| MinimumReturn | 308      |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006441026460379362
Validation loss = 0.006414767354726791
Validation loss = 0.0071571096777915955
Validation loss = 0.006744901649653912
Validation loss = 0.006834559142589569
Validation loss = 0.006448650266975164
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006553212180733681
Validation loss = 0.006809155456721783
Validation loss = 0.006982536055147648
Validation loss = 0.006769666448235512
Validation loss = 0.0065513234585523605
Validation loss = 0.006463306024670601
Validation loss = 0.007771557196974754
Validation loss = 0.00649088341742754
Validation loss = 0.006384638138115406
Validation loss = 0.006383194588124752
Validation loss = 0.006363864988088608
Validation loss = 0.007077723741531372
Validation loss = 0.007311869878321886
Validation loss = 0.006665839347988367
Validation loss = 0.006593659520149231
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006501596886664629
Validation loss = 0.006935206241905689
Validation loss = 0.006612337194383144
Validation loss = 0.007604249753057957
Validation loss = 0.00633700005710125
Validation loss = 0.006384845823049545
Validation loss = 0.006643680855631828
Validation loss = 0.006427021697163582
Validation loss = 0.0063975173979997635
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007771650794893503
Validation loss = 0.0065597593784332275
Validation loss = 0.006610155571252108
Validation loss = 0.007118197623640299
Validation loss = 0.006367353722453117
Validation loss = 0.007351851090788841
Validation loss = 0.006973791401833296
Validation loss = 0.006375404540449381
Validation loss = 0.0067223752848804
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006621101405471563
Validation loss = 0.0064259180799126625
Validation loss = 0.006827033590525389
Validation loss = 0.006524691358208656
Validation loss = 0.006791578605771065
Validation loss = 0.00794860441237688
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 37
average number of affinization = 25.46788990825688
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 28
average number of affinization = 25.490909090909092
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 36
average number of affinization = 25.585585585585587
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 16
average number of affinization = 25.5
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 4
average number of affinization = 25.309734513274336
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 34
average number of affinization = 25.385964912280702
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 319      |
| Iteration     | 17       |
| MaximumReturn | 321      |
| MinimumReturn | 314      |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0066172401420772076
Validation loss = 0.006387209054082632
Validation loss = 0.0066995942033827305
Validation loss = 0.006104615051299334
Validation loss = 0.006454240530729294
Validation loss = 0.0063139223493635654
Validation loss = 0.00591025035828352
Validation loss = 0.006364729721099138
Validation loss = 0.006080010440200567
Validation loss = 0.005654551554471254
Validation loss = 0.00663809385150671
Validation loss = 0.006139952689409256
Validation loss = 0.006321660242974758
Validation loss = 0.006445056758821011
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0066696167923510075
Validation loss = 0.006614906247705221
Validation loss = 0.006276492029428482
Validation loss = 0.006136108189821243
Validation loss = 0.00627189502120018
Validation loss = 0.006714637391269207
Validation loss = 0.006642293184995651
Validation loss = 0.006931832060217857
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005828533321619034
Validation loss = 0.006460597738623619
Validation loss = 0.006223924458026886
Validation loss = 0.006682414095848799
Validation loss = 0.005813708994537592
Validation loss = 0.006030809134244919
Validation loss = 0.006433041300624609
Validation loss = 0.006447816267609596
Validation loss = 0.00594470277428627
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006237621884793043
Validation loss = 0.006205014884471893
Validation loss = 0.006420195568352938
Validation loss = 0.006663172505795956
Validation loss = 0.006537949200719595
Validation loss = 0.006620096508413553
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006215819623321295
Validation loss = 0.006356123369187117
Validation loss = 0.006196429952979088
Validation loss = 0.006848704535514116
Validation loss = 0.006080632098019123
Validation loss = 0.0065891011618077755
Validation loss = 0.006169280502945185
Validation loss = 0.006929092574864626
Validation loss = 0.007248669862747192
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 69
average number of affinization = 25.765217391304347
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 6
average number of affinization = 25.594827586206897
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 1
average number of affinization = 25.384615384615383
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 19
average number of affinization = 25.33050847457627
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 15
average number of affinization = 25.243697478991596
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 53
average number of affinization = 25.475
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 311      |
| Iteration     | 18       |
| MaximumReturn | 314      |
| MinimumReturn | 308      |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005931322928518057
Validation loss = 0.005913938395678997
Validation loss = 0.006304576061666012
Validation loss = 0.0059471940621733665
Validation loss = 0.0058814422227442265
Validation loss = 0.006072456482797861
Validation loss = 0.005928362254053354
Validation loss = 0.005703476257622242
Validation loss = 0.005637415684759617
Validation loss = 0.006024778354912996
Validation loss = 0.005997923202812672
Validation loss = 0.006282889749854803
Validation loss = 0.006321436259895563
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006362462881952524
Validation loss = 0.0057699112221598625
Validation loss = 0.006483296863734722
Validation loss = 0.005848316475749016
Validation loss = 0.006570379249751568
Validation loss = 0.0055847689509391785
Validation loss = 0.005916462745517492
Validation loss = 0.0064473538659513
Validation loss = 0.0066039906814694405
Validation loss = 0.006311751902103424
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006027345545589924
Validation loss = 0.006205806974321604
Validation loss = 0.005934770219027996
Validation loss = 0.006636203266680241
Validation loss = 0.005915803834795952
Validation loss = 0.006015427876263857
Validation loss = 0.006875431630760431
Validation loss = 0.005667639896273613
Validation loss = 0.006030485033988953
Validation loss = 0.006092363502830267
Validation loss = 0.006848725490272045
Validation loss = 0.005729693453758955
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007184808142483234
Validation loss = 0.006300059147179127
Validation loss = 0.00629679998382926
Validation loss = 0.005958352237939835
Validation loss = 0.00587570620700717
Validation loss = 0.006414517760276794
Validation loss = 0.006622515618801117
Validation loss = 0.00651022931560874
Validation loss = 0.006893952377140522
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005986337549984455
Validation loss = 0.006905931048095226
Validation loss = 0.006365774665027857
Validation loss = 0.006403853185474873
Validation loss = 0.007082787808030844
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 5
average number of affinization = 25.305785123966942
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 102
average number of affinization = 25.934426229508198
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 94
average number of affinization = 26.48780487804878
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 10
average number of affinization = 26.35483870967742
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 57
average number of affinization = 26.6
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 42
average number of affinization = 26.72222222222222
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 311      |
| Iteration     | 19       |
| MaximumReturn | 314      |
| MinimumReturn | 308      |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0058954618871212006
Validation loss = 0.005450899247080088
Validation loss = 0.006113593932241201
Validation loss = 0.005921668838709593
Validation loss = 0.0057066818699240685
Validation loss = 0.0056090448051691055
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005684331059455872
Validation loss = 0.005624310113489628
Validation loss = 0.006025976967066526
Validation loss = 0.006157238036394119
Validation loss = 0.006004386581480503
Validation loss = 0.005939877592027187
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005622117780148983
Validation loss = 0.005908446852117777
Validation loss = 0.005908200051635504
Validation loss = 0.005698833614587784
Validation loss = 0.0064607602544128895
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006380337290465832
Validation loss = 0.006144561804831028
Validation loss = 0.006982216611504555
Validation loss = 0.006133846007287502
Validation loss = 0.00643806392326951
Validation loss = 0.006170086096972227
Validation loss = 0.007414235267788172
Validation loss = 0.006006321869790554
Validation loss = 0.006410553120076656
Validation loss = 0.005910035222768784
Validation loss = 0.006163142155855894
Validation loss = 0.00662100687623024
Validation loss = 0.006163864396512508
Validation loss = 0.006120571866631508
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006305577699095011
Validation loss = 0.006249662488698959
Validation loss = 0.00610527815297246
Validation loss = 0.00623904587700963
Validation loss = 0.006013197358697653
Validation loss = 0.006500736810266972
Validation loss = 0.0059313690289855
Validation loss = 0.00579897640272975
Validation loss = 0.006328035145998001
Validation loss = 0.00662916898727417
Validation loss = 0.005733205936849117
Validation loss = 0.0061798193491995335
Validation loss = 0.006046590860933065
Validation loss = 0.005863424856215715
Validation loss = 0.00593635905534029
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 63
average number of affinization = 27.007874015748033
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 31
average number of affinization = 27.0390625
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 22
average number of affinization = 27.0
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 39
average number of affinization = 27.092307692307692
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 13
average number of affinization = 26.984732824427482
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 41
average number of affinization = 27.09090909090909
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 303      |
| Iteration     | 20       |
| MaximumReturn | 308      |
| MinimumReturn | 296      |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005607392638921738
Validation loss = 0.005602460820227861
Validation loss = 0.005803377833217382
Validation loss = 0.00568802235648036
Validation loss = 0.005570001434534788
Validation loss = 0.005971095059067011
Validation loss = 0.005804036743938923
Validation loss = 0.005382523871958256
Validation loss = 0.005712288431823254
Validation loss = 0.005963144823908806
Validation loss = 0.005568158347159624
Validation loss = 0.005943831522017717
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005752979312092066
Validation loss = 0.005863971542567015
Validation loss = 0.0055725183337926865
Validation loss = 0.00628444692119956
Validation loss = 0.005798925645649433
Validation loss = 0.005639920011162758
Validation loss = 0.0057785348035395145
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006768005900084972
Validation loss = 0.0056112478487193584
Validation loss = 0.0054768286645412445
Validation loss = 0.005672731902450323
Validation loss = 0.005989548750221729
Validation loss = 0.005489906296133995
Validation loss = 0.005642575677484274
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005838325712829828
Validation loss = 0.006673065014183521
Validation loss = 0.005594342481344938
Validation loss = 0.005726952571421862
Validation loss = 0.005703981500118971
Validation loss = 0.0065086293034255505
Validation loss = 0.00607071490958333
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006193816661834717
Validation loss = 0.006014414597302675
Validation loss = 0.0060130017809569836
Validation loss = 0.005764820147305727
Validation loss = 0.006346588954329491
Validation loss = 0.005608848761767149
Validation loss = 0.0062690493650734425
Validation loss = 0.006043149624019861
Validation loss = 0.005828730296343565
Validation loss = 0.006106259301304817
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 53
average number of affinization = 27.285714285714285
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 29
average number of affinization = 27.29850746268657
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 38
average number of affinization = 27.377777777777776
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 1
average number of affinization = 27.183823529411764
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 27
average number of affinization = 27.182481751824817
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 29
average number of affinization = 27.195652173913043
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 300      |
| Iteration     | 21       |
| MaximumReturn | 301      |
| MinimumReturn | 299      |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006157806608825922
Validation loss = 0.005638445727527142
Validation loss = 0.005502790678292513
Validation loss = 0.006036895792931318
Validation loss = 0.005273349117487669
Validation loss = 0.005238906014710665
Validation loss = 0.005569476634263992
Validation loss = 0.005682989954948425
Validation loss = 0.005565798841416836
Validation loss = 0.0061789038591086864
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005535000469535589
Validation loss = 0.005268605425953865
Validation loss = 0.005543608218431473
Validation loss = 0.00552095752209425
Validation loss = 0.005708809942007065
Validation loss = 0.005351801868528128
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005633041262626648
Validation loss = 0.005519728176295757
Validation loss = 0.005547498352825642
Validation loss = 0.00550887593999505
Validation loss = 0.0057922848500311375
Validation loss = 0.005831512622535229
Validation loss = 0.006642617750912905
Validation loss = 0.005192310083657503
Validation loss = 0.005862164311110973
Validation loss = 0.005566883832216263
Validation loss = 0.005601090379059315
Validation loss = 0.005348384380340576
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005677451379597187
Validation loss = 0.005724712740629911
Validation loss = 0.005522817373275757
Validation loss = 0.0055415029637515545
Validation loss = 0.0055833784863352776
Validation loss = 0.006216533947736025
Validation loss = 0.005698276683688164
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0053637949749827385
Validation loss = 0.006070189643651247
Validation loss = 0.005584610626101494
Validation loss = 0.006214999593794346
Validation loss = 0.0054535712115466595
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 49
average number of affinization = 27.35251798561151
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 82
average number of affinization = 27.742857142857144
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 33
average number of affinization = 27.78014184397163
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 51
average number of affinization = 27.943661971830984
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 7
average number of affinization = 27.797202797202797
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 48
average number of affinization = 27.9375
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 306      |
| Iteration     | 22       |
| MaximumReturn | 308      |
| MinimumReturn | 304      |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0052210078574717045
Validation loss = 0.005225473549216986
Validation loss = 0.005632504820823669
Validation loss = 0.00582824507728219
Validation loss = 0.005085159558802843
Validation loss = 0.0054595875553786755
Validation loss = 0.005565551575273275
Validation loss = 0.005584710743278265
Validation loss = 0.00557168060913682
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005677482113242149
Validation loss = 0.005318926181644201
Validation loss = 0.005409501492977142
Validation loss = 0.005382912699133158
Validation loss = 0.0057598804123699665
Validation loss = 0.0053243488073349
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00503079267218709
Validation loss = 0.005475447978824377
Validation loss = 0.005889432039111853
Validation loss = 0.005287672858685255
Validation loss = 0.005552841350436211
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005692502949386835
Validation loss = 0.0055295987986028194
Validation loss = 0.0054688709788024426
Validation loss = 0.0052939471788704395
Validation loss = 0.005530367139726877
Validation loss = 0.005322964396327734
Validation loss = 0.0052285934798419476
Validation loss = 0.005745749920606613
Validation loss = 0.005432782229036093
Validation loss = 0.0055695767514407635
Validation loss = 0.005493199918419123
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005126262549310923
Validation loss = 0.005505658686161041
Validation loss = 0.005791158881038427
Validation loss = 0.00540241738781333
Validation loss = 0.005545355379581451
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 88
average number of affinization = 28.351724137931033
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 21
average number of affinization = 28.301369863013697
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 64
average number of affinization = 28.54421768707483
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 53
average number of affinization = 28.70945945945946
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 75
average number of affinization = 29.02013422818792
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 89
average number of affinization = 29.42
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 302      |
| Iteration     | 23       |
| MaximumReturn | 307      |
| MinimumReturn | 299      |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005672661121934652
Validation loss = 0.005381770897656679
Validation loss = 0.005386305972933769
Validation loss = 0.005941041745245457
Validation loss = 0.005006455350667238
Validation loss = 0.00511856097728014
Validation loss = 0.005353787448257208
Validation loss = 0.005656654946506023
Validation loss = 0.005134644452482462
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0054552615620195866
Validation loss = 0.0054002804681658745
Validation loss = 0.005215151235461235
Validation loss = 0.005984444636851549
Validation loss = 0.005187900271266699
Validation loss = 0.00522662652656436
Validation loss = 0.0053458018228411674
Validation loss = 0.005384010262787342
Validation loss = 0.005151471123099327
Validation loss = 0.005843964871019125
Validation loss = 0.005213512107729912
Validation loss = 0.005368018057197332
Validation loss = 0.005427510943263769
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005936311092227697
Validation loss = 0.00499387364834547
Validation loss = 0.005134043283760548
Validation loss = 0.005214357282966375
Validation loss = 0.005430207122117281
Validation loss = 0.005600782111287117
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005752958822995424
Validation loss = 0.0051491837948560715
Validation loss = 0.0058587100356817245
Validation loss = 0.0053575546480715275
Validation loss = 0.005660390481352806
Validation loss = 0.005338843446224928
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005183264147490263
Validation loss = 0.005429564509540796
Validation loss = 0.0056585692800581455
Validation loss = 0.0052132136188447475
Validation loss = 0.005366423632949591
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 35
average number of affinization = 29.456953642384107
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 20
average number of affinization = 29.394736842105264
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 16
average number of affinization = 29.30718954248366
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 63
average number of affinization = 29.525974025974026
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 37
average number of affinization = 29.574193548387097
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 16
average number of affinization = 29.487179487179485
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 303      |
| Iteration     | 24       |
| MaximumReturn | 307      |
| MinimumReturn | 299      |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005148885305970907
Validation loss = 0.00537197245284915
Validation loss = 0.00572913559153676
Validation loss = 0.005066216457635164
Validation loss = 0.005439015571027994
Validation loss = 0.005213923752307892
Validation loss = 0.005150364711880684
Validation loss = 0.005252002738416195
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005509861744940281
Validation loss = 0.005281863734126091
Validation loss = 0.005078649148344994
Validation loss = 0.005137951113283634
Validation loss = 0.005159969441592693
Validation loss = 0.005044908262789249
Validation loss = 0.005255240015685558
Validation loss = 0.005600679200142622
Validation loss = 0.005828037858009338
Validation loss = 0.004935278557240963
Validation loss = 0.0052729700691998005
Validation loss = 0.0051413425244390965
Validation loss = 0.006610033102333546
Validation loss = 0.005325625650584698
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005177532322704792
Validation loss = 0.005269044078886509
Validation loss = 0.005593322217464447
Validation loss = 0.005050798412412405
Validation loss = 0.005209826398640871
Validation loss = 0.005358670838177204
Validation loss = 0.005312238819897175
Validation loss = 0.005529641639441252
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005024356301873922
Validation loss = 0.0054938714019954205
Validation loss = 0.005085520446300507
Validation loss = 0.005449855700135231
Validation loss = 0.0052924808114767075
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005551583133637905
Validation loss = 0.005162094719707966
Validation loss = 0.005197205580770969
Validation loss = 0.005588144529610872
Validation loss = 0.005146836396306753
Validation loss = 0.004993884824216366
Validation loss = 0.005091056227684021
Validation loss = 0.005328672472387552
Validation loss = 0.005110547412186861
Validation loss = 0.005120389629155397
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 35
average number of affinization = 29.522292993630572
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 59
average number of affinization = 29.70886075949367
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 80
average number of affinization = 30.0251572327044
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 97
average number of affinization = 30.44375
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 56
average number of affinization = 30.60248447204969
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 39
average number of affinization = 30.65432098765432
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 309      |
| Iteration     | 25       |
| MaximumReturn | 317      |
| MinimumReturn | 301      |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005203874781727791
Validation loss = 0.0052279941737651825
Validation loss = 0.005122725386172533
Validation loss = 0.005273135378956795
Validation loss = 0.005065444391220808
Validation loss = 0.005033775698393583
Validation loss = 0.005069836042821407
Validation loss = 0.004919841885566711
Validation loss = 0.004835412371903658
Validation loss = 0.004948896821588278
Validation loss = 0.005060179159045219
Validation loss = 0.005022325552999973
Validation loss = 0.0050539192743599415
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005070037674158812
Validation loss = 0.005164381582289934
Validation loss = 0.005518692079931498
Validation loss = 0.004948033951222897
Validation loss = 0.004816800355911255
Validation loss = 0.005123207811266184
Validation loss = 0.005282738246023655
Validation loss = 0.004803117364645004
Validation loss = 0.00477924570441246
Validation loss = 0.00534402672201395
Validation loss = 0.005270384717732668
Validation loss = 0.005309704691171646
Validation loss = 0.005075458902865648
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005881546996533871
Validation loss = 0.005157084204256535
Validation loss = 0.0049233343452215195
Validation loss = 0.005177176091820002
Validation loss = 0.005114582367241383
Validation loss = 0.005254330579191446
Validation loss = 0.005143309477716684
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005502715241163969
Validation loss = 0.005167779047042131
Validation loss = 0.005261140409857035
Validation loss = 0.005362249445170164
Validation loss = 0.005632961168885231
Validation loss = 0.005197233986109495
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004911248106509447
Validation loss = 0.005233531352132559
Validation loss = 0.00498725101351738
Validation loss = 0.0051188161596655846
Validation loss = 0.0055253528989851475
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 24
average number of affinization = 30.613496932515336
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 36
average number of affinization = 30.646341463414632
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 39
average number of affinization = 30.696969696969695
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 55
average number of affinization = 30.843373493975903
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 9
average number of affinization = 30.7125748502994
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 26
average number of affinization = 30.68452380952381
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 306      |
| Iteration     | 26       |
| MaximumReturn | 309      |
| MinimumReturn | 304      |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005046722013503313
Validation loss = 0.0052602579817175865
Validation loss = 0.004771155305206776
Validation loss = 0.004795590881258249
Validation loss = 0.005128247197717428
Validation loss = 0.004853040911257267
Validation loss = 0.005065214354544878
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005257291253656149
Validation loss = 0.00487527484074235
Validation loss = 0.0049560279585421085
Validation loss = 0.005017400719225407
Validation loss = 0.005005129147320986
Validation loss = 0.005323607474565506
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004963950254023075
Validation loss = 0.004880774766206741
Validation loss = 0.005078812595456839
Validation loss = 0.004905850626528263
Validation loss = 0.004942193161696196
Validation loss = 0.005435032304376364
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005026104394346476
Validation loss = 0.005144305527210236
Validation loss = 0.005183756351470947
Validation loss = 0.005736024118959904
Validation loss = 0.005705215036869049
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005235997494310141
Validation loss = 0.004921611864119768
Validation loss = 0.005046393256634474
Validation loss = 0.004890298005193472
Validation loss = 0.005015391856431961
Validation loss = 0.004805603064596653
Validation loss = 0.004897467326372862
Validation loss = 0.005084887146949768
Validation loss = 0.0049512796103954315
Validation loss = 0.005299302749335766
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 69
average number of affinization = 30.911242603550296
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 70
average number of affinization = 31.141176470588235
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 56
average number of affinization = 31.28654970760234
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 87
average number of affinization = 31.61046511627907
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 45
average number of affinization = 31.6878612716763
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 53
average number of affinization = 31.810344827586206
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 310      |
| Iteration     | 27       |
| MaximumReturn | 313      |
| MinimumReturn | 306      |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0048970491625368595
Validation loss = 0.004980394151061773
Validation loss = 0.0047836690209805965
Validation loss = 0.005005518905818462
Validation loss = 0.004886955488473177
Validation loss = 0.0048326971009373665
Validation loss = 0.0052681961096823215
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005255484953522682
Validation loss = 0.004949151538312435
Validation loss = 0.004886147566139698
Validation loss = 0.0050145299173891544
Validation loss = 0.005065992474555969
Validation loss = 0.0050548380240798
Validation loss = 0.004751087166368961
Validation loss = 0.005201408639550209
Validation loss = 0.005151449237018824
Validation loss = 0.004988452885299921
Validation loss = 0.0050426265224814415
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0054820082150399685
Validation loss = 0.004648684989660978
Validation loss = 0.004994410090148449
Validation loss = 0.004947755020111799
Validation loss = 0.004910438321530819
Validation loss = 0.004939157050102949
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005297054070979357
Validation loss = 0.005503217689692974
Validation loss = 0.005359944421797991
Validation loss = 0.004851951729506254
Validation loss = 0.005430297460407019
Validation loss = 0.0050046686083078384
Validation loss = 0.005325575824826956
Validation loss = 0.005215186160057783
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004692741669714451
Validation loss = 0.004927716683596373
Validation loss = 0.005497893784195185
Validation loss = 0.005327979568392038
Validation loss = 0.004760324954986572
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 72
average number of affinization = 32.04
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 50
average number of affinization = 32.14204545454545
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 58
average number of affinization = 32.28813559322034
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 62
average number of affinization = 32.45505617977528
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 45
average number of affinization = 32.52513966480447
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 61
average number of affinization = 32.68333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 306      |
| Iteration     | 28       |
| MaximumReturn | 307      |
| MinimumReturn | 304      |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005101541522890329
Validation loss = 0.004930504132062197
Validation loss = 0.004878625273704529
Validation loss = 0.004794995300471783
Validation loss = 0.005532868672162294
Validation loss = 0.0047781700268387794
Validation loss = 0.005437523126602173
Validation loss = 0.00511365057900548
Validation loss = 0.0048048351891338825
Validation loss = 0.005194620229303837
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005108245648443699
Validation loss = 0.005152556113898754
Validation loss = 0.004938864149153233
Validation loss = 0.004938200581818819
Validation loss = 0.005118182394653559
Validation loss = 0.005231828894466162
Validation loss = 0.004997107200324535
Validation loss = 0.004887056071311235
Validation loss = 0.005060089286416769
Validation loss = 0.005373102147132158
Validation loss = 0.004984143190085888
Validation loss = 0.005448464769870043
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0049785408191382885
Validation loss = 0.004704623483121395
Validation loss = 0.004981684032827616
Validation loss = 0.005498245824128389
Validation loss = 0.00490572489798069
Validation loss = 0.0047240410931408405
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004958006553351879
Validation loss = 0.0052011157386004925
Validation loss = 0.004988552536815405
Validation loss = 0.00483324471861124
Validation loss = 0.00506936339661479
Validation loss = 0.004868601448833942
Validation loss = 0.005364398937672377
Validation loss = 0.00491859158501029
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005423990078270435
Validation loss = 0.0049012876115739346
Validation loss = 0.004993791226297617
Validation loss = 0.00545286200940609
Validation loss = 0.005057413596659899
Validation loss = 0.00493666622787714
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 43
average number of affinization = 32.74033149171271
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 49
average number of affinization = 32.82967032967033
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 78
average number of affinization = 33.076502732240435
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 67
average number of affinization = 33.26086956521739
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 88
average number of affinization = 33.556756756756755
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 81
average number of affinization = 33.81182795698925
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 310      |
| Iteration     | 29       |
| MaximumReturn | 315      |
| MinimumReturn | 306      |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005082225427031517
Validation loss = 0.005063931457698345
Validation loss = 0.005061466712504625
Validation loss = 0.005392420571297407
Validation loss = 0.0050891367718577385
Validation loss = 0.005277902353554964
Validation loss = 0.005409849341958761
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004966682754456997
Validation loss = 0.005133267026394606
Validation loss = 0.005237503442913294
Validation loss = 0.005385961849242449
Validation loss = 0.004998200573027134
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005094091407954693
Validation loss = 0.005449278280138969
Validation loss = 0.005116185639053583
Validation loss = 0.005128365475684404
Validation loss = 0.005251761060208082
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005605326499789953
Validation loss = 0.005372235085815191
Validation loss = 0.004844621755182743
Validation loss = 0.005110406782478094
Validation loss = 0.0062695895321667194
Validation loss = 0.00538395531475544
Validation loss = 0.005210236646234989
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005533986259251833
Validation loss = 0.00539246154949069
Validation loss = 0.005220769438892603
Validation loss = 0.00528451893478632
Validation loss = 0.0054149022325873375
Validation loss = 0.005272436421364546
Validation loss = 0.004842609167098999
Validation loss = 0.005183840170502663
Validation loss = 0.004949770402163267
Validation loss = 0.005051659420132637
Validation loss = 0.005559595301747322
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 28
average number of affinization = 33.780748663101605
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 75
average number of affinization = 34.0
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 75
average number of affinization = 34.216931216931215
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 69
average number of affinization = 34.4
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 35
average number of affinization = 34.403141361256544
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 65
average number of affinization = 34.5625
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 307      |
| Iteration     | 30       |
| MaximumReturn | 317      |
| MinimumReturn | 299      |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005325488280504942
Validation loss = 0.005458281375467777
Validation loss = 0.004988404456526041
Validation loss = 0.005290032364428043
Validation loss = 0.005000518634915352
Validation loss = 0.005225886590778828
Validation loss = 0.005086779594421387
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005251120775938034
Validation loss = 0.0050241658464074135
Validation loss = 0.0050788819789886475
Validation loss = 0.00521476287394762
Validation loss = 0.005051055923104286
Validation loss = 0.005031912587583065
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005578698590397835
Validation loss = 0.005749848671257496
Validation loss = 0.005093717947602272
Validation loss = 0.005175640806555748
Validation loss = 0.0054315365850925446
Validation loss = 0.00525520509108901
Validation loss = 0.005321442149579525
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00516249006614089
Validation loss = 0.005227347835898399
Validation loss = 0.005328981205821037
Validation loss = 0.0055417027324438095
Validation loss = 0.005110529717057943
Validation loss = 0.005342605523765087
Validation loss = 0.005113128572702408
Validation loss = 0.005350584164261818
Validation loss = 0.005069222301244736
Validation loss = 0.0052681006491184235
Validation loss = 0.005146441049873829
Validation loss = 0.004899307619780302
Validation loss = 0.005301156546920538
Validation loss = 0.005242157727479935
Validation loss = 0.0051384130492806435
Validation loss = 0.005563370417803526
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005653942469507456
Validation loss = 0.005342608317732811
Validation loss = 0.005126915872097015
Validation loss = 0.005200101062655449
Validation loss = 0.005733535625040531
Validation loss = 0.005265362095087767
Validation loss = 0.005407027900218964
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 69
average number of affinization = 34.740932642487046
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 72
average number of affinization = 34.93298969072165
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 85
average number of affinization = 35.18974358974359
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 60
average number of affinization = 35.316326530612244
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 46
average number of affinization = 35.370558375634516
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 83
average number of affinization = 35.611111111111114
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 311      |
| Iteration     | 31       |
| MaximumReturn | 318      |
| MinimumReturn | 306      |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0049332790076732635
Validation loss = 0.005290334112942219
Validation loss = 0.005280279088765383
Validation loss = 0.005038228817284107
Validation loss = 0.005360801238566637
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0052039665170013905
Validation loss = 0.005171327386051416
Validation loss = 0.005245845764875412
Validation loss = 0.005287421867251396
Validation loss = 0.005240475293248892
Validation loss = 0.0053611514158546925
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005202891770750284
Validation loss = 0.005148374941200018
Validation loss = 0.005025823134928942
Validation loss = 0.005836602300405502
Validation loss = 0.005539430771023035
Validation loss = 0.0050523728132247925
Validation loss = 0.00499065313488245
Validation loss = 0.005208977032452822
Validation loss = 0.0050279609858989716
Validation loss = 0.005245820619165897
Validation loss = 0.005361317191272974
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005265612620860338
Validation loss = 0.005287279840558767
Validation loss = 0.005158938467502594
Validation loss = 0.005286634434014559
Validation loss = 0.005574803799390793
Validation loss = 0.005082025658339262
Validation loss = 0.0051472424529492855
Validation loss = 0.005784628447145224
Validation loss = 0.005189054645597935
Validation loss = 0.004998879041522741
Validation loss = 0.0051538534462451935
Validation loss = 0.005312893074005842
Validation loss = 0.005105747375637293
Validation loss = 0.005154001992195845
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005499579012393951
Validation loss = 0.005354241002351046
Validation loss = 0.005382194183766842
Validation loss = 0.005044207908213139
Validation loss = 0.005058922339230776
Validation loss = 0.005249134264886379
Validation loss = 0.005327008664608002
Validation loss = 0.005138644482940435
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 78
average number of affinization = 35.824120603015075
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 64
average number of affinization = 35.965
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 61
average number of affinization = 36.08955223880597
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 69
average number of affinization = 36.25247524752475
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 82
average number of affinization = 36.477832512315274
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 72
average number of affinization = 36.65196078431372
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 300      |
| Iteration     | 32       |
| MaximumReturn | 302      |
| MinimumReturn | 299      |
| TotalSamples  | 136000   |
----------------------------
