CustomSpiceEnv has not been registered yet
Loading CPO.yaml from /scratch1/dsc5636/anaconda3/envs/omnisafe/lib/python3.8/site-packages/omnisafe/utils/../configs/on-policy/CPO.yaml
Logging data to ./logs/CPO-{Pendulum-v1}/seed-000-2025-05-07-18-09-23/progress.csv
Save with config in config.json
INFO: Start training
For episode num 1  Steps count? : 46, Cost: 1.0
For episode num 2  Steps count? : 59, Cost: 2.0
For episode num 3  Steps count? : 49, Cost: 3.0
For episode num 4  Steps count? : 57, Cost: 4.0
For episode num 5  Steps count? : 48, Cost: 5.0
For episode num 6  Steps count? : 46, Cost: 6.0
For episode num 7  Steps count? : 52, Cost: 7.0
For episode num 8  Steps count? : 43, Cost: 8.0
For episode num 9  Steps count? : 46, Cost: 9.0
For episode num 10  Steps count? : 49, Cost: 10.0
For episode num 11  Steps count? : 34, Cost: 11.0
For episode num 12  Steps count? : 68, Cost: 12.0
For episode num 13  Steps count? : 56, Cost: 13.0
For episode num 14  Steps count? : 51, Cost: 14.0
For episode num 15  Steps count? : 63, Cost: 15.0
For episode num 16  Steps count? : 31, Cost: 16.0
For episode num 17  Steps count? : 58, Cost: 17.0
For episode num 18  Steps count? : 33, Cost: 18.0
For episode num 19  Steps count? : 80, Cost: 19.0
For episode num 20  Steps count? : 65, Cost: 20.0
For episode num 21  Steps count? : 53, Cost: 21.0
For episode num 22  Steps count? : 60, Cost: 22.0
For episode num 23  Steps count? : 36, Cost: 23.0
For episode num 24  Steps count? : 83, Cost: 24.0
For episode num 25  Steps count? : 58, Cost: 25.0
For episode num 26  Steps count? : 36, Cost: 26.0
For episode num 27  Steps count? : 55, Cost: 27.0
For episode num 28  Steps count? : 38, Cost: 28.0
For episode num 29  Steps count? : 39, Cost: 29.0
For episode num 30  Steps count? : 59, Cost: 30.0
For episode num 31  Steps count? : 59, Cost: 31.0
For episode num 32  Steps count? : 46, Cost: 32.0
For episode num 33  Steps count? : 35, Cost: 33.0
For episode num 34  Steps count? : 81, Cost: 34.0
For episode num 35  Steps count? : 58, Cost: 35.0
For episode num 36  Steps count? : 69, Cost: 36.0
For episode num 37  Steps count? : 37, Cost: 37.0
For episode num 38  Steps count? : 45, Cost: 38.0
Warning: trajectory cut off when rollout by epoch at 19.0 steps.
Processing rollout for epoch: 0... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.010933153331279755 Actual: 0.010234568268060684
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -7.461653709411621      │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 52.1315803527832        │
│ Train/Epoch                   │ 0.0                     │
│ Train/Entropy                 │ 1.4194616079330444      │
│ Train/KL                      │ 0.0001955967309186235   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9972421526908875      │
│ Train/PolicyRatio/Min         │ 0.9972421526908875      │
│ Train/PolicyRatio/Max         │ 0.9972421526908875      │
│ Train/PolicyRatio/Std         │ 0.0019501065835356712   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 1.0005232095718384      │
│ TotalEnvSteps                 │ 2000.0                  │
│ Loss/Loss_pi                  │ -0.007749355398118496   │
│ Loss/Loss_pi/Delta            │ -0.007749355398118496   │
│ Value/Adv                     │ -3.504753109950798e-08  │
│ Loss/Loss_reward_critic       │ 0.728632926940918       │
│ Loss/Loss_reward_critic/Delta │ 0.728632926940918       │
│ Value/reward                  │ 0.060222335159778595    │
│ Loss/Loss_cost_critic         │ 0.06985160708427429     │
│ Loss/Loss_cost_critic/Delta   │ 0.06985160708427429     │
│ Value/cost                    │ -0.1336418092250824     │
│ Time/Total                    │ 2.8216359615325928      │
│ Time/Rollout                  │ 1.8295645713806152      │
│ Time/Update                   │ 0.9916694164276123      │
│ Time/Epoch                    │ 2.821259021759033       │
│ Time/FPS                      │ 708.9036865234375       │
│ Misc/Alpha                    │ 1.8334978818893433      │
│ Misc/FinalStepNorm            │ 0.20425178110599518     │
│ Misc/gradient_norm            │ 0.1244887262582779      │
│ Misc/xHx                      │ 0.005949335638433695    │
│ Misc/H_inv_g                  │ 0.1114000678062439      │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.02912355400621891     │
│ Misc/A                        │ 0.005852588452398777    │
│ Misc/B                        │ -1905478.0              │
│ Misc/q                        │ 0.005949335638433695    │
│ Misc/r                        │ -0.00017101253615692258 │
│ Misc/s                        │ 0.00030227634124457836  │
│ Misc/Lambda_star              │ 0.5454055666923523      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 39  Steps count? : 19, Cost: 38.0
For episode num 40  Steps count? : 61, Cost: 39.0
For episode num 41  Steps count? : 80, Cost: 40.0
For episode num 42  Steps count? : 49, Cost: 41.0
For episode num 43  Steps count? : 100, Cost: 41.0
For episode num 44  Steps count? : 67, Cost: 42.0
For episode num 45  Steps count? : 64, Cost: 43.0
For episode num 46  Steps count? : 82, Cost: 44.0
For episode num 47  Steps count? : 100, Cost: 45.0
For episode num 48  Steps count? : 42, Cost: 46.0
For episode num 49  Steps count? : 100, Cost: 46.0
For episode num 50  Steps count? : 34, Cost: 47.0
For episode num 51  Steps count? : 42, Cost: 48.0
For episode num 52  Steps count? : 61, Cost: 49.0
For episode num 53  Steps count? : 55, Cost: 50.0
For episode num 54  Steps count? : 51, Cost: 51.0
For episode num 55  Steps count? : 55, Cost: 52.0
For episode num 56  Steps count? : 55, Cost: 53.0
For episode num 57  Steps count? : 45, Cost: 54.0
For episode num 58  Steps count? : 33, Cost: 55.0
For episode num 59  Steps count? : 38, Cost: 56.0
For episode num 60  Steps count? : 97, Cost: 57.0
For episode num 61  Steps count? : 39, Cost: 58.0
For episode num 62  Steps count? : 57, Cost: 59.0
For episode num 63  Steps count? : 47, Cost: 60.0
For episode num 64  Steps count? : 36, Cost: 61.0
For episode num 65  Steps count? : 66, Cost: 62.0
For episode num 66  Steps count? : 65, Cost: 63.0
For episode num 67  Steps count? : 83, Cost: 64.0
For episode num 68  Steps count? : 61, Cost: 65.0
For episode num 69  Steps count? : 79, Cost: 66.0
For episode num 70  Steps count? : 52, Cost: 67.0
For episode num 71  Steps count? : 38, Cost: 68.0
Warning: trajectory cut off when rollout by epoch at 66.0 steps.
Processing rollout for epoch: 1... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.018540047109127045 Actual: 0.018787702545523643
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -7.018678665161133      │
│ Metrics/EpCost                │ 0.9599999785423279      │
│ Metrics/EpLen                 │ 57.619998931884766      │
│ Train/Epoch                   │ 1.0                     │
│ Train/Entropy                 │ 1.4335931539535522      │
│ Train/KL                      │ 0.00025701787672005594  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.002468228340149       │
│ Train/PolicyRatio/Min         │ 1.002468228340149       │
│ Train/PolicyRatio/Max         │ 1.002468228340149       │
│ Train/PolicyRatio/Std         │ 0.0017453291220590472   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 1.0148111581802368      │
│ TotalEnvSteps                 │ 4000.0                  │
│ Loss/Loss_pi                  │ -0.014608627185225487   │
│ Loss/Loss_pi/Delta            │ -0.006859271787106991   │
│ Value/Adv                     │ -1.3792514153010416e-07 │
│ Loss/Loss_reward_critic       │ 0.47209709882736206     │
│ Loss/Loss_reward_critic/Delta │ -0.2565358281135559     │
│ Value/reward                  │ -1.8242703676223755     │
│ Loss/Loss_cost_critic         │ 0.05798511207103729     │
│ Loss/Loss_cost_critic/Delta   │ -0.011866495013237      │
│ Value/cost                    │ 0.12825025618076324     │
│ Time/Total                    │ 5.837592601776123       │
│ Time/Rollout                  │ 2.145836353302002       │
│ Time/Update                   │ 0.8464951515197754      │
│ Time/Epoch                    │ 2.9923532009124756      │
│ Time/FPS                      │ 668.3707275390625       │
│ Misc/Alpha                    │ 1.0797441005706787      │
│ Misc/FinalStepNorm            │ 0.13209302723407745     │
│ Misc/gradient_norm            │ 0.2811804413795471      │
│ Misc/xHx                      │ 0.017154894769191742    │
│ Misc/H_inv_g                  │ 0.12233734130859375     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.06062992289662361     │
│ Misc/A                        │ 0.011941536329686642    │
│ Misc/B                        │ -894869.5               │
│ Misc/q                        │ 0.017154894769191742    │
│ Misc/r                        │ -0.0018349041929468513  │
│ Misc/s                        │ 0.0006458066054619849   │
│ Misc/Lambda_star              │ 0.9261453747749329      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 72  Steps count? : 66, Cost: 68.0
For episode num 73  Steps count? : 48, Cost: 69.0
For episode num 74  Steps count? : 48, Cost: 70.0
For episode num 75  Steps count? : 48, Cost: 71.0
For episode num 76  Steps count? : 55, Cost: 72.0
For episode num 77  Steps count? : 41, Cost: 73.0
For episode num 78  Steps count? : 100, Cost: 73.0
For episode num 79  Steps count? : 39, Cost: 74.0
For episode num 80  Steps count? : 62, Cost: 75.0
For episode num 81  Steps count? : 50, Cost: 76.0
For episode num 82  Steps count? : 73, Cost: 77.0
For episode num 83  Steps count? : 39, Cost: 78.0
For episode num 84  Steps count? : 100, Cost: 78.0
For episode num 85  Steps count? : 65, Cost: 79.0
For episode num 86  Steps count? : 55, Cost: 80.0
For episode num 87  Steps count? : 50, Cost: 81.0
For episode num 88  Steps count? : 46, Cost: 82.0
For episode num 89  Steps count? : 100, Cost: 82.0
For episode num 90  Steps count? : 50, Cost: 83.0
For episode num 91  Steps count? : 57, Cost: 84.0
For episode num 92  Steps count? : 46, Cost: 85.0
For episode num 93  Steps count? : 49, Cost: 86.0
For episode num 94  Steps count? : 64, Cost: 87.0
For episode num 95  Steps count? : 45, Cost: 88.0
For episode num 96  Steps count? : 34, Cost: 89.0
For episode num 97  Steps count? : 99, Cost: 90.0
For episode num 98  Steps count? : 30, Cost: 91.0
For episode num 99  Steps count? : 51, Cost: 92.0
For episode num 100  Steps count? : 44, Cost: 93.0
For episode num 101  Steps count? : 63, Cost: 94.0
For episode num 102  Steps count? : 40, Cost: 95.0
For episode num 103  Steps count? : 67, Cost: 96.0
For episode num 104  Steps count? : 57, Cost: 97.0
For episode num 105  Steps count? : 68, Cost: 98.0
For episode num 106  Steps count? : 100, Cost: 98.0
Warning: trajectory cut off when rollout by epoch at 17.0 steps.
Processing rollout for epoch: 2... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.013220008462667465 Actual: 0.01391338836401701
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -7.245092391967773      │
│ Metrics/EpCost                │ 0.9200000166893005      │
│ Metrics/EpLen                 │ 57.47999954223633       │
│ Train/Epoch                   │ 2.0                     │
│ Train/Entropy                 │ 1.4432731866836548      │
│ Train/KL                      │ 0.00026772194541990757  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9979903101921082      │
│ Train/PolicyRatio/Min         │ 0.9979903101921082      │
│ Train/PolicyRatio/Max         │ 0.9979903101921082      │
│ Train/PolicyRatio/Std         │ 0.0014211074449121952   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 1.0246351957321167      │
│ TotalEnvSteps                 │ 6000.0                  │
│ Loss/Loss_pi                  │ -0.010533278807997704   │
│ Loss/Loss_pi/Delta            │ 0.004075348377227783    │
│ Value/Adv                     │ 1.2636184543168838e-08  │
│ Loss/Loss_reward_critic       │ 0.5087975263595581      │
│ Loss/Loss_reward_critic/Delta │ 0.036700427532196045    │
│ Value/reward                  │ -2.9400620460510254     │
│ Loss/Loss_cost_critic         │ 0.04965433105826378     │
│ Loss/Loss_cost_critic/Delta   │ -0.008330781012773514   │
│ Value/cost                    │ 0.3478032052516937      │
│ Time/Total                    │ 8.534188270568848       │
│ Time/Rollout                  │ 1.6432511806488037      │
│ Time/Update                   │ 1.0349433422088623      │
│ Time/Epoch                    │ 2.6782140731811523      │
│ Time/FPS                      │ 746.7666015625          │
│ Misc/Alpha                    │ 1.5156768560409546      │
│ Misc/FinalStepNorm            │ 0.1572716385126114      │
│ Misc/gradient_norm            │ 0.1916843205690384      │
│ Misc/xHx                      │ 0.008705951273441315    │
│ Misc/H_inv_g                  │ 0.10376331210136414     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.050732310861349106    │
│ Misc/A                        │ 0.00866750255227089     │
│ Misc/B                        │ -782800.8125            │
│ Misc/q                        │ 0.008705951273441315    │
│ Misc/r                        │ -0.00016876053996384144 │
│ Misc/s                        │ 0.0007407229859381914   │
│ Misc/Lambda_star              │ 0.6597712635993958      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 107  Steps count? : 17, Cost: 98.0
For episode num 108  Steps count? : 48, Cost: 99.0
For episode num 109  Steps count? : 31, Cost: 100.0
For episode num 110  Steps count? : 90, Cost: 101.0
For episode num 111  Steps count? : 46, Cost: 102.0
For episode num 112  Steps count? : 48, Cost: 103.0
For episode num 113  Steps count? : 42, Cost: 104.0
For episode num 114  Steps count? : 100, Cost: 104.0
For episode num 115  Steps count? : 73, Cost: 105.0
For episode num 116  Steps count? : 100, Cost: 105.0
For episode num 117  Steps count? : 55, Cost: 106.0
For episode num 118  Steps count? : 36, Cost: 107.0
For episode num 119  Steps count? : 37, Cost: 108.0
For episode num 120  Steps count? : 85, Cost: 109.0
For episode num 121  Steps count? : 53, Cost: 110.0
For episode num 122  Steps count? : 39, Cost: 111.0
For episode num 123  Steps count? : 60, Cost: 112.0
For episode num 124  Steps count? : 54, Cost: 113.0
For episode num 125  Steps count? : 47, Cost: 114.0
For episode num 126  Steps count? : 100, Cost: 114.0
For episode num 127  Steps count? : 50, Cost: 115.0
For episode num 128  Steps count? : 41, Cost: 116.0
For episode num 129  Steps count? : 33, Cost: 117.0
For episode num 130  Steps count? : 39, Cost: 118.0
For episode num 131  Steps count? : 61, Cost: 119.0
For episode num 132  Steps count? : 55, Cost: 120.0
For episode num 133  Steps count? : 86, Cost: 121.0
For episode num 134  Steps count? : 56, Cost: 122.0
For episode num 135  Steps count? : 58, Cost: 123.0
For episode num 136  Steps count? : 61, Cost: 124.0
For episode num 137  Steps count? : 66, Cost: 125.0
For episode num 138  Steps count? : 39, Cost: 126.0
For episode num 139  Steps count? : 34, Cost: 127.0
For episode num 140  Steps count? : 40, Cost: 128.0
For episode num 141  Steps count? : 37, Cost: 129.0
For episode num 142  Steps count? : 32, Cost: 130.0
For episode num 143  Steps count? : 34, Cost: 131.0
Warning: trajectory cut off when rollout by epoch at 34.0 steps.
Processing rollout for epoch: 3... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.009565005078911781 Actual: 0.010219395160675049
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -6.26151180267334      │
│ Metrics/EpCost                │ 0.9200000166893005     │
│ Metrics/EpLen                 │ 55.540000915527344     │
│ Train/Epoch                   │ 3.0                    │
│ Train/Entropy                 │ 1.430464267730713      │
│ Train/KL                      │ 0.0002625216729938984  │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.9977089762687683     │
│ Train/PolicyRatio/Min         │ 0.9977089762687683     │
│ Train/PolicyRatio/Max         │ 0.9977089762687683     │
│ Train/PolicyRatio/Std         │ 0.001619984395802021   │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 1.011643409729004      │
│ TotalEnvSteps                 │ 8000.0                 │
│ Loss/Loss_pi                  │ -0.007695901207625866  │
│ Loss/Loss_pi/Delta            │ 0.0028373776003718376  │
│ Value/Adv                     │ -9.65595248203499e-09  │
│ Loss/Loss_reward_critic       │ 0.5034195184707642     │
│ Loss/Loss_reward_critic/Delta │ -0.005378007888793945  │
│ Value/reward                  │ -3.753034830093384     │
│ Loss/Loss_cost_critic         │ 0.040117911994457245   │
│ Loss/Loss_cost_critic/Delta   │ -0.009536419063806534  │
│ Value/cost                    │ 0.45873576402664185    │
│ Time/Total                    │ 11.919028282165527     │
│ Time/Rollout                  │ 2.2822439670562744     │
│ Time/Update                   │ 1.0786654949188232     │
│ Time/Epoch                    │ 3.360935688018799      │
│ Time/FPS                      │ 595.0725708007812      │
│ Misc/Alpha                    │ 2.1012675762176514     │
│ Misc/FinalStepNorm            │ 0.23295344412326813    │
│ Misc/gradient_norm            │ 0.09920871257781982    │
│ Misc/xHx                      │ 0.0045296670868992805  │
│ Misc/H_inv_g                  │ 0.11086330562829971    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.03922232985496521    │
│ Misc/A                        │ 0.004528788384050131   │
│ Misc/B                        │ -1528461.0             │
│ Misc/q                        │ 0.0045296670868992805  │
│ Misc/r                        │ -1.825587241910398e-05 │
│ Misc/s                        │ 0.0003793561481870711  │
│ Misc/Lambda_star              │ 0.4759032130241394     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 144  Steps count? : 34, Cost: 131.0
For episode num 145  Steps count? : 33, Cost: 132.0
For episode num 146  Steps count? : 48, Cost: 133.0
For episode num 147  Steps count? : 69, Cost: 134.0
For episode num 148  Steps count? : 41, Cost: 135.0
For episode num 149  Steps count? : 51, Cost: 136.0
For episode num 150  Steps count? : 36, Cost: 137.0
For episode num 151  Steps count? : 44, Cost: 138.0
For episode num 152  Steps count? : 38, Cost: 139.0
For episode num 153  Steps count? : 30, Cost: 140.0
For episode num 154  Steps count? : 38, Cost: 141.0
For episode num 155  Steps count? : 43, Cost: 142.0
For episode num 156  Steps count? : 78, Cost: 143.0
For episode num 157  Steps count? : 46, Cost: 144.0
For episode num 158  Steps count? : 39, Cost: 145.0
For episode num 159  Steps count? : 63, Cost: 146.0
For episode num 160  Steps count? : 48, Cost: 147.0
For episode num 161  Steps count? : 79, Cost: 148.0
For episode num 162  Steps count? : 47, Cost: 149.0
For episode num 163  Steps count? : 31, Cost: 150.0
For episode num 164  Steps count? : 83, Cost: 151.0
For episode num 165  Steps count? : 60, Cost: 152.0
For episode num 166  Steps count? : 40, Cost: 153.0
For episode num 167  Steps count? : 56, Cost: 154.0
For episode num 168  Steps count? : 45, Cost: 155.0
For episode num 169  Steps count? : 64, Cost: 156.0
For episode num 170  Steps count? : 60, Cost: 157.0
For episode num 171  Steps count? : 42, Cost: 158.0
For episode num 172  Steps count? : 76, Cost: 159.0
For episode num 173  Steps count? : 43, Cost: 160.0
For episode num 174  Steps count? : 35, Cost: 161.0
For episode num 175  Steps count? : 66, Cost: 162.0
For episode num 176  Steps count? : 82, Cost: 163.0
For episode num 177  Steps count? : 39, Cost: 164.0
For episode num 178  Steps count? : 38, Cost: 165.0
For episode num 179  Steps count? : 38, Cost: 166.0
For episode num 180  Steps count? : 55, Cost: 167.0
For episode num 181  Steps count? : 69, Cost: 168.0
For episode num 182  Steps count? : 41, Cost: 169.0
For episode num 183  Steps count? : 36, Cost: 170.0
Warning: trajectory cut off when rollout by epoch at 30.0 steps.
Processing rollout for epoch: 4... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.013674476183950901 Actual: 0.011895385570824146
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -6.007748603820801      │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 50.2599983215332        │
│ Train/Epoch                   │ 4.0                     │
│ Train/Entropy                 │ 1.4095607995986938      │
│ Train/KL                      │ 0.00027292667073197663  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.000049114227295       │
│ Train/PolicyRatio/Min         │ 1.000049114227295       │
│ Train/PolicyRatio/Max         │ 1.000049114227295       │
│ Train/PolicyRatio/Std         │ 3.4729004255495965e-05  │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.9907134175300598      │
│ TotalEnvSteps                 │ 10000.0                 │
│ Loss/Loss_pi                  │ -0.008568532764911652   │
│ Loss/Loss_pi/Delta            │ -0.0008726315572857857  │
│ Value/Adv                     │ -2.9802322831784522e-09 │
│ Loss/Loss_reward_critic       │ 0.3467682898044586      │
│ Loss/Loss_reward_critic/Delta │ -0.15665122866630554    │
│ Value/reward                  │ -3.8643267154693604     │
│ Loss/Loss_cost_critic         │ 0.03324858099222183     │
│ Loss/Loss_cost_critic/Delta   │ -0.006869331002235413   │
│ Value/cost                    │ 0.5776059627532959      │
│ Time/Total                    │ 14.365941047668457      │
│ Time/Rollout                  │ 1.5864369869232178      │
│ Time/Update                   │ 0.8417336940765381      │
│ Time/Epoch                    │ 2.428189516067505       │
│ Time/FPS                      │ 823.6591796875          │
│ Misc/Alpha                    │ 1.4662611484527588      │
│ Misc/FinalStepNorm            │ 0.16820305585861206     │
│ Misc/gradient_norm            │ 0.1832278072834015      │
│ Misc/xHx                      │ 0.00930265337228775     │
│ Misc/H_inv_g                  │ 0.11471562087535858     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.028325190767645836    │
│ Misc/A                        │ 0.005756344646215439    │
│ Misc/B                        │ -2077399.875            │
│ Misc/q                        │ 0.00930265337228775     │
│ Misc/r                        │ 0.000991606735624373    │
│ Misc/s                        │ 0.0002772596781142056   │
│ Misc/Lambda_star              │ 0.6820067763328552      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 184  Steps count? : 30, Cost: 170.0
For episode num 185  Steps count? : 38, Cost: 171.0
For episode num 186  Steps count? : 34, Cost: 172.0
For episode num 187  Steps count? : 37, Cost: 173.0
For episode num 188  Steps count? : 38, Cost: 174.0
For episode num 189  Steps count? : 45, Cost: 175.0
For episode num 190  Steps count? : 39, Cost: 176.0
For episode num 191  Steps count? : 56, Cost: 177.0
For episode num 192  Steps count? : 38, Cost: 178.0
For episode num 193  Steps count? : 42, Cost: 179.0
For episode num 194  Steps count? : 77, Cost: 180.0
For episode num 195  Steps count? : 33, Cost: 181.0
For episode num 196  Steps count? : 48, Cost: 182.0
For episode num 197  Steps count? : 39, Cost: 183.0
For episode num 198  Steps count? : 31, Cost: 184.0
For episode num 199  Steps count? : 35, Cost: 185.0
For episode num 200  Steps count? : 31, Cost: 186.0
For episode num 201  Steps count? : 57, Cost: 187.0
For episode num 202  Steps count? : 31, Cost: 188.0
For episode num 203  Steps count? : 49, Cost: 189.0
For episode num 204  Steps count? : 50, Cost: 190.0
For episode num 205  Steps count? : 64, Cost: 191.0
For episode num 206  Steps count? : 46, Cost: 192.0
For episode num 207  Steps count? : 41, Cost: 193.0
For episode num 208  Steps count? : 78, Cost: 194.0
For episode num 209  Steps count? : 47, Cost: 195.0
For episode num 210  Steps count? : 57, Cost: 196.0
For episode num 211  Steps count? : 76, Cost: 197.0
For episode num 212  Steps count? : 54, Cost: 198.0
For episode num 213  Steps count? : 34, Cost: 199.0
For episode num 214  Steps count? : 35, Cost: 200.0
For episode num 215  Steps count? : 43, Cost: 201.0
For episode num 216  Steps count? : 40, Cost: 202.0
For episode num 217  Steps count? : 44, Cost: 203.0
For episode num 218  Steps count? : 47, Cost: 204.0
For episode num 219  Steps count? : 48, Cost: 205.0
For episode num 220  Steps count? : 35, Cost: 206.0
For episode num 221  Steps count? : 43, Cost: 207.0
For episode num 222  Steps count? : 31, Cost: 208.0
For episode num 223  Steps count? : 57, Cost: 209.0
For episode num 224  Steps count? : 39, Cost: 210.0
For episode num 225  Steps count? : 46, Cost: 211.0
For episode num 226  Steps count? : 35, Cost: 212.0
For episode num 227  Steps count? : 44, Cost: 213.0
For episode num 228  Steps count? : 34, Cost: 214.0
Warning: trajectory cut off when rollout by epoch at 34.0 steps.
Processing rollout for epoch: 5... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.015690580010414124 Actual: 0.015969574451446533
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -5.443100452423096     │
│ Metrics/EpCost                │ 1.0                    │
│ Metrics/EpLen                 │ 44.86000061035156      │
│ Train/Epoch                   │ 5.0                    │
│ Train/Entropy                 │ 1.405582070350647      │
│ Train/KL                      │ 0.00027239113114774227 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.9988113045692444     │
│ Train/PolicyRatio/Min         │ 0.9988113045692444     │
│ Train/PolicyRatio/Max         │ 0.9988113045692444     │
│ Train/PolicyRatio/Std         │ 0.0008405205444432795  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.9867345690727234     │
│ TotalEnvSteps                 │ 12000.0                │
│ Loss/Loss_pi                  │ -0.0115015534684062    │
│ Loss/Loss_pi/Delta            │ -0.002933020703494549  │
│ Value/Adv                     │ 8.106232129989621e-09  │
│ Loss/Loss_reward_critic       │ 0.1581135392189026     │
│ Loss/Loss_reward_critic/Delta │ -0.18865475058555603   │
│ Value/reward                  │ -3.808898448944092     │
│ Loss/Loss_cost_critic         │ 0.025369832292199135   │
│ Loss/Loss_cost_critic/Delta   │ -0.007878748700022697  │
│ Value/cost                    │ 0.7043642401695251     │
│ Time/Total                    │ 16.80768394470215      │
│ Time/Rollout                  │ 1.5881049633026123     │
│ Time/Update                   │ 0.8349704742431641     │
│ Time/Epoch                    │ 2.423098087310791      │
│ Time/FPS                      │ 825.3898315429688      │
│ Misc/Alpha                    │ 1.277092695236206      │
│ Misc/FinalStepNorm            │ 0.14198362827301025    │
│ Misc/gradient_norm            │ 0.2623710334300995     │
│ Misc/xHx                      │ 0.012262662872672081   │
│ Misc/H_inv_g                  │ 0.1111772358417511     │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.02998645044863224    │
│ Misc/A                        │ 0.0019255047664046288  │
│ Misc/B                        │ -2881835.25            │
│ Misc/q                        │ 0.012262662872672081   │
│ Misc/r                        │ 0.0014373987214639783  │
│ Misc/s                        │ 0.00019986263941973448 │
│ Misc/Lambda_star              │ 0.7830285429954529     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 229  Steps count? : 34, Cost: 214.0
For episode num 230  Steps count? : 50, Cost: 215.0
For episode num 231  Steps count? : 35, Cost: 216.0
For episode num 232  Steps count? : 33, Cost: 217.0
For episode num 233  Steps count? : 36, Cost: 218.0
For episode num 234  Steps count? : 41, Cost: 219.0
For episode num 235  Steps count? : 77, Cost: 220.0
For episode num 236  Steps count? : 60, Cost: 221.0
For episode num 237  Steps count? : 32, Cost: 222.0
For episode num 238  Steps count? : 44, Cost: 223.0
For episode num 239  Steps count? : 66, Cost: 224.0
For episode num 240  Steps count? : 45, Cost: 225.0
For episode num 241  Steps count? : 77, Cost: 226.0
For episode num 242  Steps count? : 55, Cost: 227.0
For episode num 243  Steps count? : 64, Cost: 228.0
For episode num 244  Steps count? : 28, Cost: 229.0
For episode num 245  Steps count? : 28, Cost: 230.0
For episode num 246  Steps count? : 58, Cost: 231.0
For episode num 247  Steps count? : 53, Cost: 232.0
For episode num 248  Steps count? : 31, Cost: 233.0
For episode num 249  Steps count? : 40, Cost: 234.0
For episode num 250  Steps count? : 69, Cost: 235.0
For episode num 251  Steps count? : 32, Cost: 236.0
For episode num 252  Steps count? : 52, Cost: 237.0
For episode num 253  Steps count? : 41, Cost: 238.0
For episode num 254  Steps count? : 44, Cost: 239.0
For episode num 255  Steps count? : 41, Cost: 240.0
For episode num 256  Steps count? : 47, Cost: 241.0
For episode num 257  Steps count? : 30, Cost: 242.0
For episode num 258  Steps count? : 63, Cost: 243.0
For episode num 259  Steps count? : 53, Cost: 244.0
For episode num 260  Steps count? : 37, Cost: 245.0
For episode num 261  Steps count? : 41, Cost: 246.0
For episode num 262  Steps count? : 42, Cost: 247.0
For episode num 263  Steps count? : 34, Cost: 248.0
For episode num 264  Steps count? : 34, Cost: 249.0
For episode num 265  Steps count? : 29, Cost: 250.0
For episode num 266  Steps count? : 40, Cost: 251.0
For episode num 267  Steps count? : 33, Cost: 252.0
For episode num 268  Steps count? : 41, Cost: 253.0
For episode num 269  Steps count? : 41, Cost: 254.0
For episode num 270  Steps count? : 40, Cost: 255.0
For episode num 271  Steps count? : 66, Cost: 256.0
For episode num 272  Steps count? : 39, Cost: 257.0
For episode num 273  Steps count? : 56, Cost: 258.0
Warning: trajectory cut off when rollout by epoch at 2.0 steps.
Processing rollout for epoch: 6... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.010831741616129875 Actual: 0.011604063212871552
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -5.349237442016602      │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 45.060001373291016      │
│ Train/Epoch                   │ 6.0                     │
│ Train/Entropy                 │ 1.4024938344955444      │
│ Train/KL                      │ 0.0002538692788220942   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9989199638366699      │
│ Train/PolicyRatio/Min         │ 0.9989199638366699      │
│ Train/PolicyRatio/Max         │ 0.9989199638366699      │
│ Train/PolicyRatio/Std         │ 0.0007637289818376303   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.983694851398468       │
│ TotalEnvSteps                 │ 14000.0                 │
│ Loss/Loss_pi                  │ -0.008457597345113754   │
│ Loss/Loss_pi/Delta            │ 0.003043956123292446    │
│ Value/Adv                     │ -1.0848045128852846e-08 │
│ Loss/Loss_reward_critic       │ 0.1443740427494049      │
│ Loss/Loss_reward_critic/Delta │ -0.01373949646949768    │
│ Value/reward                  │ -3.6589040756225586     │
│ Loss/Loss_cost_critic         │ 0.020538777112960815    │
│ Loss/Loss_cost_critic/Delta   │ -0.004831055179238319   │
│ Value/cost                    │ 0.7308014035224915      │
│ Time/Total                    │ 19.307828903198242      │
│ Time/Rollout                  │ 1.630059003829956       │
│ Time/Update                   │ 0.8514950275421143      │
│ Time/Epoch                    │ 2.481571674346924       │
│ Time/FPS                      │ 805.9411010742188       │
│ Misc/Alpha                    │ 1.8491469621658325      │
│ Misc/FinalStepNorm            │ 0.22063373029232025     │
│ Misc/gradient_norm            │ 0.1339220404624939      │
│ Misc/xHx                      │ 0.005849064327776432    │
│ Misc/H_inv_g                  │ 0.11931649595499039     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.02366163209080696     │
│ Misc/A                        │ 0.0035763767082244158   │
│ Misc/B                        │ -5931522.5              │
│ Misc/q                        │ 0.005849064327776432    │
│ Misc/r                        │ 0.0004697837866842747   │
│ Misc/s                        │ 9.709829464554787e-05   │
│ Misc/Lambda_star              │ 0.5407899022102356      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 274  Steps count? : 2, Cost: 258.0
For episode num 275  Steps count? : 37, Cost: 259.0
For episode num 276  Steps count? : 30, Cost: 260.0
For episode num 277  Steps count? : 29, Cost: 261.0
For episode num 278  Steps count? : 60, Cost: 262.0
For episode num 279  Steps count? : 31, Cost: 263.0
For episode num 280  Steps count? : 65, Cost: 264.0
For episode num 281  Steps count? : 35, Cost: 265.0
For episode num 282  Steps count? : 37, Cost: 266.0
For episode num 283  Steps count? : 32, Cost: 267.0
For episode num 284  Steps count? : 28, Cost: 268.0
For episode num 285  Steps count? : 47, Cost: 269.0
For episode num 286  Steps count? : 31, Cost: 270.0
For episode num 287  Steps count? : 34, Cost: 271.0
For episode num 288  Steps count? : 35, Cost: 272.0
For episode num 289  Steps count? : 30, Cost: 273.0
For episode num 290  Steps count? : 37, Cost: 274.0
For episode num 291  Steps count? : 47, Cost: 275.0
For episode num 292  Steps count? : 35, Cost: 276.0
For episode num 293  Steps count? : 36, Cost: 277.0
For episode num 294  Steps count? : 34, Cost: 278.0
For episode num 295  Steps count? : 32, Cost: 279.0
For episode num 296  Steps count? : 33, Cost: 280.0
For episode num 297  Steps count? : 34, Cost: 281.0
For episode num 298  Steps count? : 40, Cost: 282.0
For episode num 299  Steps count? : 41, Cost: 283.0
For episode num 300  Steps count? : 33, Cost: 284.0
For episode num 301  Steps count? : 36, Cost: 285.0
For episode num 302  Steps count? : 37, Cost: 286.0
For episode num 303  Steps count? : 41, Cost: 287.0
For episode num 304  Steps count? : 35, Cost: 288.0
For episode num 305  Steps count? : 31, Cost: 289.0
For episode num 306  Steps count? : 39, Cost: 290.0
For episode num 307  Steps count? : 30, Cost: 291.0
For episode num 308  Steps count? : 54, Cost: 292.0
For episode num 309  Steps count? : 30, Cost: 293.0
For episode num 310  Steps count? : 48, Cost: 294.0
For episode num 311  Steps count? : 40, Cost: 295.0
For episode num 312  Steps count? : 35, Cost: 296.0
For episode num 313  Steps count? : 34, Cost: 297.0
For episode num 314  Steps count? : 47, Cost: 298.0
For episode num 315  Steps count? : 27, Cost: 299.0
For episode num 316  Steps count? : 36, Cost: 300.0
For episode num 317  Steps count? : 32, Cost: 301.0
For episode num 318  Steps count? : 38, Cost: 302.0
For episode num 319  Steps count? : 34, Cost: 303.0
For episode num 320  Steps count? : 30, Cost: 304.0
For episode num 321  Steps count? : 32, Cost: 305.0
For episode num 322  Steps count? : 33, Cost: 306.0
For episode num 323  Steps count? : 36, Cost: 307.0
For episode num 324  Steps count? : 29, Cost: 308.0
For episode num 325  Steps count? : 35, Cost: 309.0
For episode num 326  Steps count? : 29, Cost: 310.0
For episode num 327  Steps count? : 36, Cost: 311.0
For episode num 328  Steps count? : 31, Cost: 312.0
Warning: trajectory cut off when rollout by epoch at 42.0 steps.
Processing rollout for epoch: 7... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.011089103296399117 Actual: 0.011213366873562336
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.880446910858154     │
│ Metrics/EpCost                │ 1.0                    │
│ Metrics/EpLen                 │ 36.040000915527344     │
│ Train/Epoch                   │ 7.0                    │
│ Train/Entropy                 │ 1.3940433263778687     │
│ Train/KL                      │ 0.0002802529779728502  │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0005793571472168     │
│ Train/PolicyRatio/Min         │ 1.0005793571472168     │
│ Train/PolicyRatio/Max         │ 1.0005793571472168     │
│ Train/PolicyRatio/Std         │ 0.00040966738015413284 │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.9754214286804199     │
│ TotalEnvSteps                 │ 16000.0                │
│ Loss/Loss_pi                  │ -0.008213984780013561  │
│ Loss/Loss_pi/Delta            │ 0.00024361256510019302 │
│ Value/Adv                     │ 4.887581006585151e-09  │
│ Loss/Loss_reward_critic       │ 0.09899017959833145    │
│ Loss/Loss_reward_critic/Delta │ -0.045383863151073456  │
│ Value/reward                  │ -3.5035347938537598    │
│ Loss/Loss_cost_critic         │ 0.016598187386989594   │
│ Loss/Loss_cost_critic/Delta   │ -0.003940589725971222  │
│ Value/cost                    │ 0.7835952043533325     │
│ Time/Total                    │ 21.78802490234375      │
│ Time/Rollout                  │ 1.60917067527771       │
│ Time/Update                   │ 0.8529794216156006     │
│ Time/Epoch                    │ 2.462172269821167      │
│ Time/FPS                      │ 812.2910766601562      │
│ Misc/Alpha                    │ 1.809597134590149      │
│ Misc/FinalStepNorm            │ 0.13134165108203888    │
│ Misc/gradient_norm            │ 0.1815805286169052     │
│ Misc/xHx                      │ 0.0061075277626514435  │
│ Misc/H_inv_g                  │ 0.07258059829473495    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.014847407117486      │
│ Misc/A                        │ 0.0026035073678940535  │
│ Misc/B                        │ -11235585.0            │
│ Misc/q                        │ 0.0061075277626514435  │
│ Misc/r                        │ 0.00042383489198982716 │
│ Misc/s                        │ 5.125568713992834e-05  │
│ Misc/Lambda_star              │ 0.5526092052459717     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 329  Steps count? : 42, Cost: 312.0
For episode num 330  Steps count? : 36, Cost: 313.0
For episode num 331  Steps count? : 33, Cost: 314.0
For episode num 332  Steps count? : 45, Cost: 315.0
For episode num 333  Steps count? : 35, Cost: 316.0
For episode num 334  Steps count? : 33, Cost: 317.0
For episode num 335  Steps count? : 31, Cost: 318.0
For episode num 336  Steps count? : 47, Cost: 319.0
For episode num 337  Steps count? : 28, Cost: 320.0
For episode num 338  Steps count? : 73, Cost: 321.0
For episode num 339  Steps count? : 27, Cost: 322.0
For episode num 340  Steps count? : 47, Cost: 323.0
For episode num 341  Steps count? : 53, Cost: 324.0
For episode num 342  Steps count? : 35, Cost: 325.0
For episode num 343  Steps count? : 39, Cost: 326.0
For episode num 344  Steps count? : 38, Cost: 327.0
For episode num 345  Steps count? : 31, Cost: 328.0
For episode num 346  Steps count? : 44, Cost: 329.0
For episode num 347  Steps count? : 38, Cost: 330.0
For episode num 348  Steps count? : 32, Cost: 331.0
For episode num 349  Steps count? : 39, Cost: 332.0
For episode num 350  Steps count? : 56, Cost: 333.0
For episode num 351  Steps count? : 29, Cost: 334.0
For episode num 352  Steps count? : 31, Cost: 335.0
For episode num 353  Steps count? : 28, Cost: 336.0
For episode num 354  Steps count? : 35, Cost: 337.0
For episode num 355  Steps count? : 34, Cost: 338.0
For episode num 356  Steps count? : 31, Cost: 339.0
For episode num 357  Steps count? : 33, Cost: 340.0
For episode num 358  Steps count? : 36, Cost: 341.0
For episode num 359  Steps count? : 32, Cost: 342.0
For episode num 360  Steps count? : 35, Cost: 343.0
For episode num 361  Steps count? : 35, Cost: 344.0
For episode num 362  Steps count? : 31, Cost: 345.0
For episode num 363  Steps count? : 37, Cost: 346.0
For episode num 364  Steps count? : 38, Cost: 347.0
For episode num 365  Steps count? : 66, Cost: 348.0
For episode num 366  Steps count? : 29, Cost: 349.0
For episode num 367  Steps count? : 33, Cost: 350.0
For episode num 368  Steps count? : 34, Cost: 351.0
For episode num 369  Steps count? : 45, Cost: 352.0
For episode num 370  Steps count? : 28, Cost: 353.0
For episode num 371  Steps count? : 31, Cost: 354.0
For episode num 372  Steps count? : 36, Cost: 355.0
For episode num 373  Steps count? : 59, Cost: 356.0
For episode num 374  Steps count? : 28, Cost: 357.0
For episode num 375  Steps count? : 34, Cost: 358.0
For episode num 376  Steps count? : 33, Cost: 359.0
For episode num 377  Steps count? : 33, Cost: 360.0
For episode num 378  Steps count? : 39, Cost: 361.0
For episode num 379  Steps count? : 44, Cost: 362.0
For episode num 380  Steps count? : 47, Cost: 363.0
For episode num 381  Steps count? : 41, Cost: 364.0
For episode num 382  Steps count? : 30, Cost: 365.0
Warning: trajectory cut off when rollout by epoch at 5.0 steps.
Processing rollout for epoch: 8... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.013284699991345406 Actual: 0.012794475071132183
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.987613201141357     │
│ Metrics/EpCost                │ 1.0                    │
│ Metrics/EpLen                 │ 37.619998931884766     │
│ Train/Epoch                   │ 8.0                    │
│ Train/Entropy                 │ 1.359254240989685      │
│ Train/KL                      │ 0.00025622316752560437 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.9980414509773254     │
│ Train/PolicyRatio/Min         │ 0.9980414509773254     │
│ Train/PolicyRatio/Max         │ 0.9980414509773254     │
│ Train/PolicyRatio/Std         │ 0.0013848610688000917  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.9422996640205383     │
│ TotalEnvSteps                 │ 18000.0                │
│ Loss/Loss_pi                  │ -0.009451339021325111  │
│ Loss/Loss_pi/Delta            │ -0.0012373542413115501 │
│ Value/Adv                     │ -1.907348723406699e-09 │
│ Loss/Loss_reward_critic       │ 0.09712426364421844    │
│ Loss/Loss_reward_critic/Delta │ -0.0018659159541130066 │
│ Value/reward                  │ -3.4013733863830566    │
│ Loss/Loss_cost_critic         │ 0.013668856583535671   │
│ Loss/Loss_cost_critic/Delta   │ -0.0029293308034539223 │
│ Value/cost                    │ 0.8121166229248047     │
│ Time/Total                    │ 24.746973037719727     │
│ Time/Rollout                  │ 1.9780759811401367     │
│ Time/Update                   │ 0.9622058868408203     │
│ Time/Epoch                    │ 2.9402995109558105     │
│ Time/FPS                      │ 680.2030639648438      │
│ Misc/Alpha                    │ 1.507487416267395      │
│ Misc/FinalStepNorm            │ 0.2110074758529663     │
│ Misc/gradient_norm            │ 0.21452704071998596    │
│ Misc/xHx                      │ 0.008800799027085304   │
│ Misc/H_inv_g                  │ 0.13997294008731842    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.017520586028695107   │
│ Misc/A                        │ 0.006363928318023682   │
│ Misc/B                        │ -12663037.0            │
│ Misc/q                        │ 0.008800799027085304   │
│ Misc/r                        │ 0.0003329342871438712  │
│ Misc/s                        │ 4.547671778709628e-05  │
│ Misc/Lambda_star              │ 0.6633554697036743     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 383  Steps count? : 5, Cost: 365.0
For episode num 384  Steps count? : 41, Cost: 366.0
For episode num 385  Steps count? : 40, Cost: 367.0
For episode num 386  Steps count? : 31, Cost: 368.0
For episode num 387  Steps count? : 44, Cost: 369.0
For episode num 388  Steps count? : 30, Cost: 370.0
For episode num 389  Steps count? : 32, Cost: 371.0
For episode num 390  Steps count? : 35, Cost: 372.0
For episode num 391  Steps count? : 28, Cost: 373.0
For episode num 392  Steps count? : 28, Cost: 374.0
For episode num 393  Steps count? : 28, Cost: 375.0
For episode num 394  Steps count? : 28, Cost: 376.0
For episode num 395  Steps count? : 43, Cost: 377.0
For episode num 396  Steps count? : 30, Cost: 378.0
For episode num 397  Steps count? : 32, Cost: 379.0
For episode num 398  Steps count? : 31, Cost: 380.0
For episode num 399  Steps count? : 33, Cost: 381.0
For episode num 400  Steps count? : 45, Cost: 382.0
For episode num 401  Steps count? : 28, Cost: 383.0
For episode num 402  Steps count? : 50, Cost: 384.0
For episode num 403  Steps count? : 39, Cost: 385.0
For episode num 404  Steps count? : 31, Cost: 386.0
For episode num 405  Steps count? : 27, Cost: 387.0
For episode num 406  Steps count? : 34, Cost: 388.0
For episode num 407  Steps count? : 36, Cost: 389.0
For episode num 408  Steps count? : 28, Cost: 390.0
For episode num 409  Steps count? : 31, Cost: 391.0
For episode num 410  Steps count? : 53, Cost: 392.0
For episode num 411  Steps count? : 30, Cost: 393.0
For episode num 412  Steps count? : 38, Cost: 394.0
For episode num 413  Steps count? : 44, Cost: 395.0
For episode num 414  Steps count? : 66, Cost: 396.0
For episode num 415  Steps count? : 30, Cost: 397.0
For episode num 416  Steps count? : 46, Cost: 398.0
For episode num 417  Steps count? : 41, Cost: 399.0
For episode num 418  Steps count? : 37, Cost: 400.0
For episode num 419  Steps count? : 32, Cost: 401.0
For episode num 420  Steps count? : 40, Cost: 402.0
For episode num 421  Steps count? : 33, Cost: 403.0
For episode num 422  Steps count? : 33, Cost: 404.0
For episode num 423  Steps count? : 30, Cost: 405.0
For episode num 424  Steps count? : 38, Cost: 406.0
For episode num 425  Steps count? : 38, Cost: 407.0
For episode num 426  Steps count? : 35, Cost: 408.0
For episode num 427  Steps count? : 33, Cost: 409.0
For episode num 428  Steps count? : 32, Cost: 410.0
For episode num 429  Steps count? : 32, Cost: 411.0
For episode num 430  Steps count? : 48, Cost: 412.0
For episode num 431  Steps count? : 33, Cost: 413.0
For episode num 432  Steps count? : 39, Cost: 414.0
For episode num 433  Steps count? : 29, Cost: 415.0
For episode num 434  Steps count? : 40, Cost: 416.0
For episode num 435  Steps count? : 30, Cost: 417.0
For episode num 436  Steps count? : 49, Cost: 418.0
For episode num 437  Steps count? : 28, Cost: 419.0
For episode num 438  Steps count? : 42, Cost: 420.0
Warning: trajectory cut off when rollout by epoch at 18.0 steps.
Processing rollout for epoch: 9... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.00937739945948124 Actual: 0.009737614542245865
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.7787981033325195     │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 35.91999816894531       │
│ Train/Epoch                   │ 9.0                     │
│ Train/Entropy                 │ 1.3615951538085938      │
│ Train/KL                      │ 0.00024718724307604134  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0008115768432617      │
│ Train/PolicyRatio/Min         │ 1.0008115768432617      │
│ Train/PolicyRatio/Max         │ 1.0008115768432617      │
│ Train/PolicyRatio/Std         │ 0.0005739276530221105   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.9443474411964417      │
│ TotalEnvSteps                 │ 20000.0                 │
│ Loss/Loss_pi                  │ -0.007500220090150833   │
│ Loss/Loss_pi/Delta            │ 0.0019511189311742783   │
│ Value/Adv                     │ 1.668930060816365e-08   │
│ Loss/Loss_reward_critic       │ 0.07925896346569061     │
│ Loss/Loss_reward_critic/Delta │ -0.017865300178527832   │
│ Value/reward                  │ -3.3649487495422363     │
│ Loss/Loss_cost_critic         │ 0.011444058269262314    │
│ Loss/Loss_cost_critic/Delta   │ -0.0022247983142733574  │
│ Value/cost                    │ 0.8174512982368469      │
│ Time/Total                    │ 27.180225372314453      │
│ Time/Rollout                  │ 1.576479434967041       │
│ Time/Update                   │ 0.8367226123809814      │
│ Time/Epoch                    │ 2.4132273197174072      │
│ Time/FPS                      │ 828.76611328125         │
│ Misc/Alpha                    │ 2.1409859657287598      │
│ Misc/FinalStepNorm            │ 0.1637614667415619      │
│ Misc/gradient_norm            │ 0.16042275726795197     │
│ Misc/xHx                      │ 0.004363161977380514    │
│ Misc/H_inv_g                  │ 0.07648881524801254     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.021508483216166496    │
│ Misc/A                        │ 0.0025659557431936264   │
│ Misc/B                        │ -9753248.0              │
│ Misc/q                        │ 0.004363161977380514    │
│ Misc/r                        │ -0.00032578836544416845 │
│ Misc/s                        │ 5.904724821448326e-05   │
│ Misc/Lambda_star              │ 0.46707451343536377     │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 439  Steps count? : 18, Cost: 420.0
For episode num 440  Steps count? : 36, Cost: 421.0
For episode num 441  Steps count? : 29, Cost: 422.0
For episode num 442  Steps count? : 40, Cost: 423.0
For episode num 443  Steps count? : 44, Cost: 424.0
For episode num 444  Steps count? : 45, Cost: 425.0
For episode num 445  Steps count? : 34, Cost: 426.0
For episode num 446  Steps count? : 35, Cost: 427.0
For episode num 447  Steps count? : 38, Cost: 428.0
For episode num 448  Steps count? : 40, Cost: 429.0
For episode num 449  Steps count? : 37, Cost: 430.0
For episode num 450  Steps count? : 37, Cost: 431.0
For episode num 451  Steps count? : 33, Cost: 432.0
For episode num 452  Steps count? : 39, Cost: 433.0
For episode num 453  Steps count? : 30, Cost: 434.0
For episode num 454  Steps count? : 27, Cost: 435.0
For episode num 455  Steps count? : 36, Cost: 436.0
For episode num 456  Steps count? : 31, Cost: 437.0
For episode num 457  Steps count? : 32, Cost: 438.0
For episode num 458  Steps count? : 30, Cost: 439.0
For episode num 459  Steps count? : 32, Cost: 440.0
For episode num 460  Steps count? : 34, Cost: 441.0
For episode num 461  Steps count? : 34, Cost: 442.0
For episode num 462  Steps count? : 31, Cost: 443.0
For episode num 463  Steps count? : 45, Cost: 444.0
For episode num 464  Steps count? : 33, Cost: 445.0
For episode num 465  Steps count? : 34, Cost: 446.0
For episode num 466  Steps count? : 32, Cost: 447.0
For episode num 467  Steps count? : 31, Cost: 448.0
For episode num 468  Steps count? : 39, Cost: 449.0
For episode num 469  Steps count? : 53, Cost: 450.0
For episode num 470  Steps count? : 31, Cost: 451.0
For episode num 471  Steps count? : 43, Cost: 452.0
For episode num 472  Steps count? : 35, Cost: 453.0
For episode num 473  Steps count? : 37, Cost: 454.0
For episode num 474  Steps count? : 51, Cost: 455.0
For episode num 475  Steps count? : 36, Cost: 456.0
For episode num 476  Steps count? : 39, Cost: 457.0
For episode num 477  Steps count? : 62, Cost: 458.0
For episode num 478  Steps count? : 53, Cost: 459.0
For episode num 479  Steps count? : 42, Cost: 460.0
For episode num 480  Steps count? : 37, Cost: 461.0
For episode num 481  Steps count? : 54, Cost: 462.0
For episode num 482  Steps count? : 40, Cost: 463.0
For episode num 483  Steps count? : 33, Cost: 464.0
For episode num 484  Steps count? : 56, Cost: 465.0
For episode num 485  Steps count? : 33, Cost: 466.0
For episode num 486  Steps count? : 28, Cost: 467.0
For episode num 487  Steps count? : 35, Cost: 468.0
For episode num 488  Steps count? : 37, Cost: 469.0
For episode num 489  Steps count? : 32, Cost: 470.0
For episode num 490  Steps count? : 34, Cost: 471.0
For episode num 491  Steps count? : 30, Cost: 472.0
For episode num 492  Steps count? : 35, Cost: 473.0
Warning: trajectory cut off when rollout by epoch at 16.0 steps.
Processing rollout for epoch: 10... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.009811174124479294 Actual: 0.0076956781558692455
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.895974636077881     │
│ Metrics/EpCost                │ 1.0                    │
│ Metrics/EpLen                 │ 37.58000183105469      │
│ Train/Epoch                   │ 10.0                   │
│ Train/Entropy                 │ 1.3424266576766968     │
│ Train/KL                      │ 0.00028245829162187874 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0017691850662231     │
│ Train/PolicyRatio/Min         │ 1.0017691850662231     │
│ Train/PolicyRatio/Max         │ 1.0017691850662231     │
│ Train/PolicyRatio/Std         │ 0.0012510308297351003  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.9265275597572327     │
│ TotalEnvSteps                 │ 22000.0                │
│ Loss/Loss_pi                  │ -0.005749693140387535  │
│ Loss/Loss_pi/Delta            │ 0.001750526949763298   │
│ Value/Adv                     │ 0.0                    │
│ Loss/Loss_reward_critic       │ 0.08545371145009995    │
│ Loss/Loss_reward_critic/Delta │ 0.006194747984409332   │
│ Value/reward                  │ -3.3407914638519287    │
│ Loss/Loss_cost_critic         │ 0.010205904953181744   │
│ Loss/Loss_cost_critic/Delta   │ -0.0012381533160805702 │
│ Value/cost                    │ 0.8140861988067627     │
│ Time/Total                    │ 29.655719757080078     │
│ Time/Rollout                  │ 1.5853335857391357     │
│ Time/Update                   │ 0.8710072040557861     │
│ Time/Epoch                    │ 2.4563651084899902     │
│ Time/FPS                      │ 814.2118530273438      │
│ Misc/Alpha                    │ 2.0476043224334717     │
│ Misc/FinalStepNorm            │ 0.17461855709552765    │
│ Misc/gradient_norm            │ 0.16153667867183685    │
│ Misc/xHx                      │ 0.004770204424858093   │
│ Misc/H_inv_g                  │ 0.0852794423699379     │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.012965202331542969   │
│ Misc/A                        │ 0.004700436256825924   │
│ Misc/B                        │ -20920550.0            │
│ Misc/q                        │ 0.004770204424858093   │
│ Misc/r                        │ 4.382809856906533e-05  │
│ Misc/s                        │ 2.752273576334119e-05  │
│ Misc/Lambda_star              │ 0.48837560415267944    │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 493  Steps count? : 16, Cost: 473.0
For episode num 494  Steps count? : 33, Cost: 474.0
For episode num 495  Steps count? : 34, Cost: 475.0
For episode num 496  Steps count? : 35, Cost: 476.0
For episode num 497  Steps count? : 45, Cost: 477.0
For episode num 498  Steps count? : 46, Cost: 478.0
For episode num 499  Steps count? : 41, Cost: 479.0
For episode num 500  Steps count? : 44, Cost: 480.0
For episode num 501  Steps count? : 69, Cost: 481.0
For episode num 502  Steps count? : 32, Cost: 482.0
For episode num 503  Steps count? : 38, Cost: 483.0
For episode num 504  Steps count? : 36, Cost: 484.0
For episode num 505  Steps count? : 35, Cost: 485.0
For episode num 506  Steps count? : 39, Cost: 486.0
For episode num 507  Steps count? : 30, Cost: 487.0
For episode num 508  Steps count? : 36, Cost: 488.0
For episode num 509  Steps count? : 33, Cost: 489.0
For episode num 510  Steps count? : 33, Cost: 490.0
For episode num 511  Steps count? : 32, Cost: 491.0
For episode num 512  Steps count? : 44, Cost: 492.0
For episode num 513  Steps count? : 53, Cost: 493.0
For episode num 514  Steps count? : 30, Cost: 494.0
For episode num 515  Steps count? : 30, Cost: 495.0
For episode num 516  Steps count? : 33, Cost: 496.0
For episode num 517  Steps count? : 44, Cost: 497.0
For episode num 518  Steps count? : 40, Cost: 498.0
For episode num 519  Steps count? : 37, Cost: 499.0
For episode num 520  Steps count? : 43, Cost: 500.0
For episode num 521  Steps count? : 37, Cost: 501.0
For episode num 522  Steps count? : 41, Cost: 502.0
For episode num 523  Steps count? : 28, Cost: 503.0
For episode num 524  Steps count? : 35, Cost: 504.0
For episode num 525  Steps count? : 30, Cost: 505.0
For episode num 526  Steps count? : 55, Cost: 506.0
For episode num 527  Steps count? : 34, Cost: 507.0
For episode num 528  Steps count? : 35, Cost: 508.0
For episode num 529  Steps count? : 39, Cost: 509.0
For episode num 530  Steps count? : 46, Cost: 510.0
For episode num 531  Steps count? : 43, Cost: 511.0
For episode num 532  Steps count? : 33, Cost: 512.0
For episode num 533  Steps count? : 33, Cost: 513.0
For episode num 534  Steps count? : 36, Cost: 514.0
For episode num 535  Steps count? : 37, Cost: 515.0
For episode num 536  Steps count? : 38, Cost: 516.0
For episode num 537  Steps count? : 36, Cost: 517.0
For episode num 538  Steps count? : 32, Cost: 518.0
For episode num 539  Steps count? : 30, Cost: 519.0
For episode num 540  Steps count? : 41, Cost: 520.0
For episode num 541  Steps count? : 59, Cost: 521.0
For episode num 542  Steps count? : 45, Cost: 522.0
For episode num 543  Steps count? : 53, Cost: 523.0
For episode num 544  Steps count? : 47, Cost: 524.0
Warning: trajectory cut off when rollout by epoch at 12.0 steps.
Processing rollout for epoch: 11... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.01467130146920681 Actual: 0.015114713460206985
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.914729595184326     │
│ Metrics/EpCost                │ 1.0                    │
│ Metrics/EpLen                 │ 39.099998474121094     │
│ Train/Epoch                   │ 11.0                   │
│ Train/Entropy                 │ 1.3217576742172241     │
│ Train/KL                      │ 0.00021411145280580968 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0029730796813965     │
│ Train/PolicyRatio/Min         │ 1.0029730796813965     │
│ Train/PolicyRatio/Max         │ 1.0029730796813965     │
│ Train/PolicyRatio/Std         │ 0.002102284925058484   │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.9074016213417053     │
│ TotalEnvSteps                 │ 24000.0                │
│ Loss/Loss_pi                  │ -0.011259881779551506  │
│ Loss/Loss_pi/Delta            │ -0.005510188639163971  │
│ Value/Adv                     │ -4.291534239087014e-09 │
│ Loss/Loss_reward_critic       │ 0.08091804385185242    │
│ Loss/Loss_reward_critic/Delta │ -0.004535667598247528  │
│ Value/reward                  │ -3.3698489665985107    │
│ Loss/Loss_cost_critic         │ 0.009315989911556244   │
│ Loss/Loss_cost_critic/Delta   │ -0.0008899150416254997 │
│ Value/cost                    │ 0.8073142766952515     │
│ Time/Total                    │ 32.373661041259766     │
│ Time/Rollout                  │ 1.5728428363800049     │
│ Time/Update                   │ 1.1261556148529053     │
│ Time/Epoch                    │ 2.699018955230713      │
│ Time/FPS                      │ 741.01025390625        │
│ Misc/Alpha                    │ 1.3653783798217773     │
│ Misc/FinalStepNorm            │ 0.2293156385421753     │
│ Misc/gradient_norm            │ 0.23768194019794464    │
│ Misc/xHx                      │ 0.010728118009865284   │
│ Misc/H_inv_g                  │ 0.16795024275779724    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.013767590746283531   │
│ Misc/A                        │ 0.009388748556375504   │
│ Misc/B                        │ -18165060.0            │
│ Misc/q                        │ 0.010728118009865284   │
│ Misc/r                        │ 0.0002060833794530481  │
│ Misc/s                        │ 3.169922638335265e-05  │
│ Misc/Lambda_star              │ 0.732397735118866      │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 545  Steps count? : 12, Cost: 524.0
For episode num 546  Steps count? : 47, Cost: 525.0
For episode num 547  Steps count? : 33, Cost: 526.0
For episode num 548  Steps count? : 41, Cost: 527.0
For episode num 549  Steps count? : 41, Cost: 528.0
For episode num 550  Steps count? : 39, Cost: 529.0
For episode num 551  Steps count? : 52, Cost: 530.0
For episode num 552  Steps count? : 34, Cost: 531.0
For episode num 553  Steps count? : 29, Cost: 532.0
For episode num 554  Steps count? : 42, Cost: 533.0
For episode num 555  Steps count? : 34, Cost: 534.0
For episode num 556  Steps count? : 43, Cost: 535.0
For episode num 557  Steps count? : 53, Cost: 536.0
For episode num 558  Steps count? : 40, Cost: 537.0
For episode num 559  Steps count? : 57, Cost: 538.0
For episode num 560  Steps count? : 37, Cost: 539.0
For episode num 561  Steps count? : 36, Cost: 540.0
For episode num 562  Steps count? : 32, Cost: 541.0
For episode num 563  Steps count? : 28, Cost: 542.0
For episode num 564  Steps count? : 71, Cost: 543.0
For episode num 565  Steps count? : 29, Cost: 544.0
For episode num 566  Steps count? : 27, Cost: 545.0
For episode num 567  Steps count? : 43, Cost: 546.0
For episode num 568  Steps count? : 41, Cost: 547.0
For episode num 569  Steps count? : 33, Cost: 548.0
For episode num 570  Steps count? : 34, Cost: 549.0
For episode num 571  Steps count? : 51, Cost: 550.0
For episode num 572  Steps count? : 30, Cost: 551.0
For episode num 573  Steps count? : 33, Cost: 552.0
For episode num 574  Steps count? : 52, Cost: 553.0
For episode num 575  Steps count? : 49, Cost: 554.0
For episode num 576  Steps count? : 39, Cost: 555.0
For episode num 577  Steps count? : 32, Cost: 556.0
For episode num 578  Steps count? : 34, Cost: 557.0
For episode num 579  Steps count? : 47, Cost: 558.0
For episode num 580  Steps count? : 34, Cost: 559.0
For episode num 581  Steps count? : 44, Cost: 560.0
For episode num 582  Steps count? : 40, Cost: 561.0
For episode num 583  Steps count? : 30, Cost: 562.0
For episode num 584  Steps count? : 36, Cost: 563.0
For episode num 585  Steps count? : 42, Cost: 564.0
For episode num 586  Steps count? : 37, Cost: 565.0
For episode num 587  Steps count? : 35, Cost: 566.0
For episode num 588  Steps count? : 51, Cost: 567.0
For episode num 589  Steps count? : 31, Cost: 568.0
For episode num 590  Steps count? : 39, Cost: 569.0
For episode num 591  Steps count? : 34, Cost: 570.0
For episode num 592  Steps count? : 36, Cost: 571.0
For episode num 593  Steps count? : 33, Cost: 572.0
For episode num 594  Steps count? : 27, Cost: 573.0
For episode num 595  Steps count? : 53, Cost: 574.0
For episode num 596  Steps count? : 30, Cost: 575.0
Warning: trajectory cut off when rollout by epoch at 5.0 steps.
Processing rollout for epoch: 12... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.009068886749446392 Actual: 0.010382887907326221
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.924453258514404     │
│ Metrics/EpCost                │ 1.0                    │
│ Metrics/EpLen                 │ 38.959999084472656     │
│ Train/Epoch                   │ 12.0                   │
│ Train/Entropy                 │ 1.295592188835144      │
│ Train/KL                      │ 0.0002481280534993857  │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0009762048721313     │
│ Train/PolicyRatio/Min         │ 1.0009762048721313     │
│ Train/PolicyRatio/Max         │ 1.0009762048721313     │
│ Train/PolicyRatio/Std         │ 0.0006903091561980546  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.8840739130973816     │
│ TotalEnvSteps                 │ 26000.0                │
│ Loss/Loss_pi                  │ -0.007738716900348663  │
│ Loss/Loss_pi/Delta            │ 0.0035211648792028427  │
│ Value/Adv                     │ 1.1920929132713809e-08 │
│ Loss/Loss_reward_critic       │ 0.07812344282865524    │
│ Loss/Loss_reward_critic/Delta │ -0.002794601023197174  │
│ Value/reward                  │ -3.3375158309936523    │
│ Loss/Loss_cost_critic         │ 0.00849883258342743    │
│ Loss/Loss_cost_critic/Delta   │ -0.0008171573281288147 │
│ Value/cost                    │ 0.811548113822937      │
│ Time/Total                    │ 35.76102066040039      │
│ Time/Rollout                  │ 2.506401538848877      │
│ Time/Update                   │ 0.8574929237365723     │
│ Time/Epoch                    │ 3.363920211791992      │
│ Time/FPS                      │ 594.5445556640625      │
│ Misc/Alpha                    │ 2.2120845317840576     │
│ Misc/FinalStepNorm            │ 0.17913609743118286    │
│ Misc/gradient_norm            │ 0.08360373228788376    │
│ Misc/xHx                      │ 0.004087196663022041   │
│ Misc/H_inv_g                  │ 0.08098066598176956    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.018114006146788597   │
│ Misc/A                        │ 0.003839364042505622   │
│ Misc/B                        │ -20567644.0            │
│ Misc/q                        │ 0.004087196663022041   │
│ Misc/r                        │ 8.331019489560276e-05  │
│ Misc/s                        │ 2.7995152777293697e-05 │
│ Misc/Lambda_star              │ 0.4520622789859772     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 597  Steps count? : 5, Cost: 575.0
For episode num 598  Steps count? : 27, Cost: 576.0
For episode num 599  Steps count? : 33, Cost: 577.0
For episode num 600  Steps count? : 34, Cost: 578.0
For episode num 601  Steps count? : 36, Cost: 579.0
For episode num 602  Steps count? : 40, Cost: 580.0
For episode num 603  Steps count? : 29, Cost: 581.0
For episode num 604  Steps count? : 46, Cost: 582.0
For episode num 605  Steps count? : 33, Cost: 583.0
For episode num 606  Steps count? : 35, Cost: 584.0
For episode num 607  Steps count? : 28, Cost: 585.0
For episode num 608  Steps count? : 53, Cost: 586.0
For episode num 609  Steps count? : 30, Cost: 587.0
For episode num 610  Steps count? : 35, Cost: 588.0
For episode num 611  Steps count? : 32, Cost: 589.0
For episode num 612  Steps count? : 37, Cost: 590.0
For episode num 613  Steps count? : 35, Cost: 591.0
For episode num 614  Steps count? : 37, Cost: 592.0
For episode num 615  Steps count? : 30, Cost: 593.0
For episode num 616  Steps count? : 39, Cost: 594.0
For episode num 617  Steps count? : 38, Cost: 595.0
For episode num 618  Steps count? : 29, Cost: 596.0
For episode num 619  Steps count? : 37, Cost: 597.0
For episode num 620  Steps count? : 66, Cost: 598.0
For episode num 621  Steps count? : 32, Cost: 599.0
For episode num 622  Steps count? : 47, Cost: 600.0
For episode num 623  Steps count? : 34, Cost: 601.0
For episode num 624  Steps count? : 63, Cost: 602.0
For episode num 625  Steps count? : 44, Cost: 603.0
For episode num 626  Steps count? : 37, Cost: 604.0
For episode num 627  Steps count? : 34, Cost: 605.0
For episode num 628  Steps count? : 35, Cost: 606.0
For episode num 629  Steps count? : 37, Cost: 607.0
For episode num 630  Steps count? : 32, Cost: 608.0
For episode num 631  Steps count? : 38, Cost: 609.0
For episode num 632  Steps count? : 34, Cost: 610.0
For episode num 633  Steps count? : 45, Cost: 611.0
For episode num 634  Steps count? : 36, Cost: 612.0
For episode num 635  Steps count? : 42, Cost: 613.0
For episode num 636  Steps count? : 27, Cost: 614.0
For episode num 637  Steps count? : 27, Cost: 615.0
For episode num 638  Steps count? : 38, Cost: 616.0
For episode num 639  Steps count? : 31, Cost: 617.0
For episode num 640  Steps count? : 34, Cost: 618.0
For episode num 641  Steps count? : 30, Cost: 619.0
For episode num 642  Steps count? : 42, Cost: 620.0
For episode num 643  Steps count? : 39, Cost: 621.0
For episode num 644  Steps count? : 36, Cost: 622.0
For episode num 645  Steps count? : 33, Cost: 623.0
For episode num 646  Steps count? : 31, Cost: 624.0
For episode num 647  Steps count? : 32, Cost: 625.0
For episode num 648  Steps count? : 38, Cost: 626.0
For episode num 649  Steps count? : 30, Cost: 627.0
For episode num 650  Steps count? : 30, Cost: 628.0
For episode num 651  Steps count? : 36, Cost: 629.0
For episode num 652  Steps count? : 37, Cost: 630.0
Processing rollout for epoch: 13... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.018524736166000366 Actual: 0.019026312977075577
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.79096794128418      │
│ Metrics/EpCost                │ 1.0                    │
│ Metrics/EpLen                 │ 36.599998474121094     │
│ Train/Epoch                   │ 13.0                   │
│ Train/Entropy                 │ 1.2900446653366089     │
│ Train/KL                      │ 0.0002385948464507237  │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0018852949142456     │
│ Train/PolicyRatio/Min         │ 1.0018852949142456     │
│ Train/PolicyRatio/Max         │ 1.0018852949142456     │
│ Train/PolicyRatio/Std         │ 0.0013330767396837473  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.8790749907493591     │
│ TotalEnvSteps                 │ 28000.0                │
│ Loss/Loss_pi                  │ -0.014216573908925056  │
│ Loss/Loss_pi/Delta            │ -0.006477857008576393  │
│ Value/Adv                     │ 2.6226043559063328e-09 │
│ Loss/Loss_reward_critic       │ 0.0714113712310791     │
│ Loss/Loss_reward_critic/Delta │ -0.006712071597576141  │
│ Value/reward                  │ -3.2844161987304688    │
│ Loss/Loss_cost_critic         │ 0.007395645137876272   │
│ Loss/Loss_cost_critic/Delta   │ -0.001103187445551157  │
│ Value/cost                    │ 0.8208706378936768     │
│ Time/Total                    │ 38.501590728759766     │
│ Time/Rollout                  │ 1.6211354732513428     │
│ Time/Update                   │ 1.0998382568359375     │
│ Time/Epoch                    │ 2.721008777618408      │
│ Time/FPS                      │ 735.0222778320312      │
│ Misc/Alpha                    │ 1.081207036972046      │
│ Misc/FinalStepNorm            │ 0.19989992678165436    │
│ Misc/gradient_norm            │ 0.26626893877983093    │
│ Misc/xHx                      │ 0.017108501866459846   │
│ Misc/H_inv_g                  │ 0.1848858743906021     │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.010702582076191902   │
│ Misc/A                        │ 0.014461608603596687   │
│ Misc/B                        │ -30928284.0            │
│ Misc/q                        │ 0.017108501866459846   │
│ Misc/r                        │ 0.0002220248570665717  │
│ Misc/s                        │ 1.861372948042117e-05  │
│ Misc/Lambda_star              │ 0.924892246723175      │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 653  Steps count? : 0, Cost: 630.0
For episode num 654  Steps count? : 44, Cost: 631.0
For episode num 655  Steps count? : 38, Cost: 632.0
For episode num 656  Steps count? : 39, Cost: 633.0
For episode num 657  Steps count? : 31, Cost: 634.0
For episode num 658  Steps count? : 35, Cost: 635.0
For episode num 659  Steps count? : 36, Cost: 636.0
For episode num 660  Steps count? : 34, Cost: 637.0
For episode num 661  Steps count? : 31, Cost: 638.0
For episode num 662  Steps count? : 33, Cost: 639.0
For episode num 663  Steps count? : 50, Cost: 640.0
For episode num 664  Steps count? : 40, Cost: 641.0
For episode num 665  Steps count? : 40, Cost: 642.0
For episode num 666  Steps count? : 39, Cost: 643.0
For episode num 667  Steps count? : 32, Cost: 644.0
For episode num 668  Steps count? : 31, Cost: 645.0
For episode num 669  Steps count? : 37, Cost: 646.0
For episode num 670  Steps count? : 48, Cost: 647.0
For episode num 671  Steps count? : 37, Cost: 648.0
For episode num 672  Steps count? : 31, Cost: 649.0
For episode num 673  Steps count? : 29, Cost: 650.0
For episode num 674  Steps count? : 37, Cost: 651.0
For episode num 675  Steps count? : 53, Cost: 652.0
For episode num 676  Steps count? : 33, Cost: 653.0
For episode num 677  Steps count? : 36, Cost: 654.0
For episode num 678  Steps count? : 39, Cost: 655.0
For episode num 679  Steps count? : 30, Cost: 656.0
For episode num 680  Steps count? : 38, Cost: 657.0
For episode num 681  Steps count? : 33, Cost: 658.0
For episode num 682  Steps count? : 33, Cost: 659.0
For episode num 683  Steps count? : 31, Cost: 660.0
For episode num 684  Steps count? : 32, Cost: 661.0
For episode num 685  Steps count? : 34, Cost: 662.0
For episode num 686  Steps count? : 42, Cost: 663.0
For episode num 687  Steps count? : 30, Cost: 664.0
For episode num 688  Steps count? : 27, Cost: 665.0
For episode num 689  Steps count? : 27, Cost: 666.0
For episode num 690  Steps count? : 35, Cost: 667.0
For episode num 691  Steps count? : 37, Cost: 668.0
For episode num 692  Steps count? : 28, Cost: 669.0
For episode num 693  Steps count? : 30, Cost: 670.0
For episode num 694  Steps count? : 59, Cost: 671.0
For episode num 695  Steps count? : 36, Cost: 672.0
For episode num 696  Steps count? : 40, Cost: 673.0
For episode num 697  Steps count? : 40, Cost: 674.0
For episode num 698  Steps count? : 27, Cost: 675.0
For episode num 699  Steps count? : 33, Cost: 676.0
For episode num 700  Steps count? : 33, Cost: 677.0
For episode num 701  Steps count? : 32, Cost: 678.0
For episode num 702  Steps count? : 46, Cost: 679.0
For episode num 703  Steps count? : 32, Cost: 680.0
For episode num 704  Steps count? : 31, Cost: 681.0
For episode num 705  Steps count? : 59, Cost: 682.0
For episode num 706  Steps count? : 34, Cost: 683.0
For episode num 707  Steps count? : 32, Cost: 684.0
For episode num 708  Steps count? : 35, Cost: 685.0
Warning: trajectory cut off when rollout by epoch at 11.0 steps.
Processing rollout for epoch: 14... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.012793579138815403 Actual: 0.009914597496390343
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.756315231323242      │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 36.040000915527344      │
│ Train/Epoch                   │ 14.0                    │
│ Train/Entropy                 │ 1.3079723119735718      │
│ Train/KL                      │ 0.0002373262686887756   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0004667043685913      │
│ Train/PolicyRatio/Min         │ 1.0004667043685913      │
│ Train/PolicyRatio/Max         │ 1.0004667043685913      │
│ Train/PolicyRatio/Std         │ 0.000329981732647866    │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.8950192332267761      │
│ TotalEnvSteps                 │ 30000.0                 │
│ Loss/Loss_pi                  │ -0.007391023449599743   │
│ Loss/Loss_pi/Delta            │ 0.006825550459325314    │
│ Value/Adv                     │ -1.4305114870438729e-09 │
│ Loss/Loss_reward_critic       │ 0.06727777421474457     │
│ Loss/Loss_reward_critic/Delta │ -0.004133597016334534   │
│ Value/reward                  │ -3.28326416015625       │
│ Loss/Loss_cost_critic         │ 0.0065854452550411224   │
│ Loss/Loss_cost_critic/Delta   │ -0.0008101998828351498  │
│ Value/cost                    │ 0.8263195157051086      │
│ Time/Total                    │ 41.8160514831543        │
│ Time/Rollout                  │ 2.434854507446289       │
│ Time/Update                   │ 0.8552067279815674      │
│ Time/Epoch                    │ 3.2900784015655518      │
│ Time/FPS                      │ 607.8883666992188       │
│ Misc/Alpha                    │ 1.5661431550979614      │
│ Misc/FinalStepNorm            │ 0.19977480173110962     │
│ Misc/gradient_norm            │ 0.20338068902492523     │
│ Misc/xHx                      │ 0.008153920993208885    │
│ Misc/H_inv_g                  │ 0.12755846977233887     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.011382005177438259    │
│ Misc/A                        │ 0.00697674136608839     │
│ Misc/B                        │ -30159372.0             │
│ Misc/q                        │ 0.008153920993208885    │
│ Misc/r                        │ 0.00014994136290624738  │
│ Misc/s                        │ 1.9088540284428746e-05  │
│ Misc/Lambda_star              │ 0.6385112404823303      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 709  Steps count? : 11, Cost: 685.0
For episode num 710  Steps count? : 32, Cost: 686.0
For episode num 711  Steps count? : 34, Cost: 687.0
For episode num 712  Steps count? : 34, Cost: 688.0
For episode num 713  Steps count? : 34, Cost: 689.0
For episode num 714  Steps count? : 37, Cost: 690.0
For episode num 715  Steps count? : 30, Cost: 691.0
For episode num 716  Steps count? : 38, Cost: 692.0
For episode num 717  Steps count? : 42, Cost: 693.0
For episode num 718  Steps count? : 46, Cost: 694.0
For episode num 719  Steps count? : 34, Cost: 695.0
For episode num 720  Steps count? : 36, Cost: 696.0
For episode num 721  Steps count? : 29, Cost: 697.0
For episode num 722  Steps count? : 36, Cost: 698.0
For episode num 723  Steps count? : 62, Cost: 699.0
For episode num 724  Steps count? : 45, Cost: 700.0
For episode num 725  Steps count? : 31, Cost: 701.0
For episode num 726  Steps count? : 37, Cost: 702.0
For episode num 727  Steps count? : 43, Cost: 703.0
For episode num 728  Steps count? : 38, Cost: 704.0
For episode num 729  Steps count? : 30, Cost: 705.0
For episode num 730  Steps count? : 37, Cost: 706.0
For episode num 731  Steps count? : 46, Cost: 707.0
For episode num 732  Steps count? : 33, Cost: 708.0
For episode num 733  Steps count? : 30, Cost: 709.0
For episode num 734  Steps count? : 30, Cost: 710.0
For episode num 735  Steps count? : 47, Cost: 711.0
For episode num 736  Steps count? : 41, Cost: 712.0
For episode num 737  Steps count? : 29, Cost: 713.0
For episode num 738  Steps count? : 32, Cost: 714.0
For episode num 739  Steps count? : 51, Cost: 715.0
For episode num 740  Steps count? : 30, Cost: 716.0
For episode num 741  Steps count? : 29, Cost: 717.0
For episode num 742  Steps count? : 48, Cost: 718.0
For episode num 743  Steps count? : 43, Cost: 719.0
For episode num 744  Steps count? : 37, Cost: 720.0
For episode num 745  Steps count? : 57, Cost: 721.0
For episode num 746  Steps count? : 34, Cost: 722.0
For episode num 747  Steps count? : 52, Cost: 723.0
For episode num 748  Steps count? : 35, Cost: 724.0
For episode num 749  Steps count? : 41, Cost: 725.0
For episode num 750  Steps count? : 30, Cost: 726.0
For episode num 751  Steps count? : 35, Cost: 727.0
For episode num 752  Steps count? : 34, Cost: 728.0
For episode num 753  Steps count? : 43, Cost: 729.0
For episode num 754  Steps count? : 36, Cost: 730.0
For episode num 755  Steps count? : 36, Cost: 731.0
For episode num 756  Steps count? : 32, Cost: 732.0
For episode num 757  Steps count? : 36, Cost: 733.0
For episode num 758  Steps count? : 28, Cost: 734.0
For episode num 759  Steps count? : 37, Cost: 735.0
For episode num 760  Steps count? : 39, Cost: 736.0
For episode num 761  Steps count? : 60, Cost: 737.0
Warning: trajectory cut off when rollout by epoch at 24.0 steps.
Processing rollout for epoch: 15... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.0069122277200222015 Actual: 0.006170670036226511
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.802411079406738      │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 38.20000076293945       │
│ Train/Epoch                   │ 15.0                    │
│ Train/Entropy                 │ 1.2777572870254517      │
│ Train/KL                      │ 0.0002514599182177335   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9988425374031067      │
│ Train/PolicyRatio/Min         │ 0.9988425374031067      │
│ Train/PolicyRatio/Max         │ 0.9988425374031067      │
│ Train/PolicyRatio/Std         │ 0.0008184356265701354   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.8686425685882568      │
│ TotalEnvSteps                 │ 32000.0                 │
│ Loss/Loss_pi                  │ -0.004728641360998154   │
│ Loss/Loss_pi/Delta            │ 0.002662382088601589    │
│ Value/Adv                     │ -5.7220459481754915e-09 │
│ Loss/Loss_reward_critic       │ 0.06833190470933914     │
│ Loss/Loss_reward_critic/Delta │ 0.001054130494594574    │
│ Value/reward                  │ -3.278634786605835      │
│ Loss/Loss_cost_critic         │ 0.006244814954698086    │
│ Loss/Loss_cost_critic/Delta   │ -0.00034063030034303665 │
│ Value/cost                    │ 0.8180252313613892      │
│ Time/Total                    │ 44.28607940673828       │
│ Time/Rollout                  │ 1.610625982284546       │
│ Time/Update                   │ 0.841240644454956       │
│ Time/Epoch                    │ 2.451885223388672       │
│ Time/FPS                      │ 815.6995849609375       │
│ Misc/Alpha                    │ 2.8996598720550537      │
│ Misc/FinalStepNorm            │ 0.16933919489383698     │
│ Misc/gradient_norm            │ 0.12651169300079346     │
│ Misc/xHx                      │ 0.00237866910174489     │
│ Misc/H_inv_g                  │ 0.058399684727191925    │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.01981290802359581     │
│ Misc/A                        │ 0.0014861864037811756   │
│ Misc/B                        │ -22167548.0             │
│ Misc/q                        │ 0.00237866910174489     │
│ Misc/r                        │ -0.00015228331903927028 │
│ Misc/s                        │ 2.5973928131861612e-05  │
│ Misc/Lambda_star              │ 0.3448680341243744      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 762  Steps count? : 24, Cost: 737.0
For episode num 763  Steps count? : 37, Cost: 738.0
For episode num 764  Steps count? : 33, Cost: 739.0
For episode num 765  Steps count? : 34, Cost: 740.0
For episode num 766  Steps count? : 30, Cost: 741.0
For episode num 767  Steps count? : 42, Cost: 742.0
For episode num 768  Steps count? : 57, Cost: 743.0
For episode num 769  Steps count? : 31, Cost: 744.0
For episode num 770  Steps count? : 48, Cost: 745.0
For episode num 771  Steps count? : 56, Cost: 746.0
For episode num 772  Steps count? : 30, Cost: 747.0
For episode num 773  Steps count? : 35, Cost: 748.0
For episode num 774  Steps count? : 35, Cost: 749.0
For episode num 775  Steps count? : 36, Cost: 750.0
For episode num 776  Steps count? : 33, Cost: 751.0
For episode num 777  Steps count? : 86, Cost: 752.0
For episode num 778  Steps count? : 30, Cost: 753.0
For episode num 779  Steps count? : 34, Cost: 754.0
For episode num 780  Steps count? : 31, Cost: 755.0
For episode num 781  Steps count? : 31, Cost: 756.0
For episode num 782  Steps count? : 32, Cost: 757.0
For episode num 783  Steps count? : 32, Cost: 758.0
For episode num 784  Steps count? : 40, Cost: 759.0
For episode num 785  Steps count? : 33, Cost: 760.0
For episode num 786  Steps count? : 41, Cost: 761.0
For episode num 787  Steps count? : 37, Cost: 762.0
For episode num 788  Steps count? : 32, Cost: 763.0
For episode num 789  Steps count? : 43, Cost: 764.0
For episode num 790  Steps count? : 36, Cost: 765.0
For episode num 791  Steps count? : 51, Cost: 766.0
For episode num 792  Steps count? : 30, Cost: 767.0
For episode num 793  Steps count? : 31, Cost: 768.0
For episode num 794  Steps count? : 33, Cost: 769.0
For episode num 795  Steps count? : 40, Cost: 770.0
For episode num 796  Steps count? : 49, Cost: 771.0
For episode num 797  Steps count? : 37, Cost: 772.0
For episode num 798  Steps count? : 33, Cost: 773.0
For episode num 799  Steps count? : 34, Cost: 774.0
For episode num 800  Steps count? : 38, Cost: 775.0
For episode num 801  Steps count? : 33, Cost: 776.0
For episode num 802  Steps count? : 45, Cost: 777.0
For episode num 803  Steps count? : 35, Cost: 778.0
For episode num 804  Steps count? : 30, Cost: 779.0
For episode num 805  Steps count? : 34, Cost: 780.0
For episode num 806  Steps count? : 32, Cost: 781.0
For episode num 807  Steps count? : 43, Cost: 782.0
For episode num 808  Steps count? : 54, Cost: 783.0
For episode num 809  Steps count? : 44, Cost: 784.0
For episode num 810  Steps count? : 35, Cost: 785.0
For episode num 811  Steps count? : 37, Cost: 786.0
For episode num 812  Steps count? : 35, Cost: 787.0
For episode num 813  Steps count? : 31, Cost: 788.0
For episode num 814  Steps count? : 37, Cost: 789.0
Warning: trajectory cut off when rollout by epoch at 24.0 steps.
Processing rollout for epoch: 16... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.008562150411307812 Actual: 0.008451025933027267
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.793794631958008     │
│ Metrics/EpCost                │ 1.0                    │
│ Metrics/EpLen                 │ 38.119998931884766     │
│ Train/Epoch                   │ 16.0                   │
│ Train/Entropy                 │ 1.219552993774414      │
│ Train/KL                      │ 0.00026315287686884403 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0005741119384766     │
│ Train/PolicyRatio/Min         │ 1.0005741119384766     │
│ Train/PolicyRatio/Max         │ 1.0005741119384766     │
│ Train/PolicyRatio/Std         │ 0.00040601464570499957 │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.8195534348487854     │
│ TotalEnvSteps                 │ 34000.0                │
│ Loss/Loss_pi                  │ -0.006397371180355549  │
│ Loss/Loss_pi/Delta            │ -0.0016687298193573952 │
│ Value/Adv                     │ 1.3113021779531664e-08 │
│ Loss/Loss_reward_critic       │ 0.0709940642118454     │
│ Loss/Loss_reward_critic/Delta │ 0.002662159502506256   │
│ Value/reward                  │ -3.266788959503174     │
│ Loss/Loss_cost_critic         │ 0.005952821578830481   │
│ Loss/Loss_cost_critic/Delta   │ -0.0002919933758676052 │
│ Value/cost                    │ 0.8193222880363464     │
│ Time/Total                    │ 46.764625549316406     │
│ Time/Rollout                  │ 1.6083734035491943     │
│ Time/Update                   │ 0.8516018390655518     │
│ Time/Epoch                    │ 2.4599978923797607     │
│ Time/FPS                      │ 813.009033203125       │
│ Misc/Alpha                    │ 2.3388426303863525     │
│ Misc/FinalStepNorm            │ 0.20709991455078125    │
│ Misc/gradient_norm            │ 0.1559021919965744     │
│ Misc/xHx                      │ 0.003656173823401332   │
│ Misc/H_inv_g                  │ 0.08854802697896957    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.010684444569051266   │
│ Misc/A                        │ 0.003430659417062998   │
│ Misc/B                        │ -26549640.0            │
│ Misc/q                        │ 0.003656173823401332   │
│ Misc/r                        │ -6.994698196649551e-05 │
│ Misc/s                        │ 2.1685209503630176e-05 │
│ Misc/Lambda_star              │ 0.4275619089603424     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 815  Steps count? : 24, Cost: 789.0
For episode num 816  Steps count? : 39, Cost: 790.0
For episode num 817  Steps count? : 32, Cost: 791.0
For episode num 818  Steps count? : 40, Cost: 792.0
For episode num 819  Steps count? : 47, Cost: 793.0
For episode num 820  Steps count? : 50, Cost: 794.0
For episode num 821  Steps count? : 38, Cost: 795.0
For episode num 822  Steps count? : 33, Cost: 796.0
For episode num 823  Steps count? : 43, Cost: 797.0
For episode num 824  Steps count? : 44, Cost: 798.0
For episode num 825  Steps count? : 36, Cost: 799.0
For episode num 826  Steps count? : 33, Cost: 800.0
For episode num 827  Steps count? : 50, Cost: 801.0
For episode num 828  Steps count? : 59, Cost: 802.0
For episode num 829  Steps count? : 39, Cost: 803.0
For episode num 830  Steps count? : 36, Cost: 804.0
For episode num 831  Steps count? : 41, Cost: 805.0
For episode num 832  Steps count? : 32, Cost: 806.0
For episode num 833  Steps count? : 54, Cost: 807.0
For episode num 834  Steps count? : 37, Cost: 808.0
For episode num 835  Steps count? : 27, Cost: 809.0
For episode num 836  Steps count? : 50, Cost: 810.0
For episode num 837  Steps count? : 34, Cost: 811.0
For episode num 838  Steps count? : 40, Cost: 812.0
For episode num 839  Steps count? : 40, Cost: 813.0
For episode num 840  Steps count? : 46, Cost: 814.0
For episode num 841  Steps count? : 28, Cost: 815.0
For episode num 842  Steps count? : 37, Cost: 816.0
For episode num 843  Steps count? : 45, Cost: 817.0
For episode num 844  Steps count? : 35, Cost: 818.0
For episode num 845  Steps count? : 47, Cost: 819.0
For episode num 846  Steps count? : 34, Cost: 820.0
For episode num 847  Steps count? : 32, Cost: 821.0
For episode num 848  Steps count? : 39, Cost: 822.0
For episode num 849  Steps count? : 30, Cost: 823.0
For episode num 850  Steps count? : 38, Cost: 824.0
For episode num 851  Steps count? : 53, Cost: 825.0
For episode num 852  Steps count? : 38, Cost: 826.0
For episode num 853  Steps count? : 34, Cost: 827.0
For episode num 854  Steps count? : 48, Cost: 828.0
For episode num 855  Steps count? : 48, Cost: 829.0
For episode num 856  Steps count? : 34, Cost: 830.0
For episode num 857  Steps count? : 42, Cost: 831.0
For episode num 858  Steps count? : 33, Cost: 832.0
For episode num 859  Steps count? : 38, Cost: 833.0
For episode num 860  Steps count? : 31, Cost: 834.0
For episode num 861  Steps count? : 31, Cost: 835.0
For episode num 862  Steps count? : 38, Cost: 836.0
For episode num 863  Steps count? : 38, Cost: 837.0
For episode num 864  Steps count? : 36, Cost: 838.0
For episode num 865  Steps count? : 47, Cost: 839.0
Warning: trajectory cut off when rollout by epoch at 26.0 steps.
Processing rollout for epoch: 17... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.009119010530412197 Actual: 0.009153665043413639
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.85603141784668       │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 39.47999954223633       │
│ Train/Epoch                   │ 17.0                    │
│ Train/Entropy                 │ 1.1654678583145142      │
│ Train/KL                      │ 0.0002928902395069599   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0008617639541626      │
│ Train/PolicyRatio/Min         │ 1.0008617639541626      │
│ Train/PolicyRatio/Max         │ 1.0008617639541626      │
│ Train/PolicyRatio/Std         │ 0.0006094434065744281   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.7763336300849915      │
│ TotalEnvSteps                 │ 36000.0                 │
│ Loss/Loss_pi                  │ -0.006890838034451008   │
│ Loss/Loss_pi/Delta            │ -0.000493466854095459   │
│ Value/Adv                     │ 9.536743617033494e-10   │
│ Loss/Loss_reward_critic       │ 0.06833010911941528     │
│ Loss/Loss_reward_critic/Delta │ -0.0026639550924301147  │
│ Value/reward                  │ -3.2782273292541504     │
│ Loss/Loss_cost_critic         │ 0.005589940119534731    │
│ Loss/Loss_cost_critic/Delta   │ -0.00036288145929574966 │
│ Value/cost                    │ 0.8148655891418457      │
│ Time/Total                    │ 49.243988037109375      │
│ Time/Rollout                  │ 1.6183929443359375      │
│ Time/Update                   │ 0.8425436019897461      │
│ Time/Epoch                    │ 2.4609532356262207      │
│ Time/FPS                      │ 812.6934814453125       │
│ Misc/Alpha                    │ 2.199291944503784       │
│ Misc/FinalStepNorm            │ 0.19156678020954132     │
│ Misc/gradient_norm            │ 0.1228218749165535      │
│ Misc/xHx                      │ 0.00413488270714879     │
│ Misc/H_inv_g                  │ 0.08710385859012604     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.009169839322566986    │
│ Misc/A                        │ 0.004104868043214083    │
│ Misc/B                        │ -24231490.0             │
│ Misc/q                        │ 0.00413488270714879     │
│ Misc/r                        │ -2.6710795282269828e-05 │
│ Misc/s                        │ 2.3760721887811087e-05  │
│ Misc/Lambda_star              │ 0.4546917974948883      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 866  Steps count? : 26, Cost: 839.0
For episode num 867  Steps count? : 50, Cost: 840.0
For episode num 868  Steps count? : 50, Cost: 841.0
For episode num 869  Steps count? : 41, Cost: 842.0
For episode num 870  Steps count? : 37, Cost: 843.0
For episode num 871  Steps count? : 35, Cost: 844.0
For episode num 872  Steps count? : 42, Cost: 845.0
For episode num 873  Steps count? : 37, Cost: 846.0
For episode num 874  Steps count? : 33, Cost: 847.0
For episode num 875  Steps count? : 40, Cost: 848.0
For episode num 876  Steps count? : 35, Cost: 849.0
For episode num 877  Steps count? : 32, Cost: 850.0
For episode num 878  Steps count? : 38, Cost: 851.0
For episode num 879  Steps count? : 42, Cost: 852.0
For episode num 880  Steps count? : 37, Cost: 853.0
For episode num 881  Steps count? : 34, Cost: 854.0
For episode num 882  Steps count? : 40, Cost: 855.0
For episode num 883  Steps count? : 44, Cost: 856.0
For episode num 884  Steps count? : 40, Cost: 857.0
For episode num 885  Steps count? : 38, Cost: 858.0
For episode num 886  Steps count? : 69, Cost: 859.0
For episode num 887  Steps count? : 48, Cost: 860.0
For episode num 888  Steps count? : 42, Cost: 861.0
For episode num 889  Steps count? : 41, Cost: 862.0
For episode num 890  Steps count? : 40, Cost: 863.0
For episode num 891  Steps count? : 35, Cost: 864.0
For episode num 892  Steps count? : 34, Cost: 865.0
For episode num 893  Steps count? : 41, Cost: 866.0
For episode num 894  Steps count? : 37, Cost: 867.0
For episode num 895  Steps count? : 42, Cost: 868.0
For episode num 896  Steps count? : 37, Cost: 869.0
For episode num 897  Steps count? : 35, Cost: 870.0
For episode num 898  Steps count? : 65, Cost: 871.0
For episode num 899  Steps count? : 30, Cost: 872.0
For episode num 900  Steps count? : 33, Cost: 873.0
For episode num 901  Steps count? : 45, Cost: 874.0
For episode num 902  Steps count? : 65, Cost: 875.0
For episode num 903  Steps count? : 32, Cost: 876.0
For episode num 904  Steps count? : 38, Cost: 877.0
For episode num 905  Steps count? : 31, Cost: 878.0
For episode num 906  Steps count? : 29, Cost: 879.0
For episode num 907  Steps count? : 33, Cost: 880.0
For episode num 908  Steps count? : 40, Cost: 881.0
For episode num 909  Steps count? : 41, Cost: 882.0
For episode num 910  Steps count? : 39, Cost: 883.0
For episode num 911  Steps count? : 47, Cost: 884.0
For episode num 912  Steps count? : 31, Cost: 885.0
For episode num 913  Steps count? : 40, Cost: 886.0
For episode num 914  Steps count? : 36, Cost: 887.0
For episode num 915  Steps count? : 32, Cost: 888.0
Warning: trajectory cut off when rollout by epoch at 47.0 steps.
Processing rollout for epoch: 18... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.004276953637599945 Actual: 0.004695894196629524
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.81874418258667       │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 40.0                    │
│ Train/Epoch                   │ 18.0                    │
│ Train/Entropy                 │ 1.1624784469604492      │
│ Train/KL                      │ 0.00019421862089075148  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9988446235656738      │
│ Train/PolicyRatio/Min         │ 0.9988446235656738      │
│ Train/PolicyRatio/Max         │ 0.9988446235656738      │
│ Train/PolicyRatio/Std         │ 0.0008169464417733252   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.7738249897956848      │
│ TotalEnvSteps                 │ 38000.0                 │
│ Loss/Loss_pi                  │ -0.003528565401211381   │
│ Loss/Loss_pi/Delta            │ 0.003362272633239627    │
│ Value/Adv                     │ -9.536743617033494e-10  │
│ Loss/Loss_reward_critic       │ 0.06587900966405869     │
│ Loss/Loss_reward_critic/Delta │ -0.002451099455356598   │
│ Value/reward                  │ -3.308528184890747      │
│ Loss/Loss_cost_critic         │ 0.005372559651732445    │
│ Loss/Loss_cost_critic/Delta   │ -0.00021738046780228615 │
│ Value/cost                    │ 0.8153409361839294      │
│ Time/Total                    │ 51.72027587890625       │
│ Time/Rollout                  │ 1.5998566150665283      │
│ Time/Update                   │ 0.8577487468719482      │
│ Time/Epoch                    │ 2.4576215744018555      │
│ Time/FPS                      │ 813.795166015625        │
│ Misc/Alpha                    │ 4.711889743804932       │
│ Misc/FinalStepNorm            │ 0.2811991274356842      │
│ Misc/gradient_norm            │ 0.07931436598300934     │
│ Misc/xHx                      │ 0.0009008136112242937   │
│ Misc/H_inv_g                  │ 0.05967862904071808     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.009525967761874199    │
│ Misc/A                        │ 0.0008660780731588602   │
│ Misc/B                        │ -98253000.0             │
│ Misc/q                        │ 0.0009008136112242937   │
│ Misc/r                        │ -1.4270045539888088e-05 │
│ Misc/s                        │ 5.852416506968439e-06   │
│ Misc/Lambda_star              │ 0.21222907304763794     │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 916  Steps count? : 47, Cost: 888.0
For episode num 917  Steps count? : 32, Cost: 889.0
For episode num 918  Steps count? : 41, Cost: 890.0
For episode num 919  Steps count? : 35, Cost: 891.0
For episode num 920  Steps count? : 32, Cost: 892.0
For episode num 921  Steps count? : 40, Cost: 893.0
For episode num 922  Steps count? : 37, Cost: 894.0
For episode num 923  Steps count? : 37, Cost: 895.0
For episode num 924  Steps count? : 36, Cost: 896.0
For episode num 925  Steps count? : 31, Cost: 897.0
For episode num 926  Steps count? : 36, Cost: 898.0
For episode num 927  Steps count? : 29, Cost: 899.0
For episode num 928  Steps count? : 32, Cost: 900.0
For episode num 929  Steps count? : 51, Cost: 901.0
For episode num 930  Steps count? : 39, Cost: 902.0
For episode num 931  Steps count? : 41, Cost: 903.0
For episode num 932  Steps count? : 39, Cost: 904.0
For episode num 933  Steps count? : 37, Cost: 905.0
For episode num 934  Steps count? : 45, Cost: 906.0
For episode num 935  Steps count? : 44, Cost: 907.0
For episode num 936  Steps count? : 41, Cost: 908.0
For episode num 937  Steps count? : 31, Cost: 909.0
For episode num 938  Steps count? : 39, Cost: 910.0
For episode num 939  Steps count? : 41, Cost: 911.0
For episode num 940  Steps count? : 37, Cost: 912.0
For episode num 941  Steps count? : 36, Cost: 913.0
For episode num 942  Steps count? : 53, Cost: 914.0
For episode num 943  Steps count? : 51, Cost: 915.0
For episode num 944  Steps count? : 31, Cost: 916.0
For episode num 945  Steps count? : 37, Cost: 917.0
For episode num 946  Steps count? : 36, Cost: 918.0
For episode num 947  Steps count? : 34, Cost: 919.0
For episode num 948  Steps count? : 40, Cost: 920.0
For episode num 949  Steps count? : 36, Cost: 921.0
For episode num 950  Steps count? : 45, Cost: 922.0
For episode num 951  Steps count? : 38, Cost: 923.0
For episode num 952  Steps count? : 36, Cost: 924.0
For episode num 953  Steps count? : 33, Cost: 925.0
For episode num 954  Steps count? : 36, Cost: 926.0
For episode num 955  Steps count? : 26, Cost: 927.0
For episode num 956  Steps count? : 52, Cost: 928.0
For episode num 957  Steps count? : 50, Cost: 929.0
For episode num 958  Steps count? : 37, Cost: 930.0
For episode num 959  Steps count? : 45, Cost: 931.0
For episode num 960  Steps count? : 31, Cost: 932.0
For episode num 961  Steps count? : 38, Cost: 933.0
For episode num 962  Steps count? : 46, Cost: 934.0
For episode num 963  Steps count? : 52, Cost: 935.0
For episode num 964  Steps count? : 46, Cost: 936.0
For episode num 965  Steps count? : 63, Cost: 937.0
For episode num 966  Steps count? : 43, Cost: 938.0
Warning: trajectory cut off when rollout by epoch at 26.0 steps.
Processing rollout for epoch: 19... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.014093667268753052 Actual: 0.014229504391551018
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.843255043029785     │
│ Metrics/EpCost                │ 1.0                    │
│ Metrics/EpLen                 │ 39.47999954223633      │
│ Train/Epoch                   │ 19.0                   │
│ Train/Entropy                 │ 1.1595122814178467     │
│ Train/KL                      │ 0.00027853556093759835 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.003377914428711      │
│ Train/PolicyRatio/Min         │ 1.003377914428711      │
│ Train/PolicyRatio/Max         │ 1.003377914428711      │
│ Train/PolicyRatio/Std         │ 0.0023886023554950953  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.7715137004852295     │
│ TotalEnvSteps                 │ 40000.0                │
│ Loss/Loss_pi                  │ -0.010647683404386044  │
│ Loss/Loss_pi/Delta            │ -0.007119118003174663  │
│ Value/Adv                     │ 2.3841859042583735e-10 │
│ Loss/Loss_reward_critic       │ 0.06119035929441452    │
│ Loss/Loss_reward_critic/Delta │ -0.004688650369644165  │
│ Value/reward                  │ -3.2805583477020264    │
│ Loss/Loss_cost_critic         │ 0.005078847520053387   │
│ Loss/Loss_cost_critic/Delta   │ -0.0002937121316790581 │
│ Value/cost                    │ 0.8141046166419983     │
│ Time/Total                    │ 54.19184875488281      │
│ Time/Rollout                  │ 1.6132593154907227     │
│ Time/Update                   │ 0.8401765823364258     │
│ Time/Epoch                    │ 2.45345139503479       │
│ Time/FPS                      │ 815.178466796875       │
│ Misc/Alpha                    │ 1.4207676649093628     │
│ Misc/FinalStepNorm            │ 0.1775428205728531     │
│ Misc/gradient_norm            │ 0.20719222724437714    │
│ Misc/xHx                      │ 0.009907941333949566   │
│ Misc/H_inv_g                  │ 0.12496260553598404    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.008147657848894596   │
│ Misc/A                        │ 0.00943306926637888    │
│ Misc/B                        │ -32153074.0            │
│ Misc/q                        │ 0.009907941333949566   │
│ Misc/r                        │ 9.22334220376797e-05   │
│ Misc/s                        │ 1.790430542314425e-05  │
│ Misc/Lambda_star              │ 0.7038448452949524     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 967  Steps count? : 26, Cost: 938.0
For episode num 968  Steps count? : 43, Cost: 939.0
For episode num 969  Steps count? : 28, Cost: 940.0
For episode num 970  Steps count? : 40, Cost: 941.0
For episode num 971  Steps count? : 29, Cost: 942.0
For episode num 972  Steps count? : 33, Cost: 943.0
For episode num 973  Steps count? : 34, Cost: 944.0
For episode num 974  Steps count? : 36, Cost: 945.0
For episode num 975  Steps count? : 40, Cost: 946.0
For episode num 976  Steps count? : 39, Cost: 947.0
For episode num 977  Steps count? : 32, Cost: 948.0
For episode num 978  Steps count? : 32, Cost: 949.0
For episode num 979  Steps count? : 31, Cost: 950.0
For episode num 980  Steps count? : 39, Cost: 951.0
For episode num 981  Steps count? : 40, Cost: 952.0
For episode num 982  Steps count? : 37, Cost: 953.0
For episode num 983  Steps count? : 32, Cost: 954.0
For episode num 984  Steps count? : 38, Cost: 955.0
For episode num 985  Steps count? : 35, Cost: 956.0
For episode num 986  Steps count? : 38, Cost: 957.0
For episode num 987  Steps count? : 35, Cost: 958.0
For episode num 988  Steps count? : 36, Cost: 959.0
For episode num 989  Steps count? : 40, Cost: 960.0
For episode num 990  Steps count? : 39, Cost: 961.0
For episode num 991  Steps count? : 32, Cost: 962.0
For episode num 992  Steps count? : 37, Cost: 963.0
For episode num 993  Steps count? : 55, Cost: 964.0
For episode num 994  Steps count? : 49, Cost: 965.0
For episode num 995  Steps count? : 35, Cost: 966.0
For episode num 996  Steps count? : 60, Cost: 967.0
For episode num 997  Steps count? : 42, Cost: 968.0
For episode num 998  Steps count? : 36, Cost: 969.0
For episode num 999  Steps count? : 33, Cost: 970.0
For episode num 1000  Steps count? : 78, Cost: 971.0
For episode num 1001  Steps count? : 30, Cost: 972.0
For episode num 1002  Steps count? : 68, Cost: 973.0
For episode num 1003  Steps count? : 34, Cost: 974.0
For episode num 1004  Steps count? : 33, Cost: 975.0
For episode num 1005  Steps count? : 44, Cost: 976.0
For episode num 1006  Steps count? : 47, Cost: 977.0
For episode num 1007  Steps count? : 38, Cost: 978.0
For episode num 1008  Steps count? : 40, Cost: 979.0
For episode num 1009  Steps count? : 36, Cost: 980.0
For episode num 1010  Steps count? : 42, Cost: 981.0
For episode num 1011  Steps count? : 51, Cost: 982.0
For episode num 1012  Steps count? : 49, Cost: 983.0
For episode num 1013  Steps count? : 48, Cost: 984.0
For episode num 1014  Steps count? : 52, Cost: 985.0
For episode num 1015  Steps count? : 47, Cost: 986.0
For episode num 1016  Steps count? : 33, Cost: 987.0
Warning: trajectory cut off when rollout by epoch at 25.0 steps.
Processing rollout for epoch: 20... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.01420357171446085 Actual: 0.01423964835703373
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.837098598480225      │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 40.36000061035156       │
│ Train/Epoch                   │ 20.0                    │
│ Train/Entropy                 │ 1.1454293727874756      │
│ Train/KL                      │ 0.00021303436369635165  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0007880926132202      │
│ Train/PolicyRatio/Min         │ 1.0007880926132202      │
│ Train/PolicyRatio/Max         │ 1.0007880926132202      │
│ Train/PolicyRatio/Std         │ 0.0005571813089773059   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.7607211470603943      │
│ TotalEnvSteps                 │ 42000.0                 │
│ Loss/Loss_pi                  │ -0.010670471005141735   │
│ Loss/Loss_pi/Delta            │ -2.278760075569153e-05  │
│ Value/Adv                     │ -5.4836273299940785e-09 │
│ Loss/Loss_reward_critic       │ 0.0629071518778801      │
│ Loss/Loss_reward_critic/Delta │ 0.0017167925834655762   │
│ Value/reward                  │ -3.286423444747925      │
│ Loss/Loss_cost_critic         │ 0.005074230954051018    │
│ Loss/Loss_cost_critic/Delta   │ -4.616566002368927e-06  │
│ Value/cost                    │ 0.8130733370780945      │
│ Time/Total                    │ 56.680091857910156      │
│ Time/Rollout                  │ 1.6198036670684814      │
│ Time/Update                   │ 0.8505001068115234      │
│ Time/Epoch                    │ 2.470320701599121       │
│ Time/FPS                      │ 809.61181640625         │
│ Misc/Alpha                    │ 1.406072735786438       │
│ Misc/FinalStepNorm            │ 0.2243976891040802      │
│ Misc/gradient_norm            │ 0.23672032356262207     │
│ Misc/xHx                      │ 0.010116120800375938    │
│ Misc/H_inv_g                  │ 0.15959180891513824     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.011105574667453766    │
│ Misc/A                        │ 0.010001765564084053    │
│ Misc/B                        │ -15573714.0             │
│ Misc/q                        │ 0.010116120800375938    │
│ Misc/r                        │ 6.503448821604252e-05   │
│ Misc/s                        │ 3.6975397961214185e-05  │
│ Misc/Lambda_star              │ 0.7112007737159729      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1017  Steps count? : 25, Cost: 987.0
For episode num 1018  Steps count? : 45, Cost: 988.0
For episode num 1019  Steps count? : 34, Cost: 989.0
For episode num 1020  Steps count? : 28, Cost: 990.0
For episode num 1021  Steps count? : 35, Cost: 991.0
For episode num 1022  Steps count? : 35, Cost: 992.0
For episode num 1023  Steps count? : 41, Cost: 993.0
For episode num 1024  Steps count? : 46, Cost: 994.0
For episode num 1025  Steps count? : 45, Cost: 995.0
For episode num 1026  Steps count? : 38, Cost: 996.0
For episode num 1027  Steps count? : 48, Cost: 997.0
For episode num 1028  Steps count? : 46, Cost: 998.0
For episode num 1029  Steps count? : 29, Cost: 999.0
For episode num 1030  Steps count? : 41, Cost: 1000.0
For episode num 1031  Steps count? : 39, Cost: 1001.0
For episode num 1032  Steps count? : 36, Cost: 1002.0
For episode num 1033  Steps count? : 30, Cost: 1003.0
For episode num 1034  Steps count? : 42, Cost: 1004.0
For episode num 1035  Steps count? : 41, Cost: 1005.0
For episode num 1036  Steps count? : 44, Cost: 1006.0
For episode num 1037  Steps count? : 46, Cost: 1007.0
For episode num 1038  Steps count? : 34, Cost: 1008.0
For episode num 1039  Steps count? : 50, Cost: 1009.0
For episode num 1040  Steps count? : 55, Cost: 1010.0
For episode num 1041  Steps count? : 32, Cost: 1011.0
For episode num 1042  Steps count? : 34, Cost: 1012.0
For episode num 1043  Steps count? : 50, Cost: 1013.0
For episode num 1044  Steps count? : 31, Cost: 1014.0
For episode num 1045  Steps count? : 39, Cost: 1015.0
For episode num 1046  Steps count? : 46, Cost: 1016.0
For episode num 1047  Steps count? : 43, Cost: 1017.0
For episode num 1048  Steps count? : 38, Cost: 1018.0
For episode num 1049  Steps count? : 33, Cost: 1019.0
For episode num 1050  Steps count? : 40, Cost: 1020.0
For episode num 1051  Steps count? : 34, Cost: 1021.0
For episode num 1052  Steps count? : 27, Cost: 1022.0
For episode num 1053  Steps count? : 41, Cost: 1023.0
For episode num 1054  Steps count? : 45, Cost: 1024.0
For episode num 1055  Steps count? : 34, Cost: 1025.0
For episode num 1056  Steps count? : 31, Cost: 1026.0
For episode num 1057  Steps count? : 34, Cost: 1027.0
For episode num 1058  Steps count? : 44, Cost: 1028.0
For episode num 1059  Steps count? : 49, Cost: 1029.0
For episode num 1060  Steps count? : 35, Cost: 1030.0
For episode num 1061  Steps count? : 44, Cost: 1031.0
For episode num 1062  Steps count? : 44, Cost: 1032.0
For episode num 1063  Steps count? : 35, Cost: 1033.0
For episode num 1064  Steps count? : 40, Cost: 1034.0
For episode num 1065  Steps count? : 35, Cost: 1035.0
For episode num 1066  Steps count? : 45, Cost: 1036.0
For episode num 1067  Steps count? : 38, Cost: 1037.0
Warning: trajectory cut off when rollout by epoch at 31.0 steps.
Processing rollout for epoch: 21... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.008280500769615173 Actual: 0.007722941227257252
INFO: violated KL constraint 0.010057487525045872 at step 1.
Expected Improvement: 0.008280500769615173 Actual: 0.006261165253818035
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.823111534118652     │
│ Metrics/EpCost                │ 1.0                    │
│ Metrics/EpLen                 │ 39.380001068115234     │
│ Train/Epoch                   │ 21.0                   │
│ Train/Entropy                 │ 1.1088614463806152     │
│ Train/KL                      │ 0.00018554275447968394 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.9991900324821472     │
│ Train/PolicyRatio/Min         │ 0.9991900324821472     │
│ Train/PolicyRatio/Max         │ 0.9991900324821472     │
│ Train/PolicyRatio/Std         │ 0.0004717518750112504  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.7335227727890015     │
│ TotalEnvSteps                 │ 44000.0                │
│ Loss/Loss_pi                  │ -0.005333651788532734  │
│ Loss/Loss_pi/Delta            │ 0.005336819216609001   │
│ Value/Adv                     │ 6.675720420901143e-09  │
│ Loss/Loss_reward_critic       │ 0.0580243282020092     │
│ Loss/Loss_reward_critic/Delta │ -0.004882823675870895  │
│ Value/reward                  │ -3.2570040225982666    │
│ Loss/Loss_cost_critic         │ 0.004532304592430592   │
│ Loss/Loss_cost_critic/Delta   │ -0.0005419263616204262 │
│ Value/cost                    │ 0.8108842372894287     │
│ Time/Total                    │ 59.386573791503906     │
│ Time/Rollout                  │ 1.6126515865325928     │
│ Time/Update                   │ 1.075517177581787      │
│ Time/Epoch                    │ 2.688190221786499      │
│ Time/FPS                      │ 743.9951782226562      │
│ Misc/Alpha                    │ 2.4224770069122314     │
│ Misc/FinalStepNorm            │ 0.10787759721279144    │
│ Misc/gradient_norm            │ 0.18432433903217316    │
│ Misc/xHx                      │ 0.0034080767072737217  │
│ Misc/H_inv_g                  │ 0.05566491186618805    │
│ Misc/AcceptanceStep           │ 2.0                    │
│ Misc/cost_gradient_norm       │ 0.013285744935274124   │
│ Misc/A                        │ 0.002962966449558735   │
│ Misc/B                        │ -32491590.0            │
│ Misc/q                        │ 0.0034080767072737217  │
│ Misc/r                        │ -8.882996917236596e-05 │
│ Misc/s                        │ 1.7717662558425218e-05 │
│ Misc/Lambda_star              │ 0.4128006100654602     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 1068  Steps count? : 31, Cost: 1037.0
For episode num 1069  Steps count? : 32, Cost: 1038.0
For episode num 1070  Steps count? : 35, Cost: 1039.0
For episode num 1071  Steps count? : 40, Cost: 1040.0
For episode num 1072  Steps count? : 39, Cost: 1041.0
For episode num 1073  Steps count? : 29, Cost: 1042.0
For episode num 1074  Steps count? : 29, Cost: 1043.0
For episode num 1075  Steps count? : 41, Cost: 1044.0
For episode num 1076  Steps count? : 45, Cost: 1045.0
For episode num 1077  Steps count? : 36, Cost: 1046.0
For episode num 1078  Steps count? : 34, Cost: 1047.0
For episode num 1079  Steps count? : 30, Cost: 1048.0
For episode num 1080  Steps count? : 30, Cost: 1049.0
For episode num 1081  Steps count? : 32, Cost: 1050.0
For episode num 1082  Steps count? : 38, Cost: 1051.0
For episode num 1083  Steps count? : 36, Cost: 1052.0
For episode num 1084  Steps count? : 37, Cost: 1053.0
For episode num 1085  Steps count? : 35, Cost: 1054.0
For episode num 1086  Steps count? : 34, Cost: 1055.0
For episode num 1087  Steps count? : 33, Cost: 1056.0
For episode num 1088  Steps count? : 35, Cost: 1057.0
For episode num 1089  Steps count? : 36, Cost: 1058.0
For episode num 1090  Steps count? : 33, Cost: 1059.0
For episode num 1091  Steps count? : 32, Cost: 1060.0
For episode num 1092  Steps count? : 39, Cost: 1061.0
For episode num 1093  Steps count? : 50, Cost: 1062.0
For episode num 1094  Steps count? : 38, Cost: 1063.0
For episode num 1095  Steps count? : 34, Cost: 1064.0
For episode num 1096  Steps count? : 29, Cost: 1065.0
For episode num 1097  Steps count? : 31, Cost: 1066.0
For episode num 1098  Steps count? : 35, Cost: 1067.0
For episode num 1099  Steps count? : 44, Cost: 1068.0
For episode num 1100  Steps count? : 38, Cost: 1069.0
For episode num 1101  Steps count? : 40, Cost: 1070.0
For episode num 1102  Steps count? : 51, Cost: 1071.0
For episode num 1103  Steps count? : 33, Cost: 1072.0
For episode num 1104  Steps count? : 33, Cost: 1073.0
For episode num 1105  Steps count? : 45, Cost: 1074.0
For episode num 1106  Steps count? : 35, Cost: 1075.0
For episode num 1107  Steps count? : 46, Cost: 1076.0
For episode num 1108  Steps count? : 48, Cost: 1077.0
For episode num 1109  Steps count? : 34, Cost: 1078.0
For episode num 1110  Steps count? : 36, Cost: 1079.0
For episode num 1111  Steps count? : 38, Cost: 1080.0
For episode num 1112  Steps count? : 36, Cost: 1081.0
For episode num 1113  Steps count? : 39, Cost: 1082.0
For episode num 1114  Steps count? : 46, Cost: 1083.0
For episode num 1115  Steps count? : 34, Cost: 1084.0
For episode num 1116  Steps count? : 50, Cost: 1085.0
For episode num 1117  Steps count? : 37, Cost: 1086.0
For episode num 1118  Steps count? : 50, Cost: 1087.0
For episode num 1119  Steps count? : 37, Cost: 1088.0
For episode num 1120  Steps count? : 58, Cost: 1089.0
Warning: trajectory cut off when rollout by epoch at 35.0 steps.
Processing rollout for epoch: 22... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.011492696590721607 Actual: 0.011902009136974812
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.838537693023682      │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 37.959999084472656      │
│ Train/Epoch                   │ 22.0                    │
│ Train/Entropy                 │ 1.0974626541137695      │
│ Train/KL                      │ 0.00021927442867308855  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9997962117195129      │
│ Train/PolicyRatio/Min         │ 0.9997962117195129      │
│ Train/PolicyRatio/Max         │ 0.9997962117195129      │
│ Train/PolicyRatio/Std         │ 0.00014408603601623327  │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.7250809073448181      │
│ TotalEnvSteps                 │ 46000.0                 │
│ Loss/Loss_pi                  │ -0.00891457311809063    │
│ Loss/Loss_pi/Delta            │ -0.0035809213295578957  │
│ Value/Adv                     │ 3.5762786065873797e-09  │
│ Loss/Loss_reward_critic       │ 0.05755036324262619     │
│ Loss/Loss_reward_critic/Delta │ -0.00047396495938301086 │
│ Value/reward                  │ -3.2859199047088623     │
│ Loss/Loss_cost_critic         │ 0.004295151215046644    │
│ Loss/Loss_cost_critic/Delta   │ -0.00023715337738394737 │
│ Value/cost                    │ 0.8187703490257263      │
│ Time/Total                    │ 62.95138931274414       │
│ Time/Rollout                  │ 2.6750166416168213      │
│ Time/Update                   │ 0.866997480392456       │
│ Time/Epoch                    │ 3.5420424938201904      │
│ Time/FPS                      │ 564.6461181640625       │
│ Misc/Alpha                    │ 1.7426360845565796      │
│ Misc/FinalStepNorm            │ 0.23836205899715424     │
│ Misc/gradient_norm            │ 0.21170006692409515     │
│ Misc/xHx                      │ 0.006585911847651005    │
│ Misc/H_inv_g                  │ 0.1367824673652649      │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.009462734684348106    │
│ Misc/A                        │ 0.006510583218187094    │
│ Misc/B                        │ -62507520.0             │
│ Misc/q                        │ 0.006585911847651005    │
│ Misc/r                        │ 2.634664997458458e-05   │
│ Misc/s                        │ 9.204891284753103e-06   │
│ Misc/Lambda_star              │ 0.5738433003425598      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1121  Steps count? : 35, Cost: 1089.0
For episode num 1122  Steps count? : 38, Cost: 1090.0
For episode num 1123  Steps count? : 46, Cost: 1091.0
For episode num 1124  Steps count? : 36, Cost: 1092.0
For episode num 1125  Steps count? : 58, Cost: 1093.0
For episode num 1126  Steps count? : 38, Cost: 1094.0
For episode num 1127  Steps count? : 39, Cost: 1095.0
For episode num 1128  Steps count? : 39, Cost: 1096.0
For episode num 1129  Steps count? : 42, Cost: 1097.0
For episode num 1130  Steps count? : 43, Cost: 1098.0
For episode num 1131  Steps count? : 55, Cost: 1099.0
For episode num 1132  Steps count? : 49, Cost: 1100.0
For episode num 1133  Steps count? : 30, Cost: 1101.0
For episode num 1134  Steps count? : 50, Cost: 1102.0
For episode num 1135  Steps count? : 34, Cost: 1103.0
For episode num 1136  Steps count? : 41, Cost: 1104.0
For episode num 1137  Steps count? : 40, Cost: 1105.0
For episode num 1138  Steps count? : 38, Cost: 1106.0
For episode num 1139  Steps count? : 28, Cost: 1107.0
For episode num 1140  Steps count? : 29, Cost: 1108.0
For episode num 1141  Steps count? : 30, Cost: 1109.0
For episode num 1142  Steps count? : 37, Cost: 1110.0
For episode num 1143  Steps count? : 37, Cost: 1111.0
For episode num 1144  Steps count? : 45, Cost: 1112.0
For episode num 1145  Steps count? : 38, Cost: 1113.0
For episode num 1146  Steps count? : 48, Cost: 1114.0
For episode num 1147  Steps count? : 38, Cost: 1115.0
For episode num 1148  Steps count? : 30, Cost: 1116.0
For episode num 1149  Steps count? : 37, Cost: 1117.0
For episode num 1150  Steps count? : 45, Cost: 1118.0
For episode num 1151  Steps count? : 33, Cost: 1119.0
For episode num 1152  Steps count? : 46, Cost: 1120.0
For episode num 1153  Steps count? : 44, Cost: 1121.0
For episode num 1154  Steps count? : 39, Cost: 1122.0
For episode num 1155  Steps count? : 38, Cost: 1123.0
For episode num 1156  Steps count? : 37, Cost: 1124.0
For episode num 1157  Steps count? : 34, Cost: 1125.0
For episode num 1158  Steps count? : 41, Cost: 1126.0
For episode num 1159  Steps count? : 48, Cost: 1127.0
For episode num 1160  Steps count? : 49, Cost: 1128.0
For episode num 1161  Steps count? : 53, Cost: 1129.0
For episode num 1162  Steps count? : 34, Cost: 1130.0
For episode num 1163  Steps count? : 37, Cost: 1131.0
For episode num 1164  Steps count? : 41, Cost: 1132.0
For episode num 1165  Steps count? : 37, Cost: 1133.0
For episode num 1166  Steps count? : 37, Cost: 1134.0
For episode num 1167  Steps count? : 41, Cost: 1135.0
For episode num 1168  Steps count? : 33, Cost: 1136.0
For episode num 1169  Steps count? : 30, Cost: 1137.0
For episode num 1170  Steps count? : 47, Cost: 1138.0
For episode num 1171  Steps count? : 32, Cost: 1139.0
Warning: trajectory cut off when rollout by epoch at 11.0 steps.
Processing rollout for epoch: 23... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.011129837483167648 Actual: 0.009125970304012299
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.837619781494141      │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 39.779998779296875      │
│ Train/Epoch                   │ 23.0                    │
│ Train/Entropy                 │ 1.0669111013412476      │
│ Train/KL                      │ 0.00023487787984777242  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0021166801452637      │
│ Train/PolicyRatio/Min         │ 1.0021166801452637      │
│ Train/PolicyRatio/Max         │ 1.0021166801452637      │
│ Train/PolicyRatio/Std         │ 0.0014967188471928239   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.7034047245979309      │
│ TotalEnvSteps                 │ 48000.0                 │
│ Loss/Loss_pi                  │ -0.006801403593271971   │
│ Loss/Loss_pi/Delta            │ 0.002113169524818659    │
│ Value/Adv                     │ -1.1086464191123468e-08 │
│ Loss/Loss_reward_critic       │ 0.05563729256391525     │
│ Loss/Loss_reward_critic/Delta │ -0.0019130706787109375  │
│ Value/reward                  │ -3.326199531555176      │
│ Loss/Loss_cost_critic         │ 0.00423132861033082     │
│ Loss/Loss_cost_critic/Delta   │ -6.382260471582413e-05  │
│ Value/cost                    │ 0.8172957897186279      │
│ Time/Total                    │ 65.43490600585938       │
│ Time/Rollout                  │ 1.6044423580169678      │
│ Time/Update                   │ 0.8594141006469727      │
│ Time/Epoch                    │ 2.4638731479644775      │
│ Time/FPS                      │ 811.7304077148438       │
│ Misc/Alpha                    │ 1.801056146621704       │
│ Misc/FinalStepNorm            │ 0.21993252635002136     │
│ Misc/gradient_norm            │ 0.19425442814826965     │
│ Misc/xHx                      │ 0.00616559199988842     │
│ Misc/H_inv_g                  │ 0.12211307883262634     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.010453150607645512    │
│ Misc/A                        │ 0.004999450407922268    │
│ Misc/B                        │ -40002152.0             │
│ Misc/q                        │ 0.00616559199988842     │
│ Misc/r                        │ 0.00012958215666003525  │
│ Misc/s                        │ 1.438922481611371e-05   │
│ Misc/Lambda_star              │ 0.5552297830581665      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1172  Steps count? : 11, Cost: 1139.0
For episode num 1173  Steps count? : 51, Cost: 1140.0
For episode num 1174  Steps count? : 30, Cost: 1141.0
For episode num 1175  Steps count? : 44, Cost: 1142.0
For episode num 1176  Steps count? : 39, Cost: 1143.0
For episode num 1177  Steps count? : 38, Cost: 1144.0
For episode num 1178  Steps count? : 38, Cost: 1145.0
For episode num 1179  Steps count? : 43, Cost: 1146.0
For episode num 1180  Steps count? : 37, Cost: 1147.0
For episode num 1181  Steps count? : 31, Cost: 1148.0
For episode num 1182  Steps count? : 50, Cost: 1149.0
For episode num 1183  Steps count? : 67, Cost: 1150.0
For episode num 1184  Steps count? : 29, Cost: 1151.0
For episode num 1185  Steps count? : 43, Cost: 1152.0
For episode num 1186  Steps count? : 45, Cost: 1153.0
For episode num 1187  Steps count? : 39, Cost: 1154.0
For episode num 1188  Steps count? : 34, Cost: 1155.0
For episode num 1189  Steps count? : 29, Cost: 1156.0
For episode num 1190  Steps count? : 29, Cost: 1157.0
For episode num 1191  Steps count? : 34, Cost: 1158.0
For episode num 1192  Steps count? : 43, Cost: 1159.0
For episode num 1193  Steps count? : 32, Cost: 1160.0
For episode num 1194  Steps count? : 48, Cost: 1161.0
For episode num 1195  Steps count? : 34, Cost: 1162.0
For episode num 1196  Steps count? : 33, Cost: 1163.0
For episode num 1197  Steps count? : 31, Cost: 1164.0
For episode num 1198  Steps count? : 42, Cost: 1165.0
For episode num 1199  Steps count? : 54, Cost: 1166.0
For episode num 1200  Steps count? : 35, Cost: 1167.0
For episode num 1201  Steps count? : 35, Cost: 1168.0
For episode num 1202  Steps count? : 36, Cost: 1169.0
For episode num 1203  Steps count? : 36, Cost: 1170.0
For episode num 1204  Steps count? : 41, Cost: 1171.0
For episode num 1205  Steps count? : 41, Cost: 1172.0
For episode num 1206  Steps count? : 34, Cost: 1173.0
For episode num 1207  Steps count? : 43, Cost: 1174.0
For episode num 1208  Steps count? : 51, Cost: 1175.0
For episode num 1209  Steps count? : 47, Cost: 1176.0
For episode num 1210  Steps count? : 35, Cost: 1177.0
For episode num 1211  Steps count? : 37, Cost: 1178.0
For episode num 1212  Steps count? : 41, Cost: 1179.0
For episode num 1213  Steps count? : 42, Cost: 1180.0
For episode num 1214  Steps count? : 44, Cost: 1181.0
For episode num 1215  Steps count? : 45, Cost: 1182.0
For episode num 1216  Steps count? : 33, Cost: 1183.0
For episode num 1217  Steps count? : 33, Cost: 1184.0
For episode num 1218  Steps count? : 34, Cost: 1185.0
For episode num 1219  Steps count? : 39, Cost: 1186.0
For episode num 1220  Steps count? : 43, Cost: 1187.0
For episode num 1221  Steps count? : 51, Cost: 1188.0
For episode num 1222  Steps count? : 32, Cost: 1189.0
Warning: trajectory cut off when rollout by epoch at 25.0 steps.
Processing rollout for epoch: 24... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.011435158550739288 Actual: 0.010965116322040558
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.817120552062988      │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 39.5                    │
│ Train/Epoch                   │ 24.0                    │
│ Train/Entropy                 │ 1.011804223060608       │
│ Train/KL                      │ 0.00025862338952720165  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9964819550514221      │
│ Train/PolicyRatio/Min         │ 0.9964819550514221      │
│ Train/PolicyRatio/Max         │ 0.9964819550514221      │
│ Train/PolicyRatio/Std         │ 0.0024876194074749947   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.6658340692520142      │
│ TotalEnvSteps                 │ 50000.0                 │
│ Loss/Loss_pi                  │ -0.008234543725848198   │
│ Loss/Loss_pi/Delta            │ -0.0014331401325762272  │
│ Value/Adv                     │ -1.3232231310666975e-08 │
│ Loss/Loss_reward_critic       │ 0.05575031787157059     │
│ Loss/Loss_reward_critic/Delta │ 0.00011302530765533447  │
│ Value/reward                  │ -3.3125410079956055     │
│ Loss/Loss_cost_critic         │ 0.004074591677635908    │
│ Loss/Loss_cost_critic/Delta   │ -0.00015673693269491196 │
│ Value/cost                    │ 0.8203766942024231      │
│ Time/Total                    │ 67.92848205566406       │
│ Time/Rollout                  │ 1.6291821002960205      │
│ Time/Update                   │ 0.848473072052002       │
│ Time/Epoch                    │ 2.4776716232299805      │
│ Time/FPS                      │ 807.2097778320312       │
│ Misc/Alpha                    │ 1.7523270845413208      │
│ Misc/FinalStepNorm            │ 0.19372601807117462     │
│ Misc/gradient_norm            │ 0.18185098469257355     │
│ Misc/xHx                      │ 0.0065132686868309975   │
│ Misc/H_inv_g                  │ 0.1105535626411438      │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.017664767801761627    │
│ Misc/A                        │ 0.006388879381120205    │
│ Misc/B                        │ -51032852.0             │
│ Misc/q                        │ 0.0065132686868309975   │
│ Misc/r                        │ -3.746949369087815e-05  │
│ Misc/s                        │ 1.127684663515538e-05   │
│ Misc/Lambda_star              │ 0.5706697106361389      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1223  Steps count? : 25, Cost: 1189.0
For episode num 1224  Steps count? : 37, Cost: 1190.0
For episode num 1225  Steps count? : 34, Cost: 1191.0
For episode num 1226  Steps count? : 45, Cost: 1192.0
For episode num 1227  Steps count? : 36, Cost: 1193.0
For episode num 1228  Steps count? : 37, Cost: 1194.0
For episode num 1229  Steps count? : 30, Cost: 1195.0
For episode num 1230  Steps count? : 34, Cost: 1196.0
For episode num 1231  Steps count? : 36, Cost: 1197.0
For episode num 1232  Steps count? : 60, Cost: 1198.0
For episode num 1233  Steps count? : 38, Cost: 1199.0
For episode num 1234  Steps count? : 46, Cost: 1200.0
For episode num 1235  Steps count? : 34, Cost: 1201.0
For episode num 1236  Steps count? : 38, Cost: 1202.0
For episode num 1237  Steps count? : 33, Cost: 1203.0
For episode num 1238  Steps count? : 31, Cost: 1204.0
For episode num 1239  Steps count? : 36, Cost: 1205.0
For episode num 1240  Steps count? : 37, Cost: 1206.0
For episode num 1241  Steps count? : 36, Cost: 1207.0
For episode num 1242  Steps count? : 32, Cost: 1208.0
For episode num 1243  Steps count? : 42, Cost: 1209.0
For episode num 1244  Steps count? : 29, Cost: 1210.0
For episode num 1245  Steps count? : 36, Cost: 1211.0
For episode num 1246  Steps count? : 49, Cost: 1212.0
For episode num 1247  Steps count? : 35, Cost: 1213.0
For episode num 1248  Steps count? : 60, Cost: 1214.0
For episode num 1249  Steps count? : 33, Cost: 1215.0
For episode num 1250  Steps count? : 34, Cost: 1216.0
For episode num 1251  Steps count? : 35, Cost: 1217.0
For episode num 1252  Steps count? : 32, Cost: 1218.0
For episode num 1253  Steps count? : 30, Cost: 1219.0
For episode num 1254  Steps count? : 36, Cost: 1220.0
For episode num 1255  Steps count? : 40, Cost: 1221.0
For episode num 1256  Steps count? : 50, Cost: 1222.0
For episode num 1257  Steps count? : 32, Cost: 1223.0
For episode num 1258  Steps count? : 34, Cost: 1224.0
For episode num 1259  Steps count? : 41, Cost: 1225.0
For episode num 1260  Steps count? : 39, Cost: 1226.0
For episode num 1261  Steps count? : 35, Cost: 1227.0
For episode num 1262  Steps count? : 32, Cost: 1228.0
For episode num 1263  Steps count? : 38, Cost: 1229.0
For episode num 1264  Steps count? : 30, Cost: 1230.0
For episode num 1265  Steps count? : 41, Cost: 1231.0
For episode num 1266  Steps count? : 31, Cost: 1232.0
For episode num 1267  Steps count? : 40, Cost: 1233.0
For episode num 1268  Steps count? : 34, Cost: 1234.0
For episode num 1269  Steps count? : 39, Cost: 1235.0
For episode num 1270  Steps count? : 30, Cost: 1236.0
For episode num 1271  Steps count? : 42, Cost: 1237.0
For episode num 1272  Steps count? : 37, Cost: 1238.0
For episode num 1273  Steps count? : 38, Cost: 1239.0
For episode num 1274  Steps count? : 35, Cost: 1240.0
For episode num 1275  Steps count? : 44, Cost: 1241.0
For episode num 1276  Steps count? : 40, Cost: 1242.0
Warning: trajectory cut off when rollout by epoch at 17.0 steps.
Processing rollout for epoch: 25... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.013001075014472008 Actual: 0.013385039754211903
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.807092666625977      │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 37.34000015258789       │
│ Train/Epoch                   │ 25.0                    │
│ Train/Entropy                 │ 0.9571945667266846      │
│ Train/KL                      │ 0.0002833237231243402   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9985119700431824      │
│ Train/PolicyRatio/Min         │ 0.9985119700431824      │
│ Train/PolicyRatio/Max         │ 0.9985119700431824      │
│ Train/PolicyRatio/Std         │ 0.0010522101074457169   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.6303690075874329      │
│ TotalEnvSteps                 │ 52000.0                 │
│ Loss/Loss_pi                  │ -0.010087259113788605   │
│ Loss/Loss_pi/Delta            │ -0.0018527153879404068  │
│ Value/Adv                     │ -8.583068478174027e-09  │
│ Loss/Loss_reward_critic       │ 0.05350252613425255     │
│ Loss/Loss_reward_critic/Delta │ -0.002247791737318039   │
│ Value/reward                  │ -3.2806758880615234     │
│ Loss/Loss_cost_critic         │ 0.0039041973650455475   │
│ Loss/Loss_cost_critic/Delta   │ -0.00017039431259036064 │
│ Value/cost                    │ 0.8264594078063965      │
│ Time/Total                    │ 70.40589904785156       │
│ Time/Rollout                  │ 1.6032688617706299      │
│ Time/Update                   │ 0.8582248687744141      │
│ Time/Epoch                    │ 2.4615111351013184      │
│ Time/FPS                      │ 812.5093383789062       │
│ Misc/Alpha                    │ 1.540794014930725       │
│ Misc/FinalStepNorm            │ 0.175768181681633       │
│ Misc/gradient_norm            │ 0.2131377011537552      │
│ Misc/xHx                      │ 0.008424424566328526    │
│ Misc/H_inv_g                  │ 0.11407637596130371     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.013936706818640232    │
│ Misc/A                        │ 0.008006625808775425    │
│ Misc/B                        │ -34075328.0             │
│ Misc/q                        │ 0.008424424566328526    │
│ Misc/r                        │ -8.403779793297872e-05  │
│ Misc/s                        │ 1.689372402324807e-05   │
│ Misc/Lambda_star              │ 0.6490160226821899      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1277  Steps count? : 17, Cost: 1242.0
For episode num 1278  Steps count? : 70, Cost: 1243.0
For episode num 1279  Steps count? : 55, Cost: 1244.0
For episode num 1280  Steps count? : 39, Cost: 1245.0
For episode num 1281  Steps count? : 50, Cost: 1246.0
For episode num 1282  Steps count? : 45, Cost: 1247.0
For episode num 1283  Steps count? : 62, Cost: 1248.0
For episode num 1284  Steps count? : 37, Cost: 1249.0
For episode num 1285  Steps count? : 34, Cost: 1250.0
For episode num 1286  Steps count? : 42, Cost: 1251.0
For episode num 1287  Steps count? : 42, Cost: 1252.0
For episode num 1288  Steps count? : 35, Cost: 1253.0
For episode num 1289  Steps count? : 28, Cost: 1254.0
For episode num 1290  Steps count? : 45, Cost: 1255.0
For episode num 1291  Steps count? : 29, Cost: 1256.0
For episode num 1292  Steps count? : 35, Cost: 1257.0
For episode num 1293  Steps count? : 44, Cost: 1258.0
For episode num 1294  Steps count? : 37, Cost: 1259.0
For episode num 1295  Steps count? : 37, Cost: 1260.0
For episode num 1296  Steps count? : 100, Cost: 1260.0
For episode num 1297  Steps count? : 35, Cost: 1261.0
For episode num 1298  Steps count? : 37, Cost: 1262.0
For episode num 1299  Steps count? : 50, Cost: 1263.0
For episode num 1300  Steps count? : 36, Cost: 1264.0
For episode num 1301  Steps count? : 45, Cost: 1265.0
For episode num 1302  Steps count? : 34, Cost: 1266.0
For episode num 1303  Steps count? : 33, Cost: 1267.0
For episode num 1304  Steps count? : 38, Cost: 1268.0
For episode num 1305  Steps count? : 35, Cost: 1269.0
For episode num 1306  Steps count? : 70, Cost: 1270.0
For episode num 1307  Steps count? : 39, Cost: 1271.0
For episode num 1308  Steps count? : 40, Cost: 1272.0
For episode num 1309  Steps count? : 34, Cost: 1273.0
For episode num 1310  Steps count? : 30, Cost: 1274.0
For episode num 1311  Steps count? : 32, Cost: 1275.0
For episode num 1312  Steps count? : 29, Cost: 1276.0
For episode num 1313  Steps count? : 40, Cost: 1277.0
For episode num 1314  Steps count? : 37, Cost: 1278.0
For episode num 1315  Steps count? : 60, Cost: 1279.0
For episode num 1316  Steps count? : 32, Cost: 1280.0
For episode num 1317  Steps count? : 34, Cost: 1281.0
For episode num 1318  Steps count? : 45, Cost: 1282.0
For episode num 1319  Steps count? : 52, Cost: 1283.0
For episode num 1320  Steps count? : 31, Cost: 1284.0
For episode num 1321  Steps count? : 40, Cost: 1285.0
For episode num 1322  Steps count? : 45, Cost: 1286.0
For episode num 1323  Steps count? : 37, Cost: 1287.0
For episode num 1324  Steps count? : 37, Cost: 1288.0
Warning: trajectory cut off when rollout by epoch at 27.0 steps.
Processing rollout for epoch: 26... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.013406893238425255 Actual: 0.01448467280715704
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.858656406402588      │
│ Metrics/EpCost                │ 0.9800000190734863      │
│ Metrics/EpLen                 │ 41.84000015258789       │
│ Train/Epoch                   │ 26.0                    │
│ Train/Entropy                 │ 0.9569447636604309      │
│ Train/KL                      │ 0.00021981373720336705  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9989820122718811      │
│ Train/PolicyRatio/Min         │ 0.9989820122718811      │
│ Train/PolicyRatio/Max         │ 0.9989820122718811      │
│ Train/PolicyRatio/Std         │ 0.0007198120001703501   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.6300709843635559      │
│ TotalEnvSteps                 │ 54000.0                 │
│ Loss/Loss_pi                  │ -0.010958751663565636   │
│ Loss/Loss_pi/Delta            │ -0.0008714925497770309  │
│ Value/Adv                     │ 1.692771967043427e-08   │
│ Loss/Loss_reward_critic       │ 0.06037980690598488     │
│ Loss/Loss_reward_critic/Delta │ 0.00687728077173233     │
│ Value/reward                  │ -3.384451150894165      │
│ Loss/Loss_cost_critic         │ 0.004328948445618153    │
│ Loss/Loss_cost_critic/Delta   │ 0.00042475108057260513  │
│ Value/cost                    │ 0.8259625434875488      │
│ Time/Total                    │ 72.8631591796875        │
│ Time/Rollout                  │ 1.6018359661102295      │
│ Time/Update                   │ 0.8394768238067627      │
│ Time/Epoch                    │ 2.441328525543213       │
│ Time/FPS                      │ 819.2263793945312       │
│ Misc/Alpha                    │ 1.4924468994140625      │
│ Misc/FinalStepNorm            │ 0.22307294607162476     │
│ Misc/gradient_norm            │ 0.3534148931503296      │
│ Misc/xHx                      │ 0.008979078382253647    │
│ Misc/H_inv_g                  │ 0.14946794509887695     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.01777653768658638     │
│ Misc/A                        │ 0.007289422210305929    │
│ Misc/B                        │ -34585184.0             │
│ Misc/q                        │ 0.008979078382253647    │
│ Misc/r                        │ -0.00016789088840596378 │
│ Misc/s                        │ 1.6672298443154432e-05  │
│ Misc/Lambda_star              │ 0.6700406074523926      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1325  Steps count? : 27, Cost: 1288.0
For episode num 1326  Steps count? : 58, Cost: 1289.0
For episode num 1327  Steps count? : 40, Cost: 1290.0
For episode num 1328  Steps count? : 42, Cost: 1291.0
For episode num 1329  Steps count? : 40, Cost: 1292.0
For episode num 1330  Steps count? : 45, Cost: 1293.0
For episode num 1331  Steps count? : 30, Cost: 1294.0
For episode num 1332  Steps count? : 40, Cost: 1295.0
For episode num 1333  Steps count? : 39, Cost: 1296.0
For episode num 1334  Steps count? : 58, Cost: 1297.0
For episode num 1335  Steps count? : 55, Cost: 1298.0
For episode num 1336  Steps count? : 39, Cost: 1299.0
For episode num 1337  Steps count? : 32, Cost: 1300.0
For episode num 1338  Steps count? : 35, Cost: 1301.0
For episode num 1339  Steps count? : 45, Cost: 1302.0
For episode num 1340  Steps count? : 38, Cost: 1303.0
For episode num 1341  Steps count? : 45, Cost: 1304.0
For episode num 1342  Steps count? : 42, Cost: 1305.0
For episode num 1343  Steps count? : 37, Cost: 1306.0
For episode num 1344  Steps count? : 37, Cost: 1307.0
For episode num 1345  Steps count? : 36, Cost: 1308.0
For episode num 1346  Steps count? : 40, Cost: 1309.0
For episode num 1347  Steps count? : 57, Cost: 1310.0
For episode num 1348  Steps count? : 42, Cost: 1311.0
For episode num 1349  Steps count? : 45, Cost: 1312.0
For episode num 1350  Steps count? : 37, Cost: 1313.0
For episode num 1351  Steps count? : 67, Cost: 1314.0
For episode num 1352  Steps count? : 32, Cost: 1315.0
For episode num 1353  Steps count? : 50, Cost: 1316.0
For episode num 1354  Steps count? : 37, Cost: 1317.0
For episode num 1355  Steps count? : 59, Cost: 1318.0
For episode num 1356  Steps count? : 36, Cost: 1319.0
For episode num 1357  Steps count? : 43, Cost: 1320.0
For episode num 1358  Steps count? : 37, Cost: 1321.0
For episode num 1359  Steps count? : 50, Cost: 1322.0
For episode num 1360  Steps count? : 39, Cost: 1323.0
For episode num 1361  Steps count? : 33, Cost: 1324.0
For episode num 1362  Steps count? : 42, Cost: 1325.0
For episode num 1363  Steps count? : 45, Cost: 1326.0
For episode num 1364  Steps count? : 34, Cost: 1327.0
For episode num 1365  Steps count? : 38, Cost: 1328.0
For episode num 1366  Steps count? : 42, Cost: 1329.0
For episode num 1367  Steps count? : 34, Cost: 1330.0
For episode num 1368  Steps count? : 43, Cost: 1331.0
For episode num 1369  Steps count? : 49, Cost: 1332.0
For episode num 1370  Steps count? : 35, Cost: 1333.0
For episode num 1371  Steps count? : 41, Cost: 1334.0
For episode num 1372  Steps count? : 32, Cost: 1335.0
Warning: trajectory cut off when rollout by epoch at 28.0 steps.
Processing rollout for epoch: 27... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.010013232007622719 Actual: 0.009358160197734833
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.893587112426758      │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 41.81999969482422       │
│ Train/Epoch                   │ 27.0                    │
│ Train/Entropy                 │ 0.9369792342185974      │
│ Train/KL                      │ 0.00021534404368139803  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0020028352737427      │
│ Train/PolicyRatio/Min         │ 1.0020028352737427      │
│ Train/PolicyRatio/Max         │ 1.0020028352737427      │
│ Train/PolicyRatio/Std         │ 0.0014162465231493115   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.6176972389221191      │
│ TotalEnvSteps                 │ 56000.0                 │
│ Loss/Loss_pi                  │ -0.006995189469307661   │
│ Loss/Loss_pi/Delta            │ 0.003963562194257975    │
│ Value/Adv                     │ -1.0490417423625331e-08 │
│ Loss/Loss_reward_critic       │ 0.056908491998910904    │
│ Loss/Loss_reward_critic/Delta │ -0.0034713149070739746  │
│ Value/reward                  │ -3.322749614715576      │
│ Loss/Loss_cost_critic         │ 0.004341385327279568    │
│ Loss/Loss_cost_critic/Delta   │ 1.24368816614151e-05    │
│ Value/cost                    │ 0.8086732029914856      │
│ Time/Total                    │ 75.33457946777344       │
│ Time/Rollout                  │ 1.6088500022888184      │
│ Time/Update                   │ 0.8466281890869141      │
│ Time/Epoch                    │ 2.4554944038391113      │
│ Time/FPS                      │ 814.500244140625        │
│ Misc/Alpha                    │ 2.005610466003418       │
│ Misc/FinalStepNorm            │ 0.2259305864572525      │
│ Misc/gradient_norm            │ 0.15939858555793762     │
│ Misc/xHx                      │ 0.004972054623067379    │
│ Misc/H_inv_g                  │ 0.11264929175376892     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.008805238641798496    │
│ Misc/A                        │ 0.004676878452301025    │
│ Misc/B                        │ -42534952.0             │
│ Misc/q                        │ 0.004972054623067379    │
│ Misc/r                        │ 6.322355329757556e-05   │
│ Misc/s                        │ 1.3531804142985493e-05  │
│ Misc/Lambda_star              │ 0.4986013174057007      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1373  Steps count? : 28, Cost: 1335.0
For episode num 1374  Steps count? : 44, Cost: 1336.0
For episode num 1375  Steps count? : 31, Cost: 1337.0
For episode num 1376  Steps count? : 33, Cost: 1338.0
For episode num 1377  Steps count? : 44, Cost: 1339.0
For episode num 1378  Steps count? : 41, Cost: 1340.0
For episode num 1379  Steps count? : 39, Cost: 1341.0
For episode num 1380  Steps count? : 34, Cost: 1342.0
For episode num 1381  Steps count? : 43, Cost: 1343.0
For episode num 1382  Steps count? : 46, Cost: 1344.0
For episode num 1383  Steps count? : 49, Cost: 1345.0
For episode num 1384  Steps count? : 38, Cost: 1346.0
For episode num 1385  Steps count? : 40, Cost: 1347.0
For episode num 1386  Steps count? : 45, Cost: 1348.0
For episode num 1387  Steps count? : 35, Cost: 1349.0
For episode num 1388  Steps count? : 35, Cost: 1350.0
For episode num 1389  Steps count? : 35, Cost: 1351.0
For episode num 1390  Steps count? : 46, Cost: 1352.0
For episode num 1391  Steps count? : 42, Cost: 1353.0
For episode num 1392  Steps count? : 33, Cost: 1354.0
For episode num 1393  Steps count? : 33, Cost: 1355.0
For episode num 1394  Steps count? : 44, Cost: 1356.0
For episode num 1395  Steps count? : 48, Cost: 1357.0
For episode num 1396  Steps count? : 46, Cost: 1358.0
For episode num 1397  Steps count? : 34, Cost: 1359.0
For episode num 1398  Steps count? : 55, Cost: 1360.0
For episode num 1399  Steps count? : 36, Cost: 1361.0
For episode num 1400  Steps count? : 34, Cost: 1362.0
For episode num 1401  Steps count? : 43, Cost: 1363.0
For episode num 1402  Steps count? : 48, Cost: 1364.0
For episode num 1403  Steps count? : 36, Cost: 1365.0
For episode num 1404  Steps count? : 45, Cost: 1366.0
For episode num 1405  Steps count? : 39, Cost: 1367.0
For episode num 1406  Steps count? : 33, Cost: 1368.0
For episode num 1407  Steps count? : 37, Cost: 1369.0
For episode num 1408  Steps count? : 34, Cost: 1370.0
For episode num 1409  Steps count? : 41, Cost: 1371.0
For episode num 1410  Steps count? : 50, Cost: 1372.0
For episode num 1411  Steps count? : 32, Cost: 1373.0
For episode num 1412  Steps count? : 35, Cost: 1374.0
For episode num 1413  Steps count? : 40, Cost: 1375.0
For episode num 1414  Steps count? : 36, Cost: 1376.0
For episode num 1415  Steps count? : 36, Cost: 1377.0
For episode num 1416  Steps count? : 33, Cost: 1378.0
For episode num 1417  Steps count? : 47, Cost: 1379.0
For episode num 1418  Steps count? : 39, Cost: 1380.0
For episode num 1419  Steps count? : 47, Cost: 1381.0
For episode num 1420  Steps count? : 41, Cost: 1382.0
For episode num 1421  Steps count? : 51, Cost: 1383.0
For episode num 1422  Steps count? : 44, Cost: 1384.0
Warning: trajectory cut off when rollout by epoch at 30.0 steps.
Processing rollout for epoch: 28... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.01285119354724884 Actual: 0.012315032072365284
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.8675713539123535    │
│ Metrics/EpCost                │ 1.0                    │
│ Metrics/EpLen                 │ 40.040000915527344     │
│ Train/Epoch                   │ 28.0                   │
│ Train/Entropy                 │ 0.9027681350708008     │
│ Train/KL                      │ 0.0002745090750977397  │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0009769201278687     │
│ Train/PolicyRatio/Min         │ 1.0009769201278687     │
│ Train/PolicyRatio/Max         │ 1.0009769201278687     │
│ Train/PolicyRatio/Std         │ 0.0006908149225637317  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.5968616604804993     │
│ TotalEnvSteps                 │ 58000.0                │
│ Loss/Loss_pi                  │ -0.009248027577996254  │
│ Loss/Loss_pi/Delta            │ -0.002252838108688593  │
│ Value/Adv                     │ 2.8610229740877458e-09 │
│ Loss/Loss_reward_critic       │ 0.050450362265110016   │
│ Loss/Loss_reward_critic/Delta │ -0.006458129733800888  │
│ Value/reward                  │ -3.3066420555114746    │
│ Loss/Loss_cost_critic         │ 0.003955210093408823   │
│ Loss/Loss_cost_critic/Delta   │ -0.0003861752338707447 │
│ Value/cost                    │ 0.8191108107566833     │
│ Time/Total                    │ 77.8603515625          │
│ Time/Rollout                  │ 1.6399800777435303     │
│ Time/Update                   │ 0.867211103439331      │
│ Time/Epoch                    │ 2.5072083473205566     │
│ Time/FPS                      │ 797.7003173828125      │
│ Misc/Alpha                    │ 1.5564215183258057     │
│ Misc/FinalStepNorm            │ 0.1777360439300537     │
│ Misc/gradient_norm            │ 0.38506215810775757    │
│ Misc/xHx                      │ 0.008256101049482822   │
│ Misc/H_inv_g                  │ 0.11419530212879181    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.018269173800945282   │
│ Misc/A                        │ 0.008213793858885765   │
│ Misc/B                        │ -46120824.0            │
│ Misc/q                        │ 0.008256101049482822   │
│ Misc/r                        │ 2.298639992659446e-05  │
│ Misc/s                        │ 1.2478935786930379e-05 │
│ Misc/Lambda_star              │ 0.6424994468688965     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 1423  Steps count? : 30, Cost: 1384.0
For episode num 1424  Steps count? : 49, Cost: 1385.0
For episode num 1425  Steps count? : 35, Cost: 1386.0
For episode num 1426  Steps count? : 33, Cost: 1387.0
For episode num 1427  Steps count? : 43, Cost: 1388.0
For episode num 1428  Steps count? : 53, Cost: 1389.0
For episode num 1429  Steps count? : 41, Cost: 1390.0
For episode num 1430  Steps count? : 36, Cost: 1391.0
For episode num 1431  Steps count? : 54, Cost: 1392.0
For episode num 1432  Steps count? : 48, Cost: 1393.0
For episode num 1433  Steps count? : 35, Cost: 1394.0
For episode num 1434  Steps count? : 38, Cost: 1395.0
For episode num 1435  Steps count? : 37, Cost: 1396.0
For episode num 1436  Steps count? : 52, Cost: 1397.0
For episode num 1437  Steps count? : 46, Cost: 1398.0
For episode num 1438  Steps count? : 43, Cost: 1399.0
For episode num 1439  Steps count? : 34, Cost: 1400.0
For episode num 1440  Steps count? : 34, Cost: 1401.0
For episode num 1441  Steps count? : 45, Cost: 1402.0
For episode num 1442  Steps count? : 38, Cost: 1403.0
For episode num 1443  Steps count? : 35, Cost: 1404.0
For episode num 1444  Steps count? : 51, Cost: 1405.0
For episode num 1445  Steps count? : 38, Cost: 1406.0
For episode num 1446  Steps count? : 51, Cost: 1407.0
For episode num 1447  Steps count? : 38, Cost: 1408.0
For episode num 1448  Steps count? : 44, Cost: 1409.0
For episode num 1449  Steps count? : 41, Cost: 1410.0
For episode num 1450  Steps count? : 45, Cost: 1411.0
For episode num 1451  Steps count? : 44, Cost: 1412.0
For episode num 1452  Steps count? : 35, Cost: 1413.0
For episode num 1453  Steps count? : 49, Cost: 1414.0
For episode num 1454  Steps count? : 35, Cost: 1415.0
For episode num 1455  Steps count? : 47, Cost: 1416.0
For episode num 1456  Steps count? : 64, Cost: 1417.0
For episode num 1457  Steps count? : 36, Cost: 1418.0
For episode num 1458  Steps count? : 46, Cost: 1419.0
For episode num 1459  Steps count? : 55, Cost: 1420.0
For episode num 1460  Steps count? : 38, Cost: 1421.0
For episode num 1461  Steps count? : 49, Cost: 1422.0
For episode num 1462  Steps count? : 54, Cost: 1423.0
For episode num 1463  Steps count? : 38, Cost: 1424.0
For episode num 1464  Steps count? : 43, Cost: 1425.0
For episode num 1465  Steps count? : 37, Cost: 1426.0
For episode num 1466  Steps count? : 38, Cost: 1427.0
For episode num 1467  Steps count? : 37, Cost: 1428.0
For episode num 1468  Steps count? : 54, Cost: 1429.0
For episode num 1469  Steps count? : 38, Cost: 1430.0
Warning: trajectory cut off when rollout by epoch at 26.0 steps.
Processing rollout for epoch: 29... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.007689851801842451 Actual: 0.0077031198889017105
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.914858341217041     │
│ Metrics/EpCost                │ 1.0                    │
│ Metrics/EpLen                 │ 43.13999938964844      │
│ Train/Epoch                   │ 29.0                   │
│ Train/Entropy                 │ 0.8802080154418945     │
│ Train/KL                      │ 0.00014583655865862966 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0028492212295532     │
│ Train/PolicyRatio/Min         │ 1.0028492212295532     │
│ Train/PolicyRatio/Max         │ 1.0028492212295532     │
│ Train/PolicyRatio/Std         │ 0.0020147317554801702  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.5835115909576416     │
│ TotalEnvSteps                 │ 60000.0                │
│ Loss/Loss_pi                  │ -0.005787180736660957  │
│ Loss/Loss_pi/Delta            │ 0.0034608468413352966  │
│ Value/Adv                     │ 5.2452087118126656e-09 │
│ Loss/Loss_reward_critic       │ 0.051270850002765656   │
│ Loss/Loss_reward_critic/Delta │ 0.0008204877376556396  │
│ Value/reward                  │ -3.3688666820526123    │
│ Loss/Loss_cost_critic         │ 0.0041670361533761024  │
│ Loss/Loss_cost_critic/Delta   │ 0.00021182605996727943 │
│ Value/cost                    │ 0.8135458827018738     │
│ Time/Total                    │ 80.35113525390625      │
│ Time/Rollout                  │ 1.6077680587768555     │
│ Time/Update                   │ 0.8650870323181152     │
│ Time/Epoch                    │ 2.4728710651397705     │
│ Time/FPS                      │ 808.7767333984375      │
│ Misc/Alpha                    │ 2.6082653999328613     │
│ Misc/FinalStepNorm            │ 0.33009693026542664    │
│ Misc/gradient_norm            │ 0.043264638632535934   │
│ Misc/xHx                      │ 0.002939848694950342   │
│ Misc/H_inv_g                  │ 0.12655800580978394    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.019903387874364853   │
│ Misc/A                        │ 0.0029346405062824488  │
│ Misc/B                        │ -34846240.0            │
│ Misc/q                        │ 0.002939848694950342   │
│ Misc/r                        │ -9.278453944716603e-06 │
│ Misc/s                        │ 1.651975981076248e-05  │
│ Misc/Lambda_star              │ 0.38339656591415405    │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 1470  Steps count? : 26, Cost: 1430.0
For episode num 1471  Steps count? : 55, Cost: 1431.0
For episode num 1472  Steps count? : 40, Cost: 1432.0
For episode num 1473  Steps count? : 35, Cost: 1433.0
For episode num 1474  Steps count? : 42, Cost: 1434.0
For episode num 1475  Steps count? : 40, Cost: 1435.0
For episode num 1476  Steps count? : 38, Cost: 1436.0
For episode num 1477  Steps count? : 47, Cost: 1437.0
For episode num 1478  Steps count? : 49, Cost: 1438.0
For episode num 1479  Steps count? : 40, Cost: 1439.0
For episode num 1480  Steps count? : 38, Cost: 1440.0
For episode num 1481  Steps count? : 45, Cost: 1441.0
For episode num 1482  Steps count? : 65, Cost: 1442.0
For episode num 1483  Steps count? : 66, Cost: 1443.0
For episode num 1484  Steps count? : 44, Cost: 1444.0
For episode num 1485  Steps count? : 56, Cost: 1445.0
For episode num 1486  Steps count? : 51, Cost: 1446.0
For episode num 1487  Steps count? : 54, Cost: 1447.0
For episode num 1488  Steps count? : 66, Cost: 1448.0
For episode num 1489  Steps count? : 33, Cost: 1449.0
For episode num 1490  Steps count? : 37, Cost: 1450.0
For episode num 1491  Steps count? : 43, Cost: 1451.0
For episode num 1492  Steps count? : 36, Cost: 1452.0
For episode num 1493  Steps count? : 45, Cost: 1453.0
For episode num 1494  Steps count? : 64, Cost: 1454.0
For episode num 1495  Steps count? : 45, Cost: 1455.0
For episode num 1496  Steps count? : 44, Cost: 1456.0
For episode num 1497  Steps count? : 45, Cost: 1457.0
For episode num 1498  Steps count? : 30, Cost: 1458.0
For episode num 1499  Steps count? : 46, Cost: 1459.0
For episode num 1500  Steps count? : 45, Cost: 1460.0
For episode num 1501  Steps count? : 41, Cost: 1461.0
For episode num 1502  Steps count? : 42, Cost: 1462.0
For episode num 1503  Steps count? : 66, Cost: 1463.0
For episode num 1504  Steps count? : 46, Cost: 1464.0
For episode num 1505  Steps count? : 45, Cost: 1465.0
For episode num 1506  Steps count? : 37, Cost: 1466.0
For episode num 1507  Steps count? : 31, Cost: 1467.0
For episode num 1508  Steps count? : 39, Cost: 1468.0
For episode num 1509  Steps count? : 40, Cost: 1469.0
For episode num 1510  Steps count? : 47, Cost: 1470.0
For episode num 1511  Steps count? : 60, Cost: 1471.0
For episode num 1512  Steps count? : 53, Cost: 1472.0
Warning: trajectory cut off when rollout by epoch at 69.0 steps.
Processing rollout for epoch: 30... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.007689783815294504 Actual: 0.00788348913192749
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -5.009312152862549      │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 45.400001525878906      │
│ Train/Epoch                   │ 30.0                    │
│ Train/Entropy                 │ 0.8511901497840881      │
│ Train/KL                      │ 0.00022844482737127692  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.002860426902771       │
│ Train/PolicyRatio/Min         │ 1.002860426902771       │
│ Train/PolicyRatio/Max         │ 1.002860426902771       │
│ Train/PolicyRatio/Std         │ 0.002022655215114355    │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.5668739080429077      │
│ TotalEnvSteps                 │ 62000.0                 │
│ Loss/Loss_pi                  │ -0.006025300361216068   │
│ Loss/Loss_pi/Delta            │ -0.00023811962455511093 │
│ Value/Adv                     │ 1.2397766369076635e-08  │
│ Loss/Loss_reward_critic       │ 0.05676900967955589     │
│ Loss/Loss_reward_critic/Delta │ 0.005498159676790237    │
│ Value/reward                  │ -3.371760606765747      │
│ Loss/Loss_cost_critic         │ 0.004844169598072767    │
│ Loss/Loss_cost_critic/Delta   │ 0.0006771334446966648   │
│ Value/cost                    │ 0.7905964851379395      │
│ Time/Total                    │ 82.83199310302734       │
│ Time/Rollout                  │ 1.5985660552978516      │
│ Time/Update                   │ 0.8632988929748535      │
│ Time/Epoch                    │ 2.461881637573242       │
│ Time/FPS                      │ 812.3869018554688       │
│ Misc/Alpha                    │ 2.6081087589263916      │
│ Misc/FinalStepNorm            │ 0.2511734366416931      │
│ Misc/gradient_norm            │ 0.18713365495204926     │
│ Misc/xHx                      │ 0.002940201433375478    │
│ Misc/H_inv_g                  │ 0.09630479663610458     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.02582949958741665     │
│ Misc/A                        │ 0.0024572378024458885   │
│ Misc/B                        │ -13334149.0             │
│ Misc/q                        │ 0.002940201433375478    │
│ Misc/r                        │ -0.0001444394583813846  │
│ Misc/s                        │ 4.31873559136875e-05    │
│ Misc/Lambda_star              │ 0.38341960310935974     │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1513  Steps count? : 69, Cost: 1472.0
For episode num 1514  Steps count? : 39, Cost: 1473.0
For episode num 1515  Steps count? : 46, Cost: 1474.0
For episode num 1516  Steps count? : 43, Cost: 1475.0
For episode num 1517  Steps count? : 60, Cost: 1476.0
For episode num 1518  Steps count? : 51, Cost: 1477.0
For episode num 1519  Steps count? : 34, Cost: 1478.0
For episode num 1520  Steps count? : 41, Cost: 1479.0
For episode num 1521  Steps count? : 65, Cost: 1480.0
For episode num 1522  Steps count? : 42, Cost: 1481.0
For episode num 1523  Steps count? : 64, Cost: 1482.0
For episode num 1524  Steps count? : 55, Cost: 1483.0
For episode num 1525  Steps count? : 72, Cost: 1484.0
For episode num 1526  Steps count? : 43, Cost: 1485.0
For episode num 1527  Steps count? : 67, Cost: 1486.0
For episode num 1528  Steps count? : 53, Cost: 1487.0
For episode num 1529  Steps count? : 44, Cost: 1488.0
For episode num 1530  Steps count? : 43, Cost: 1489.0
For episode num 1531  Steps count? : 45, Cost: 1490.0
For episode num 1532  Steps count? : 51, Cost: 1491.0
For episode num 1533  Steps count? : 53, Cost: 1492.0
For episode num 1534  Steps count? : 36, Cost: 1493.0
For episode num 1535  Steps count? : 46, Cost: 1494.0
For episode num 1536  Steps count? : 32, Cost: 1495.0
For episode num 1537  Steps count? : 71, Cost: 1496.0
For episode num 1538  Steps count? : 61, Cost: 1497.0
For episode num 1539  Steps count? : 36, Cost: 1498.0
For episode num 1540  Steps count? : 52, Cost: 1499.0
For episode num 1541  Steps count? : 42, Cost: 1500.0
For episode num 1542  Steps count? : 37, Cost: 1501.0
For episode num 1543  Steps count? : 52, Cost: 1502.0
For episode num 1544  Steps count? : 39, Cost: 1503.0
For episode num 1545  Steps count? : 53, Cost: 1504.0
For episode num 1546  Steps count? : 48, Cost: 1505.0
For episode num 1547  Steps count? : 45, Cost: 1506.0
For episode num 1548  Steps count? : 44, Cost: 1507.0
For episode num 1549  Steps count? : 36, Cost: 1508.0
For episode num 1550  Steps count? : 82, Cost: 1509.0
For episode num 1551  Steps count? : 76, Cost: 1510.0
For episode num 1552  Steps count? : 44, Cost: 1511.0
For episode num 1553  Steps count? : 57, Cost: 1512.0
Processing rollout for epoch: 31... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.008905565366148949 Actual: 0.009006665088236332
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -5.10671329498291      │
│ Metrics/EpCost                │ 1.0                    │
│ Metrics/EpLen                 │ 49.279998779296875     │
│ Train/Epoch                   │ 31.0                   │
│ Train/Entropy                 │ 0.8286960124969482     │
│ Train/KL                      │ 0.00018937453569378704 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0016765594482422     │
│ Train/PolicyRatio/Min         │ 1.0016765594482422     │
│ Train/PolicyRatio/Max         │ 1.0016765594482422     │
│ Train/PolicyRatio/Std         │ 0.0011855626944452524  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.5542100667953491     │
│ TotalEnvSteps                 │ 64000.0                │
│ Loss/Loss_pi                  │ -0.00676327757537365   │
│ Loss/Loss_pi/Delta            │ -0.0007379772141575813 │
│ Value/Adv                     │ -9.298324776807476e-09 │
│ Loss/Loss_reward_critic       │ 0.062084175646305084   │
│ Loss/Loss_reward_critic/Delta │ 0.005315165966749191   │
│ Value/reward                  │ -3.3660168647766113    │
│ Loss/Loss_cost_critic         │ 0.005128758959472179   │
│ Loss/Loss_cost_critic/Delta   │ 0.00028458936139941216 │
│ Value/cost                    │ 0.7843934297561646     │
│ Time/Total                    │ 85.31327819824219      │
│ Time/Rollout                  │ 1.5965683460235596     │
│ Time/Update                   │ 0.8668084144592285     │
│ Time/Epoch                    │ 2.463399887084961      │
│ Time/FPS                      │ 811.8869018554688      │
│ Misc/Alpha                    │ 2.2587971687316895     │
│ Misc/FinalStepNorm            │ 0.30193254351615906    │
│ Misc/gradient_norm            │ 0.18294206261634827    │
│ Misc/xHx                      │ 0.003919895272701979   │
│ Misc/H_inv_g                  │ 0.13366959989070892    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.01837228797376156    │
│ Misc/A                        │ 0.003890982596203685   │
│ Misc/B                        │ -39132372.0            │
│ Misc/q                        │ 0.003919895272701979   │
│ Misc/r                        │ 2.0629408027161844e-05 │
│ Misc/s                        │ 1.4709270544699393e-05 │
│ Misc/Lambda_star              │ 0.44271349906921387    │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 1554  Steps count? : 0, Cost: 1512.0
For episode num 1555  Steps count? : 47, Cost: 1513.0
For episode num 1556  Steps count? : 41, Cost: 1514.0
For episode num 1557  Steps count? : 36, Cost: 1515.0
For episode num 1558  Steps count? : 41, Cost: 1516.0
For episode num 1559  Steps count? : 32, Cost: 1517.0
For episode num 1560  Steps count? : 34, Cost: 1518.0
For episode num 1561  Steps count? : 39, Cost: 1519.0
For episode num 1562  Steps count? : 43, Cost: 1520.0
For episode num 1563  Steps count? : 29, Cost: 1521.0
For episode num 1564  Steps count? : 35, Cost: 1522.0
For episode num 1565  Steps count? : 54, Cost: 1523.0
For episode num 1566  Steps count? : 40, Cost: 1524.0
For episode num 1567  Steps count? : 39, Cost: 1525.0
For episode num 1568  Steps count? : 36, Cost: 1526.0
For episode num 1569  Steps count? : 67, Cost: 1527.0
For episode num 1570  Steps count? : 56, Cost: 1528.0
For episode num 1571  Steps count? : 42, Cost: 1529.0
For episode num 1572  Steps count? : 54, Cost: 1530.0
For episode num 1573  Steps count? : 41, Cost: 1531.0
For episode num 1574  Steps count? : 43, Cost: 1532.0
For episode num 1575  Steps count? : 47, Cost: 1533.0
For episode num 1576  Steps count? : 59, Cost: 1534.0
For episode num 1577  Steps count? : 38, Cost: 1535.0
For episode num 1578  Steps count? : 59, Cost: 1536.0
For episode num 1579  Steps count? : 53, Cost: 1537.0
For episode num 1580  Steps count? : 46, Cost: 1538.0
For episode num 1581  Steps count? : 50, Cost: 1539.0
For episode num 1582  Steps count? : 37, Cost: 1540.0
For episode num 1583  Steps count? : 34, Cost: 1541.0
For episode num 1584  Steps count? : 44, Cost: 1542.0
For episode num 1585  Steps count? : 60, Cost: 1543.0
For episode num 1586  Steps count? : 42, Cost: 1544.0
For episode num 1587  Steps count? : 48, Cost: 1545.0
For episode num 1588  Steps count? : 41, Cost: 1546.0
For episode num 1589  Steps count? : 53, Cost: 1547.0
For episode num 1590  Steps count? : 45, Cost: 1548.0
For episode num 1591  Steps count? : 45, Cost: 1549.0
For episode num 1592  Steps count? : 44, Cost: 1550.0
For episode num 1593  Steps count? : 44, Cost: 1551.0
For episode num 1594  Steps count? : 71, Cost: 1552.0
For episode num 1595  Steps count? : 51, Cost: 1553.0
For episode num 1596  Steps count? : 56, Cost: 1554.0
For episode num 1597  Steps count? : 46, Cost: 1555.0
Warning: trajectory cut off when rollout by epoch at 38.0 steps.
Processing rollout for epoch: 32... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.013532524928450584 Actual: 0.013201801106333733
INFO: violated KL constraint 0.011566831730306149 at step 1.
Expected Improvement: 0.013532524928450584 Actual: 0.010703973472118378
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -5.007101535797119      │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 46.91999816894531       │
│ Train/Epoch                   │ 32.0                    │
│ Train/Entropy                 │ 0.8006930351257324      │
│ Train/KL                      │ 0.00020296174625400454  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9961467385292053      │
│ Train/PolicyRatio/Min         │ 0.9961467385292053      │
│ Train/PolicyRatio/Max         │ 0.9961467385292053      │
│ Train/PolicyRatio/Std         │ 0.0023030810989439487   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.5389366745948792      │
│ TotalEnvSteps                 │ 66000.0                 │
│ Loss/Loss_pi                  │ -0.00917046982795       │
│ Loss/Loss_pi/Delta            │ -0.002407192252576351   │
│ Value/Adv                     │ -5.7220459481754915e-09 │
│ Loss/Loss_reward_critic       │ 0.05533915013074875     │
│ Loss/Loss_reward_critic/Delta │ -0.0067450255155563354  │
│ Value/reward                  │ -3.2732367515563965     │
│ Loss/Loss_cost_critic         │ 0.004559576977044344    │
│ Loss/Loss_cost_critic/Delta   │ -0.0005691819824278355  │
│ Value/cost                    │ 0.7915629148483276      │
│ Time/Total                    │ 87.80597686767578       │
│ Time/Rollout                  │ 1.6235589981079102      │
│ Time/Update                   │ 0.8508861064910889      │
│ Time/Epoch                    │ 2.474461793899536       │
│ Time/FPS                      │ 808.2567749023438       │
│ Misc/Alpha                    │ 1.4643666744232178      │
│ Misc/FinalStepNorm            │ 0.16838383674621582     │
│ Misc/gradient_norm            │ 0.38296106457710266     │
│ Misc/xHx                      │ 0.00932674016803503     │
│ Misc/H_inv_g                  │ 0.14373435080051422     │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 0.0264281015843153      │
│ Misc/A                        │ 0.004789023660123348    │
│ Misc/B                        │ -17282002.0             │
│ Misc/q                        │ 0.00932674016803503     │
│ Misc/r                        │ -0.00038889548159204423 │
│ Misc/s                        │ 3.331946936668828e-05   │
│ Misc/Lambda_star              │ 0.6828890442848206      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1598  Steps count? : 38, Cost: 1555.0
For episode num 1599  Steps count? : 65, Cost: 1556.0
For episode num 1600  Steps count? : 46, Cost: 1557.0
For episode num 1601  Steps count? : 67, Cost: 1558.0
For episode num 1602  Steps count? : 36, Cost: 1559.0
For episode num 1603  Steps count? : 37, Cost: 1560.0
For episode num 1604  Steps count? : 51, Cost: 1561.0
For episode num 1605  Steps count? : 52, Cost: 1562.0
For episode num 1606  Steps count? : 88, Cost: 1563.0
For episode num 1607  Steps count? : 48, Cost: 1564.0
For episode num 1608  Steps count? : 46, Cost: 1565.0
For episode num 1609  Steps count? : 62, Cost: 1566.0
For episode num 1610  Steps count? : 85, Cost: 1567.0
For episode num 1611  Steps count? : 68, Cost: 1568.0
For episode num 1612  Steps count? : 41, Cost: 1569.0
For episode num 1613  Steps count? : 38, Cost: 1570.0
For episode num 1614  Steps count? : 40, Cost: 1571.0
For episode num 1615  Steps count? : 50, Cost: 1572.0
For episode num 1616  Steps count? : 54, Cost: 1573.0
For episode num 1617  Steps count? : 37, Cost: 1574.0
For episode num 1618  Steps count? : 41, Cost: 1575.0
For episode num 1619  Steps count? : 55, Cost: 1576.0
For episode num 1620  Steps count? : 56, Cost: 1577.0
For episode num 1621  Steps count? : 43, Cost: 1578.0
For episode num 1622  Steps count? : 71, Cost: 1579.0
For episode num 1623  Steps count? : 73, Cost: 1580.0
For episode num 1624  Steps count? : 41, Cost: 1581.0
For episode num 1625  Steps count? : 83, Cost: 1582.0
For episode num 1626  Steps count? : 49, Cost: 1583.0
For episode num 1627  Steps count? : 44, Cost: 1584.0
For episode num 1628  Steps count? : 52, Cost: 1585.0
For episode num 1629  Steps count? : 59, Cost: 1586.0
For episode num 1630  Steps count? : 43, Cost: 1587.0
For episode num 1631  Steps count? : 75, Cost: 1588.0
For episode num 1632  Steps count? : 42, Cost: 1589.0
For episode num 1633  Steps count? : 47, Cost: 1590.0
For episode num 1634  Steps count? : 46, Cost: 1591.0
Warning: trajectory cut off when rollout by epoch at 69.0 steps.
Processing rollout for epoch: 33... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.009960620664060116 Actual: 0.010657483711838722
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -5.162189483642578      │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 52.41999816894531       │
│ Train/Epoch                   │ 33.0                    │
│ Train/Entropy                 │ 0.7816314697265625      │
│ Train/KL                      │ 0.00018003313743975013  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0007638931274414      │
│ Train/PolicyRatio/Min         │ 1.0007638931274414      │
│ Train/PolicyRatio/Max         │ 1.0007638931274414      │
│ Train/PolicyRatio/Std         │ 0.0005402102251537144   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.528739869594574       │
│ TotalEnvSteps                 │ 68000.0                 │
│ Loss/Loss_pi                  │ -0.008131938055157661   │
│ Loss/Loss_pi/Delta            │ 0.0010385317727923393   │
│ Value/Adv                     │ 1.6689301052252858e-09  │
│ Loss/Loss_reward_critic       │ 0.06687761098146439     │
│ Loss/Loss_reward_critic/Delta │ 0.011538460850715637    │
│ Value/reward                  │ -3.393537998199463      │
│ Loss/Loss_cost_critic         │ 0.005418796557933092    │
│ Loss/Loss_cost_critic/Delta   │ 0.0008592195808887482   │
│ Value/cost                    │ 0.7710952758789062      │
│ Time/Total                    │ 90.35797882080078       │
│ Time/Rollout                  │ 1.5981042385101318      │
│ Time/Update                   │ 0.9363222122192383      │
│ Time/Epoch                    │ 2.5344479084014893      │
│ Time/FPS                      │ 789.1268920898438       │
│ Misc/Alpha                    │ 2.0121734142303467      │
│ Misc/FinalStepNorm            │ 0.29974377155303955     │
│ Misc/gradient_norm            │ 0.1516164392232895      │
│ Misc/xHx                      │ 0.004939674399793148    │
│ Misc/H_inv_g                  │ 0.14896519482135773     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.022269027307629585    │
│ Misc/A                        │ 0.003707593772560358    │
│ Misc/B                        │ -11402698.0             │
│ Misc/q                        │ 0.004939674399793148    │
│ Misc/r                        │ -0.00024947497877292335 │
│ Misc/s                        │ 5.050435720477253e-05   │
│ Misc/Lambda_star              │ 0.4969750642776489      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1635  Steps count? : 69, Cost: 1591.0
For episode num 1636  Steps count? : 69, Cost: 1592.0
For episode num 1637  Steps count? : 53, Cost: 1593.0
For episode num 1638  Steps count? : 57, Cost: 1594.0
For episode num 1639  Steps count? : 47, Cost: 1595.0
For episode num 1640  Steps count? : 75, Cost: 1596.0
For episode num 1641  Steps count? : 51, Cost: 1597.0
For episode num 1642  Steps count? : 52, Cost: 1598.0
For episode num 1643  Steps count? : 72, Cost: 1599.0
For episode num 1644  Steps count? : 47, Cost: 1600.0
For episode num 1645  Steps count? : 49, Cost: 1601.0
For episode num 1646  Steps count? : 65, Cost: 1602.0
For episode num 1647  Steps count? : 60, Cost: 1603.0
For episode num 1648  Steps count? : 69, Cost: 1604.0
For episode num 1649  Steps count? : 49, Cost: 1605.0
For episode num 1650  Steps count? : 66, Cost: 1606.0
For episode num 1651  Steps count? : 40, Cost: 1607.0
For episode num 1652  Steps count? : 76, Cost: 1608.0
For episode num 1653  Steps count? : 64, Cost: 1609.0
For episode num 1654  Steps count? : 42, Cost: 1610.0
For episode num 1655  Steps count? : 65, Cost: 1611.0
For episode num 1656  Steps count? : 39, Cost: 1612.0
For episode num 1657  Steps count? : 60, Cost: 1613.0
For episode num 1658  Steps count? : 56, Cost: 1614.0
For episode num 1659  Steps count? : 70, Cost: 1615.0
For episode num 1660  Steps count? : 53, Cost: 1616.0
For episode num 1661  Steps count? : 58, Cost: 1617.0
For episode num 1662  Steps count? : 60, Cost: 1618.0
For episode num 1663  Steps count? : 46, Cost: 1619.0
For episode num 1664  Steps count? : 64, Cost: 1620.0
For episode num 1665  Steps count? : 47, Cost: 1621.0
For episode num 1666  Steps count? : 79, Cost: 1622.0
For episode num 1667  Steps count? : 72, Cost: 1623.0
For episode num 1668  Steps count? : 55, Cost: 1624.0
For episode num 1669  Steps count? : 58, Cost: 1625.0
Warning: trajectory cut off when rollout by epoch at 15.0 steps.
Processing rollout for epoch: 34... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.01007669698446989 Actual: 0.010626683011651039
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -5.318172454833984      │
│ Metrics/EpCost                │ 1.0                     │
│ Metrics/EpLen                 │ 57.279998779296875      │
│ Train/Epoch                   │ 34.0                    │
│ Train/Entropy                 │ 0.7806956171989441      │
│ Train/KL                      │ 0.00017332752759102732  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9994160532951355      │
│ Train/PolicyRatio/Min         │ 0.9994160532951355      │
│ Train/PolicyRatio/Max         │ 0.9994160532951355      │
│ Train/PolicyRatio/Std         │ 0.00041292671812698245  │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.528224527835846       │
│ TotalEnvSteps                 │ 70000.0                 │
│ Loss/Loss_pi                  │ -0.008089585229754448   │
│ Loss/Loss_pi/Delta            │ 4.23528254032135e-05    │
│ Value/Adv                     │ 3.6954879156780862e-09  │
│ Loss/Loss_reward_critic       │ 0.0667378306388855      │
│ Loss/Loss_reward_critic/Delta │ -0.00013978034257888794 │
│ Value/reward                  │ -3.424516201019287      │
│ Loss/Loss_cost_critic         │ 0.006158455274999142    │
│ Loss/Loss_cost_critic/Delta   │ 0.0007396587170660496   │
│ Value/cost                    │ 0.7491795420646667      │
│ Time/Total                    │ 93.99728393554688       │
│ Time/Rollout                  │ 2.605239152908325       │
│ Time/Update                   │ 1.0111265182495117      │
│ Time/Epoch                    │ 3.6163854598999023      │
│ Time/FPS                      │ 553.0385131835938       │
│ Misc/Alpha                    │ 1.990894079208374       │
│ Misc/FinalStepNorm            │ 0.3328118622303009      │
│ Misc/gradient_norm            │ 0.09311239421367645     │
│ Misc/xHx                      │ 0.005045833066105843    │
│ Misc/H_inv_g                  │ 0.1671670377254486      │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.025811776518821716    │
│ Misc/A                        │ 0.0047976179048419      │
│ Misc/B                        │ -20317626.0             │
│ Misc/q                        │ 0.005045833066105843    │
│ Misc/r                        │ -8.388592686969787e-05  │
│ Misc/s                        │ 2.8339767595753074e-05  │
│ Misc/Lambda_star              │ 0.5022869110107422      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1670  Steps count? : 15, Cost: 1625.0
For episode num 1671  Steps count? : 65, Cost: 1626.0
For episode num 1672  Steps count? : 55, Cost: 1627.0
For episode num 1673  Steps count? : 97, Cost: 1628.0
For episode num 1674  Steps count? : 61, Cost: 1629.0
For episode num 1675  Steps count? : 71, Cost: 1630.0
For episode num 1676  Steps count? : 63, Cost: 1631.0
For episode num 1677  Steps count? : 48, Cost: 1632.0
For episode num 1678  Steps count? : 50, Cost: 1633.0
For episode num 1679  Steps count? : 90, Cost: 1634.0
For episode num 1680  Steps count? : 100, Cost: 1634.0
For episode num 1681  Steps count? : 47, Cost: 1635.0
For episode num 1682  Steps count? : 47, Cost: 1636.0
For episode num 1683  Steps count? : 62, Cost: 1637.0
For episode num 1684  Steps count? : 59, Cost: 1638.0
For episode num 1685  Steps count? : 67, Cost: 1639.0
For episode num 1686  Steps count? : 67, Cost: 1640.0
For episode num 1687  Steps count? : 60, Cost: 1641.0
For episode num 1688  Steps count? : 70, Cost: 1642.0
For episode num 1689  Steps count? : 45, Cost: 1643.0
For episode num 1690  Steps count? : 67, Cost: 1644.0
For episode num 1691  Steps count? : 66, Cost: 1645.0
For episode num 1692  Steps count? : 59, Cost: 1646.0
For episode num 1693  Steps count? : 58, Cost: 1647.0
For episode num 1694  Steps count? : 64, Cost: 1648.0
For episode num 1695  Steps count? : 73, Cost: 1649.0
For episode num 1696  Steps count? : 55, Cost: 1650.0
For episode num 1697  Steps count? : 64, Cost: 1651.0
For episode num 1698  Steps count? : 52, Cost: 1652.0
For episode num 1699  Steps count? : 53, Cost: 1653.0
For episode num 1700  Steps count? : 65, Cost: 1654.0
For episode num 1701  Steps count? : 55, Cost: 1655.0
Warning: trajectory cut off when rollout by epoch at 45.0 steps.
Processing rollout for epoch: 35... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.014719321392476559 Actual: 0.01579132489860058
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -5.385229587554932      │
│ Metrics/EpCost                │ 0.9800000190734863      │
│ Metrics/EpLen                 │ 61.18000030517578       │
│ Train/Epoch                   │ 35.0                    │
│ Train/Entropy                 │ 0.7721936702728271      │
│ Train/KL                      │ 0.0002065910812234506   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9988719820976257      │
│ Train/PolicyRatio/Min         │ 0.9988719820976257      │
│ Train/PolicyRatio/Max         │ 0.9988719820976257      │
│ Train/PolicyRatio/Std         │ 0.000797671265900135    │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.5237653255462646      │
│ TotalEnvSteps                 │ 72000.0                 │
│ Loss/Loss_pi                  │ -0.011952877044677734   │
│ Loss/Loss_pi/Delta            │ -0.0038632918149232864  │
│ Value/Adv                     │ 4.529952857268427e-09   │
│ Loss/Loss_reward_critic       │ 0.07128305733203888     │
│ Loss/Loss_reward_critic/Delta │ 0.004545226693153381    │
│ Value/reward                  │ -3.4396893978118896     │
│ Loss/Loss_cost_critic         │ 0.006645422428846359    │
│ Loss/Loss_cost_critic/Delta   │ 0.00048696715384721756  │
│ Value/cost                    │ 0.7320941686630249      │
│ Time/Total                    │ 96.5785903930664        │
│ Time/Rollout                  │ 1.6060521602630615      │
│ Time/Update                   │ 0.9576129913330078      │
│ Time/Epoch                    │ 2.5636813640594482      │
│ Time/FPS                      │ 780.1283569335938       │
│ Misc/Alpha                    │ 1.3566675186157227      │
│ Misc/FinalStepNorm            │ 0.2787820100784302      │
│ Misc/gradient_norm            │ 0.2735990881919861      │
│ Misc/xHx                      │ 0.010866325348615646    │
│ Misc/H_inv_g                  │ 0.20549027621746063     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.020519116893410683    │
│ Misc/A                        │ 0.01054154708981514     │
│ Misc/B                        │ -11504992.0             │
│ Misc/q                        │ 0.010866325348615646    │
│ Misc/r                        │ -0.00012762134429067373 │
│ Misc/s                        │ 5.013869667891413e-05   │
│ Misc/Lambda_star              │ 0.7371003031730652      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1702  Steps count? : 45, Cost: 1655.0
For episode num 1703  Steps count? : 76, Cost: 1656.0
For episode num 1704  Steps count? : 100, Cost: 1656.0
For episode num 1705  Steps count? : 58, Cost: 1657.0
For episode num 1706  Steps count? : 93, Cost: 1658.0
For episode num 1707  Steps count? : 48, Cost: 1659.0
For episode num 1708  Steps count? : 100, Cost: 1659.0
For episode num 1709  Steps count? : 49, Cost: 1660.0
For episode num 1710  Steps count? : 83, Cost: 1661.0
For episode num 1711  Steps count? : 61, Cost: 1662.0
For episode num 1712  Steps count? : 80, Cost: 1663.0
For episode num 1713  Steps count? : 56, Cost: 1664.0
For episode num 1714  Steps count? : 100, Cost: 1664.0
For episode num 1715  Steps count? : 81, Cost: 1665.0
For episode num 1716  Steps count? : 58, Cost: 1666.0
For episode num 1717  Steps count? : 64, Cost: 1667.0
For episode num 1718  Steps count? : 57, Cost: 1668.0
For episode num 1719  Steps count? : 81, Cost: 1669.0
For episode num 1720  Steps count? : 56, Cost: 1670.0
For episode num 1721  Steps count? : 100, Cost: 1670.0
For episode num 1722  Steps count? : 100, Cost: 1670.0
For episode num 1723  Steps count? : 100, Cost: 1670.0
For episode num 1724  Steps count? : 88, Cost: 1671.0
For episode num 1725  Steps count? : 62, Cost: 1672.0
For episode num 1726  Steps count? : 79, Cost: 1673.0
For episode num 1727  Steps count? : 56, Cost: 1674.0
For episode num 1728  Steps count? : 79, Cost: 1675.0
Warning: trajectory cut off when rollout by epoch at 35.0 steps.
Processing rollout for epoch: 36... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.016106555238366127 Actual: 0.017448116093873978
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -5.033076286315918     │
│ Metrics/EpCost                │ 0.8600000143051147     │
│ Metrics/EpLen                 │ 69.19999694824219      │
│ Train/Epoch                   │ 36.0                   │
│ Train/Entropy                 │ 0.737679660320282      │
│ Train/KL                      │ 0.00024661922361701727 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.002568244934082      │
│ Train/PolicyRatio/Min         │ 1.002568244934082      │
│ Train/PolicyRatio/Max         │ 1.002568244934082      │
│ Train/PolicyRatio/Std         │ 0.0018160233739763498  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.5060847401618958     │
│ TotalEnvSteps                 │ 74000.0                │
│ Loss/Loss_pi                  │ -0.01330326683819294   │
│ Loss/Loss_pi/Delta            │ -0.0013503897935152054 │
│ Value/Adv                     │ 2.145767119543507e-09  │
│ Loss/Loss_reward_critic       │ 0.08129723370075226    │
│ Loss/Loss_reward_critic/Delta │ 0.010014176368713379   │
│ Value/reward                  │ -3.4550185203552246    │
│ Loss/Loss_cost_critic         │ 0.007155192084610462   │
│ Loss/Loss_cost_critic/Delta   │ 0.0005097696557641029  │
│ Value/cost                    │ 0.6948140859603882     │
│ Time/Total                    │ 99.93721008300781      │
│ Time/Rollout                  │ 2.172089099884033      │
│ Time/Update                   │ 1.1677370071411133     │
│ Time/Epoch                    │ 3.339848041534424      │
│ Time/FPS                      │ 598.829833984375       │
│ Misc/Alpha                    │ 1.2435879707336426     │
│ Misc/FinalStepNorm            │ 0.2618919312953949     │
│ Misc/gradient_norm            │ 0.2223973274230957     │
│ Misc/xHx                      │ 0.012932323850691319   │
│ Misc/H_inv_g                  │ 0.2105938345193863     │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.041710712015628815   │
│ Misc/A                        │ 0.00929319392889738    │
│ Misc/B                        │ -9603190.0             │
│ Misc/q                        │ 0.012932323850691319   │
│ Misc/r                        │ -0.0004699247074313462 │
│ Misc/s                        │ 6.0671874962281436e-05 │
│ Misc/Lambda_star              │ 0.8041248321533203     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 1729  Steps count? : 35, Cost: 1675.0
For episode num 1730  Steps count? : 98, Cost: 1676.0
For episode num 1731  Steps count? : 100, Cost: 1676.0
For episode num 1732  Steps count? : 100, Cost: 1676.0
For episode num 1733  Steps count? : 100, Cost: 1676.0
For episode num 1734  Steps count? : 68, Cost: 1677.0
For episode num 1735  Steps count? : 51, Cost: 1678.0
For episode num 1736  Steps count? : 100, Cost: 1678.0
For episode num 1737  Steps count? : 65, Cost: 1679.0
For episode num 1738  Steps count? : 77, Cost: 1680.0
For episode num 1739  Steps count? : 78, Cost: 1681.0
For episode num 1740  Steps count? : 100, Cost: 1681.0
For episode num 1741  Steps count? : 61, Cost: 1682.0
For episode num 1742  Steps count? : 75, Cost: 1683.0
For episode num 1743  Steps count? : 100, Cost: 1683.0
For episode num 1744  Steps count? : 89, Cost: 1684.0
For episode num 1745  Steps count? : 100, Cost: 1684.0
For episode num 1746  Steps count? : 87, Cost: 1685.0
For episode num 1747  Steps count? : 62, Cost: 1686.0
For episode num 1748  Steps count? : 100, Cost: 1687.0
For episode num 1749  Steps count? : 100, Cost: 1687.0
For episode num 1750  Steps count? : 70, Cost: 1688.0
For episode num 1751  Steps count? : 100, Cost: 1688.0
For episode num 1752  Steps count? : 78, Cost: 1689.0
Warning: trajectory cut off when rollout by epoch at 41.0 steps.
Processing rollout for epoch: 37... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.016242273151874542 Actual: 0.017794320359826088
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -4.64649772644043      │
│ Metrics/EpCost                │ 0.699999988079071      │
│ Metrics/EpLen                 │ 79.58000183105469      │
│ Train/Epoch                   │ 37.0                   │
│ Train/Entropy                 │ 0.7112438678741455     │
│ Train/KL                      │ 0.00020936813962180167 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.002498984336853      │
│ Train/PolicyRatio/Min         │ 1.002498984336853      │
│ Train/PolicyRatio/Max         │ 1.002498984336853      │
│ Train/PolicyRatio/Std         │ 0.0017670769011601806  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.4927968978881836     │
│ TotalEnvSteps                 │ 76000.0                │
│ Loss/Loss_pi                  │ -0.013529038056731224  │
│ Loss/Loss_pi/Delta            │ -0.0002257712185382843 │
│ Value/Adv                     │ -4.410743770222325e-09 │
│ Loss/Loss_reward_critic       │ 0.09374818205833435    │
│ Loss/Loss_reward_critic/Delta │ 0.012450948357582092   │
│ Value/reward                  │ -3.448295831680298     │
│ Loss/Loss_cost_critic         │ 0.007993125356733799   │
│ Loss/Loss_cost_critic/Delta   │ 0.0008379332721233368  │
│ Value/cost                    │ 0.6290439963340759     │
│ Time/Total                    │ 102.83036041259766     │
│ Time/Rollout                  │ 2.0221407413482666     │
│ Time/Update                   │ 0.8490426540374756     │
│ Time/Epoch                    │ 2.8711998462677        │
│ Time/FPS                      │ 696.5730590820312      │
│ Misc/Alpha                    │ 1.2323452234268188     │
│ Misc/FinalStepNorm            │ 0.2835913896560669     │
│ Misc/gradient_norm            │ 0.18970882892608643    │
│ Misc/xHx                      │ 0.013169365003705025   │
│ Misc/H_inv_g                  │ 0.23012332618236542    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.03593985363841057    │
│ Misc/A                        │ 0.01066262274980545    │
│ Misc/B                        │ -5822405.0             │
│ Misc/q                        │ 0.013169365003705025   │
│ Misc/r                        │ -0.0005042081465944648 │
│ Misc/s                        │ 0.00010140685481019318 │
│ Misc/Lambda_star              │ 0.8114609122276306     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 1753  Steps count? : 41, Cost: 1689.0
For episode num 1754  Steps count? : 100, Cost: 1689.0
For episode num 1755  Steps count? : 76, Cost: 1690.0
For episode num 1756  Steps count? : 100, Cost: 1690.0
For episode num 1757  Steps count? : 100, Cost: 1690.0
For episode num 1758  Steps count? : 100, Cost: 1690.0
For episode num 1759  Steps count? : 100, Cost: 1690.0
For episode num 1760  Steps count? : 76, Cost: 1691.0
For episode num 1761  Steps count? : 100, Cost: 1691.0
For episode num 1762  Steps count? : 100, Cost: 1691.0
For episode num 1763  Steps count? : 100, Cost: 1691.0
For episode num 1764  Steps count? : 100, Cost: 1691.0
For episode num 1765  Steps count? : 97, Cost: 1692.0
For episode num 1766  Steps count? : 100, Cost: 1692.0
For episode num 1767  Steps count? : 53, Cost: 1693.0
For episode num 1768  Steps count? : 100, Cost: 1693.0
For episode num 1769  Steps count? : 79, Cost: 1694.0
For episode num 1770  Steps count? : 100, Cost: 1694.0
For episode num 1771  Steps count? : 100, Cost: 1694.0
For episode num 1772  Steps count? : 100, Cost: 1694.0
For episode num 1773  Steps count? : 100, Cost: 1695.0
For episode num 1774  Steps count? : 100, Cost: 1695.0
Warning: trajectory cut off when rollout by epoch at 19.0 steps.
Processing rollout for epoch: 38... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.029508091509342194 Actual: 0.032659389078617096
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -3.9433562755584717    │
│ Metrics/EpCost                │ 0.5                    │
│ Metrics/EpLen                 │ 88.08000183105469      │
│ Train/Epoch                   │ 38.0                   │
│ Train/Entropy                 │ 0.6965689063072205     │
│ Train/KL                      │ 0.000286140653770417   │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.9991233944892883     │
│ Train/PolicyRatio/Min         │ 0.9991233944892883     │
│ Train/PolicyRatio/Max         │ 0.9991233944892883     │
│ Train/PolicyRatio/Std         │ 0.00061986775835976    │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.4856092631816864     │
│ TotalEnvSteps                 │ 78000.0                │
│ Loss/Loss_pi                  │ -0.024777572602033615  │
│ Loss/Loss_pi/Delta            │ -0.011248534545302391  │
│ Value/Adv                     │ -5.435943606357796e-08 │
│ Loss/Loss_reward_critic       │ 0.09715060889720917    │
│ Loss/Loss_reward_critic/Delta │ 0.003402426838874817   │
│ Value/reward                  │ -3.3850550651550293    │
│ Loss/Loss_cost_critic         │ 0.00742777343839407    │
│ Loss/Loss_cost_critic/Delta   │ -0.0005653519183397293 │
│ Value/cost                    │ 0.5539485216140747     │
│ Time/Total                    │ 105.3421401977539      │
│ Time/Rollout                  │ 1.5768918991088867     │
│ Time/Update                   │ 0.9172523021697998     │
│ Time/Epoch                    │ 2.4941630363464355     │
│ Time/FPS                      │ 801.87255859375        │
│ Misc/Alpha                    │ 0.6735156774520874     │
│ Misc/FinalStepNorm            │ 0.21221940219402313    │
│ Misc/gradient_norm            │ 1.007262110710144      │
│ Misc/xHx                      │ 0.0440894290804863     │
│ Misc/H_inv_g                  │ 0.3150920271873474     │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.03189896047115326    │
│ Misc/A                        │ 0.01731840707361698    │
│ Misc/B                        │ -7689861.5             │
│ Misc/q                        │ 0.0440894290804863     │
│ Misc/r                        │ -0.0014455706113949418 │
│ Misc/s                        │ 7.804732740623876e-05  │
│ Misc/Lambda_star              │ 1.4847464561462402     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 1775  Steps count? : 19, Cost: 1695.0
For episode num 1776  Steps count? : 100, Cost: 1695.0
For episode num 1777  Steps count? : 100, Cost: 1695.0
For episode num 1778  Steps count? : 100, Cost: 1695.0
For episode num 1779  Steps count? : 100, Cost: 1695.0
For episode num 1780  Steps count? : 100, Cost: 1695.0
For episode num 1781  Steps count? : 100, Cost: 1695.0
For episode num 1782  Steps count? : 100, Cost: 1695.0
For episode num 1783  Steps count? : 100, Cost: 1695.0
For episode num 1784  Steps count? : 100, Cost: 1695.0
For episode num 1785  Steps count? : 100, Cost: 1695.0
For episode num 1786  Steps count? : 100, Cost: 1695.0
For episode num 1787  Steps count? : 87, Cost: 1696.0
For episode num 1788  Steps count? : 100, Cost: 1696.0
For episode num 1789  Steps count? : 100, Cost: 1696.0
For episode num 1790  Steps count? : 72, Cost: 1697.0
For episode num 1791  Steps count? : 100, Cost: 1697.0
For episode num 1792  Steps count? : 100, Cost: 1697.0
For episode num 1793  Steps count? : 87, Cost: 1698.0
For episode num 1794  Steps count? : 100, Cost: 1698.0
For episode num 1795  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 54.0 steps.
Processing rollout for epoch: 39... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.03458789363503456 Actual: 0.03461063280701637
INFO: violated KL constraint 0.010166507214307785 at step 1.
Expected Improvement: 0.03458789363503456 Actual: 0.027738604694604874
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -3.0861315727233887     │
│ Metrics/EpCost                │ 0.30000001192092896     │
│ Metrics/EpLen                 │ 94.26000213623047       │
│ Train/Epoch                   │ 39.0                    │
│ Train/Entropy                 │ 0.6743486523628235      │
│ Train/KL                      │ 0.00018547565559856594  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9986181259155273      │
│ Train/PolicyRatio/Min         │ 0.9986181259155273      │
│ Train/PolicyRatio/Max         │ 0.9986181259155273      │
│ Train/PolicyRatio/Std         │ 0.000826397561468184    │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.474955677986145       │
│ TotalEnvSteps                 │ 80000.0                 │
│ Loss/Loss_pi                  │ -0.023595860227942467   │
│ Loss/Loss_pi/Delta            │ 0.0011817123740911484   │
│ Value/Adv                     │ -1.0251999249533128e-08 │
│ Loss/Loss_reward_critic       │ 0.09764406085014343     │
│ Loss/Loss_reward_critic/Delta │ 0.0004934519529342651   │
│ Value/reward                  │ -3.2380571365356445     │
│ Loss/Loss_cost_critic         │ 0.0066501684486866      │
│ Loss/Loss_cost_critic/Delta   │ -0.0007776049897074699  │
│ Value/cost                    │ 0.46874842047691345     │
│ Time/Total                    │ 108.4139633178711       │
│ Time/Rollout                  │ 2.180999517440796       │
│ Time/Update                   │ 0.8686513900756836      │
│ Time/Epoch                    │ 3.0496699810028076      │
│ Time/FPS                      │ 655.8088989257812       │
│ Misc/Alpha                    │ 0.5781553387641907      │
│ Misc/FinalStepNorm            │ 0.132080540060997       │
│ Misc/gradient_norm            │ 2.1048123836517334      │
│ Misc/xHx                      │ 0.05983300507068634     │
│ Misc/H_inv_g                  │ 0.2855645716190338      │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 0.047152936458587646    │
│ Misc/A                        │ 0.05753188207745552     │
│ Misc/B                        │ -4970815.5              │
│ Misc/q                        │ 0.05983300507068634     │
│ Misc/r                        │ -0.0005314386799000204  │
│ Misc/s                        │ 0.00012272439198568463  │
│ Misc/Lambda_star              │ 1.729638934135437       │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1796  Steps count? : 54, Cost: 1698.0
For episode num 1797  Steps count? : 100, Cost: 1698.0
For episode num 1798  Steps count? : 100, Cost: 1698.0
For episode num 1799  Steps count? : 100, Cost: 1698.0
For episode num 1800  Steps count? : 100, Cost: 1698.0
For episode num 1801  Steps count? : 100, Cost: 1698.0
For episode num 1802  Steps count? : 100, Cost: 1698.0
For episode num 1803  Steps count? : 100, Cost: 1698.0
For episode num 1804  Steps count? : 100, Cost: 1698.0
For episode num 1805  Steps count? : 100, Cost: 1698.0
For episode num 1806  Steps count? : 100, Cost: 1698.0
For episode num 1807  Steps count? : 100, Cost: 1698.0
For episode num 1808  Steps count? : 100, Cost: 1698.0
For episode num 1809  Steps count? : 100, Cost: 1698.0
For episode num 1810  Steps count? : 100, Cost: 1698.0
For episode num 1811  Steps count? : 100, Cost: 1698.0
For episode num 1812  Steps count? : 100, Cost: 1698.0
For episode num 1813  Steps count? : 100, Cost: 1698.0
For episode num 1814  Steps count? : 100, Cost: 1698.0
For episode num 1815  Steps count? : 100, Cost: 1698.0
For episode num 1816  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 40... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.03750821575522423 Actual: 0.03671048581600189
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -2.0583786964416504     │
│ Metrics/EpCost                │ 0.14000000059604645     │
│ Metrics/EpLen                 │ 97.5                    │
│ Train/Epoch                   │ 40.0                    │
│ Train/Entropy                 │ 0.6611997485160828      │
│ Train/KL                      │ 0.0002590820658951998   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0010193586349487      │
│ Train/PolicyRatio/Min         │ 1.0010193586349487      │
│ Train/PolicyRatio/Max         │ 1.0010193586349487      │
│ Train/PolicyRatio/Std         │ 0.0007207673043012619   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.46873462200164795     │
│ TotalEnvSteps                 │ 82000.0                 │
│ Loss/Loss_pi                  │ -0.02756086364388466    │
│ Loss/Loss_pi/Delta            │ -0.003965003415942192   │
│ Value/Adv                     │ 3.814697180359872e-08   │
│ Loss/Loss_reward_critic       │ 0.07829240709543228     │
│ Loss/Loss_reward_critic/Delta │ -0.01935165375471115    │
│ Value/reward                  │ -2.9193997383117676     │
│ Loss/Loss_cost_critic         │ 0.003059888258576393    │
│ Loss/Loss_cost_critic/Delta   │ -0.0035902801901102066  │
│ Value/cost                    │ 0.3958170413970947      │
│ Time/Total                    │ 110.86260223388672      │
│ Time/Rollout                  │ 1.5885751247406006      │
│ Time/Update                   │ 0.8420989513397217      │
│ Time/Epoch                    │ 2.430690288543701       │
│ Time/FPS                      │ 822.8119506835938       │
│ Misc/Alpha                    │ 0.5337214469909668      │
│ Misc/FinalStepNorm            │ 0.13478019833564758     │
│ Misc/gradient_norm            │ 2.539325475692749       │
│ Misc/xHx                      │ 0.07021026313304901     │
│ Misc/H_inv_g                  │ 0.252529114484787       │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.011624165810644627    │
│ Misc/A                        │ 0.030268598347902298    │
│ Misc/B                        │ -584120192.0            │
│ Misc/q                        │ 0.07021026313304901     │
│ Misc/r                        │ -0.00020557158859446645 │
│ Misc/s                        │ 1.04803496014938e-06    │
│ Misc/Lambda_star              │ 1.8736364841461182      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1817  Steps count? : 0, Cost: 1698.0
For episode num 1818  Steps count? : 100, Cost: 1698.0
For episode num 1819  Steps count? : 100, Cost: 1698.0
For episode num 1820  Steps count? : 100, Cost: 1698.0
For episode num 1821  Steps count? : 100, Cost: 1698.0
For episode num 1822  Steps count? : 100, Cost: 1698.0
For episode num 1823  Steps count? : 100, Cost: 1698.0
For episode num 1824  Steps count? : 100, Cost: 1698.0
For episode num 1825  Steps count? : 100, Cost: 1698.0
For episode num 1826  Steps count? : 100, Cost: 1698.0
For episode num 1827  Steps count? : 100, Cost: 1698.0
For episode num 1828  Steps count? : 100, Cost: 1698.0
For episode num 1829  Steps count? : 100, Cost: 1698.0
For episode num 1830  Steps count? : 100, Cost: 1698.0
For episode num 1831  Steps count? : 100, Cost: 1698.0
For episode num 1832  Steps count? : 100, Cost: 1698.0
For episode num 1833  Steps count? : 100, Cost: 1698.0
For episode num 1834  Steps count? : 100, Cost: 1698.0
For episode num 1835  Steps count? : 100, Cost: 1698.0
For episode num 1836  Steps count? : 100, Cost: 1698.0
For episode num 1837  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 41... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.040325406938791275 Actual: 0.028526421636343002
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -1.285264253616333     │
│ Metrics/EpCost                │ 0.05999999865889549    │
│ Metrics/EpLen                 │ 98.91999816894531      │
│ Train/Epoch                   │ 41.0                   │
│ Train/Entropy                 │ 0.6397084593772888     │
│ Train/KL                      │ 0.00016620555834379047 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.9997779726982117     │
│ Train/PolicyRatio/Min         │ 0.9997779726982117     │
│ Train/PolicyRatio/Max         │ 0.9997779726982117     │
│ Train/PolicyRatio/Std         │ 0.0001570391614222899  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.45879220962524414    │
│ TotalEnvSteps                 │ 84000.0                │
│ Loss/Loss_pi                  │ -0.021406123414635658  │
│ Loss/Loss_pi/Delta            │ 0.0061547402292490005  │
│ Value/Adv                     │ -4.267692688131319e-08 │
│ Loss/Loss_reward_critic       │ 0.06011460721492767    │
│ Loss/Loss_reward_critic/Delta │ -0.018177799880504608  │
│ Value/reward                  │ -2.490971326828003     │
│ Loss/Loss_cost_critic         │ 0.0022259755060076714  │
│ Loss/Loss_cost_critic/Delta   │ -0.0008339127525687218 │
│ Value/cost                    │ 0.3396347761154175     │
│ Time/Total                    │ 113.32795715332031     │
│ Time/Rollout                  │ 1.5959126949310303     │
│ Time/Update                   │ 0.8512423038482666     │
│ Time/Epoch                    │ 2.447171211242676      │
│ Time/FPS                      │ 817.2703857421875      │
│ Misc/Alpha                    │ 0.4959969222545624     │
│ Misc/FinalStepNorm            │ 0.14412306249141693    │
│ Misc/gradient_norm            │ 1.9381372928619385     │
│ Misc/xHx                      │ 0.08129652589559555    │
│ Misc/H_inv_g                  │ 0.29057249426841736    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.002663783263415098   │
│ Misc/A                        │ 0.054727524518966675   │
│ Misc/B                        │ -4100939776.0          │
│ Misc/q                        │ 0.08129652589559555    │
│ Misc/r                        │ -6.348080205498263e-05 │
│ Misc/s                        │ 1.4167343920234998e-07 │
│ Misc/Lambda_star              │ 2.016141653060913      │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 1838  Steps count? : 0, Cost: 1698.0
For episode num 1839  Steps count? : 100, Cost: 1698.0
For episode num 1840  Steps count? : 100, Cost: 1698.0
For episode num 1841  Steps count? : 100, Cost: 1698.0
For episode num 1842  Steps count? : 100, Cost: 1698.0
For episode num 1843  Steps count? : 100, Cost: 1698.0
For episode num 1844  Steps count? : 100, Cost: 1698.0
For episode num 1845  Steps count? : 100, Cost: 1698.0
For episode num 1846  Steps count? : 100, Cost: 1698.0
For episode num 1847  Steps count? : 100, Cost: 1698.0
For episode num 1848  Steps count? : 100, Cost: 1698.0
For episode num 1849  Steps count? : 100, Cost: 1698.0
For episode num 1850  Steps count? : 100, Cost: 1698.0
For episode num 1851  Steps count? : 100, Cost: 1698.0
For episode num 1852  Steps count? : 100, Cost: 1698.0
For episode num 1853  Steps count? : 100, Cost: 1698.0
For episode num 1854  Steps count? : 100, Cost: 1698.0
For episode num 1855  Steps count? : 100, Cost: 1698.0
For episode num 1856  Steps count? : 100, Cost: 1698.0
For episode num 1857  Steps count? : 100, Cost: 1698.0
For episode num 1858  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 42... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.03296351432800293 Actual: 0.028607377782464027
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.6691823601722717    │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 42.0                   │
│ Train/Entropy                 │ 0.6237069368362427     │
│ Train/KL                      │ 0.00020235725969541818 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.9984390139579773     │
│ Train/PolicyRatio/Min         │ 0.9984390139579773     │
│ Train/PolicyRatio/Max         │ 0.9984390139579773     │
│ Train/PolicyRatio/Std         │ 0.0011037979274988174  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.4514831006526947     │
│ TotalEnvSteps                 │ 86000.0                │
│ Loss/Loss_pi                  │ -0.021458180621266365  │
│ Loss/Loss_pi/Delta            │ -5.205720663070679e-05 │
│ Value/Adv                     │ 1.9264221862158593e-07 │
│ Loss/Loss_reward_critic       │ 0.04869961738586426    │
│ Loss/Loss_reward_critic/Delta │ -0.011414989829063416  │
│ Value/reward                  │ -2.247037649154663     │
│ Loss/Loss_cost_critic         │ 0.0016494112787768245  │
│ Loss/Loss_cost_critic/Delta   │ -0.0005765642272308469 │
│ Value/cost                    │ 0.2907949686050415     │
│ Time/Total                    │ 115.8397216796875      │
│ Time/Rollout                  │ 1.5787832736968994     │
│ Time/Update                   │ 0.9153211116790771     │
│ Time/Epoch                    │ 2.4941232204437256     │
│ Time/FPS                      │ 801.88525390625        │
│ Misc/Alpha                    │ 0.6069369316101074     │
│ Misc/FinalStepNorm            │ 0.15271393954753876    │
│ Misc/gradient_norm            │ 2.413893222808838      │
│ Misc/xHx                      │ 0.054292865097522736   │
│ Misc/H_inv_g                  │ 0.25161418318748474    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.00503790657967329    │
│ Misc/A                        │ 0.039011962711811066   │
│ Misc/B                        │ -4049718016.0          │
│ Misc/q                        │ 0.054292865097522736   │
│ Misc/r                        │ 4.85626223962754e-05   │
│ Misc/s                        │ 1.4433173589623038e-07 │
│ Misc/Lambda_star              │ 1.6476176977157593     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 1859  Steps count? : 0, Cost: 1698.0
For episode num 1860  Steps count? : 100, Cost: 1698.0
For episode num 1861  Steps count? : 100, Cost: 1698.0
For episode num 1862  Steps count? : 100, Cost: 1698.0
For episode num 1863  Steps count? : 100, Cost: 1698.0
For episode num 1864  Steps count? : 100, Cost: 1698.0
For episode num 1865  Steps count? : 100, Cost: 1698.0
For episode num 1866  Steps count? : 100, Cost: 1698.0
For episode num 1867  Steps count? : 100, Cost: 1698.0
For episode num 1868  Steps count? : 100, Cost: 1698.0
For episode num 1869  Steps count? : 100, Cost: 1698.0
For episode num 1870  Steps count? : 100, Cost: 1698.0
For episode num 1871  Steps count? : 100, Cost: 1698.0
For episode num 1872  Steps count? : 100, Cost: 1698.0
For episode num 1873  Steps count? : 100, Cost: 1698.0
For episode num 1874  Steps count? : 100, Cost: 1698.0
For episode num 1875  Steps count? : 100, Cost: 1698.0
For episode num 1876  Steps count? : 100, Cost: 1698.0
For episode num 1877  Steps count? : 100, Cost: 1698.0
For episode num 1878  Steps count? : 100, Cost: 1698.0
For episode num 1879  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 43... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.02154066041111946 Actual: 0.020875046029686928
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.49077364802360535   │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 43.0                   │
│ Train/Entropy                 │ 0.6165951490402222     │
│ Train/KL                      │ 0.00022484392684418708 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.999683141708374      │
│ Train/PolicyRatio/Min         │ 0.999683141708374      │
│ Train/PolicyRatio/Max         │ 0.999683141708374      │
│ Train/PolicyRatio/Std         │ 0.0002240526519017294  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.4482785165309906     │
│ TotalEnvSteps                 │ 88000.0                │
│ Loss/Loss_pi                  │ -0.015641331672668457  │
│ Loss/Loss_pi/Delta            │ 0.005816848948597908   │
│ Value/Adv                     │ 1.7166138377433526e-07 │
│ Loss/Loss_reward_critic       │ 0.038279589265584946   │
│ Loss/Loss_reward_critic/Delta │ -0.010420028120279312  │
│ Value/reward                  │ -1.979955792427063     │
│ Loss/Loss_cost_critic         │ 0.0012389733456075191  │
│ Loss/Loss_cost_critic/Delta   │ -0.0004104379331693053 │
│ Value/cost                    │ 0.24930179119110107    │
│ Time/Total                    │ 119.75081634521484     │
│ Time/Rollout                  │ 2.6411375999450684     │
│ Time/Update                   │ 1.2471892833709717     │
│ Time/Epoch                    │ 3.888368606567383      │
│ Time/FPS                      │ 514.354736328125       │
│ Misc/Alpha                    │ 0.9288808107376099     │
│ Misc/FinalStepNorm            │ 0.1404884308576584     │
│ Misc/gradient_norm            │ 1.9050633907318115     │
│ Misc/xHx                      │ 0.023179806768894196   │
│ Misc/H_inv_g                  │ 0.1512448489665985     │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.00699826143682003    │
│ Misc/A                        │ 0.008558609522879124   │
│ Misc/B                        │ -2496442112.0          │
│ Misc/q                        │ 0.023179806768894196   │
│ Misc/r                        │ 6.050213778507896e-05  │
│ Misc/s                        │ 2.4035628598539915e-07 │
│ Misc/Lambda_star              │ 1.0765644311904907     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 1880  Steps count? : 0, Cost: 1698.0
For episode num 1881  Steps count? : 100, Cost: 1698.0
For episode num 1882  Steps count? : 100, Cost: 1698.0
For episode num 1883  Steps count? : 100, Cost: 1698.0
For episode num 1884  Steps count? : 100, Cost: 1698.0
For episode num 1885  Steps count? : 100, Cost: 1698.0
For episode num 1886  Steps count? : 100, Cost: 1698.0
For episode num 1887  Steps count? : 100, Cost: 1698.0
For episode num 1888  Steps count? : 100, Cost: 1698.0
For episode num 1889  Steps count? : 100, Cost: 1698.0
For episode num 1890  Steps count? : 100, Cost: 1698.0
For episode num 1891  Steps count? : 100, Cost: 1698.0
For episode num 1892  Steps count? : 100, Cost: 1698.0
For episode num 1893  Steps count? : 100, Cost: 1698.0
For episode num 1894  Steps count? : 100, Cost: 1698.0
For episode num 1895  Steps count? : 100, Cost: 1698.0
For episode num 1896  Steps count? : 100, Cost: 1698.0
For episode num 1897  Steps count? : 100, Cost: 1698.0
For episode num 1898  Steps count? : 100, Cost: 1698.0
For episode num 1899  Steps count? : 100, Cost: 1698.0
For episode num 1900  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 44... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.013109462335705757 Actual: 0.012636289931833744
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.3850931227207184    │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 44.0                   │
│ Train/Entropy                 │ 0.5910918116569519     │
│ Train/KL                      │ 0.00023375032469630241 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.003400444984436      │
│ Train/PolicyRatio/Min         │ 1.003400444984436      │
│ Train/PolicyRatio/Max         │ 1.003400444984436      │
│ Train/PolicyRatio/Std         │ 0.0024045619647949934  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.4370514452457428     │
│ TotalEnvSteps                 │ 90000.0                │
│ Loss/Loss_pi                  │ -0.00947984866797924   │
│ Loss/Loss_pi/Delta            │ 0.006161483004689217   │
│ Value/Adv                     │ -6.866454782539222e-08 │
│ Loss/Loss_reward_critic       │ 0.0339818112552166     │
│ Loss/Loss_reward_critic/Delta │ -0.004297778010368347  │
│ Value/reward                  │ -1.7531955242156982    │
│ Loss/Loss_cost_critic         │ 0.000930515700019896   │
│ Loss/Loss_cost_critic/Delta   │ -0.0003084576455876231 │
│ Value/cost                    │ 0.21415260434150696    │
│ Time/Total                    │ 122.22715759277344     │
│ Time/Rollout                  │ 1.5933587551116943     │
│ Time/Update                   │ 0.8661415576934814     │
│ Time/Epoch                    │ 2.4595260620117188     │
│ Time/FPS                      │ 813.1650390625         │
│ Misc/Alpha                    │ 1.5269412994384766     │
│ Misc/FinalStepNorm            │ 0.22379687428474426    │
│ Misc/gradient_norm            │ 0.4454790949821472     │
│ Misc/xHx                      │ 0.008577975444495678   │
│ Misc/H_inv_g                  │ 0.14656546711921692    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.004039255436509848   │
│ Misc/A                        │ 0.008093312382698059   │
│ Misc/B                        │ -7965268992.0          │
│ Misc/q                        │ 0.008577975444495678   │
│ Misc/r                        │ 6.1667983572988305e-06 │
│ Misc/s                        │ 6.846565270279825e-08  │
│ Misc/Lambda_star              │ 0.6549040079116821     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 1901  Steps count? : 0, Cost: 1698.0
For episode num 1902  Steps count? : 100, Cost: 1698.0
For episode num 1903  Steps count? : 100, Cost: 1698.0
For episode num 1904  Steps count? : 100, Cost: 1698.0
For episode num 1905  Steps count? : 100, Cost: 1698.0
For episode num 1906  Steps count? : 100, Cost: 1698.0
For episode num 1907  Steps count? : 100, Cost: 1698.0
For episode num 1908  Steps count? : 100, Cost: 1698.0
For episode num 1909  Steps count? : 100, Cost: 1698.0
For episode num 1910  Steps count? : 100, Cost: 1698.0
For episode num 1911  Steps count? : 100, Cost: 1698.0
For episode num 1912  Steps count? : 100, Cost: 1698.0
For episode num 1913  Steps count? : 100, Cost: 1698.0
For episode num 1914  Steps count? : 100, Cost: 1698.0
For episode num 1915  Steps count? : 100, Cost: 1698.0
For episode num 1916  Steps count? : 100, Cost: 1698.0
For episode num 1917  Steps count? : 100, Cost: 1698.0
For episode num 1918  Steps count? : 100, Cost: 1698.0
For episode num 1919  Steps count? : 100, Cost: 1698.0
For episode num 1920  Steps count? : 100, Cost: 1698.0
For episode num 1921  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 45... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.008798009715974331 Actual: 0.008541135117411613
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.30053913593292236    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 45.0                    │
│ Train/Entropy                 │ 0.5546231865882874      │
│ Train/KL                      │ 0.00024128530640155077  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0019903182983398      │
│ Train/PolicyRatio/Min         │ 1.0019903182983398      │
│ Train/PolicyRatio/Max         │ 1.0019903182983398      │
│ Train/PolicyRatio/Std         │ 0.0014074237551540136   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.4214037358760834      │
│ TotalEnvSteps                 │ 92000.0                 │
│ Loss/Loss_pi                  │ -0.006406045984476805   │
│ Loss/Loss_pi/Delta            │ 0.0030738026835024357   │
│ Value/Adv                     │ 8.583068478174027e-09   │
│ Loss/Loss_reward_critic       │ 0.029683202505111694    │
│ Loss/Loss_reward_critic/Delta │ -0.004298608750104904   │
│ Value/reward                  │ -1.5479615926742554     │
│ Loss/Loss_cost_critic         │ 0.0007010286790318787   │
│ Loss/Loss_cost_critic/Delta   │ -0.00022948702098801732 │
│ Value/cost                    │ 0.1837049275636673      │
│ Time/Total                    │ 125.57268524169922      │
│ Time/Rollout                  │ 2.103285551071167       │
│ Time/Update                   │ 1.2235000133514404      │
│ Time/Epoch                    │ 3.3268094062805176      │
│ Time/FPS                      │ 601.1767578125          │
│ Misc/Alpha                    │ 2.2819364070892334      │
│ Misc/FinalStepNorm            │ 0.19832806289196014     │
│ Misc/gradient_norm            │ 0.7243677973747253      │
│ Misc/xHx                      │ 0.00384080084040761     │
│ Misc/H_inv_g                  │ 0.08691217005252838     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.001481739105656743    │
│ Misc/A                        │ 0.0033051972277462482   │
│ Misc/B                        │ -24442066944.0          │
│ Misc/q                        │ 0.00384080084040761     │
│ Misc/r                        │ -3.7007757782703266e-06 │
│ Misc/s                        │ 1.5570666533903932e-08  │
│ Misc/Lambda_star              │ 0.43822431564331055     │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1922  Steps count? : 0, Cost: 1698.0
For episode num 1923  Steps count? : 100, Cost: 1698.0
For episode num 1924  Steps count? : 100, Cost: 1698.0
For episode num 1925  Steps count? : 100, Cost: 1698.0
For episode num 1926  Steps count? : 100, Cost: 1698.0
For episode num 1927  Steps count? : 100, Cost: 1698.0
For episode num 1928  Steps count? : 100, Cost: 1698.0
For episode num 1929  Steps count? : 100, Cost: 1698.0
For episode num 1930  Steps count? : 100, Cost: 1698.0
For episode num 1931  Steps count? : 100, Cost: 1698.0
For episode num 1932  Steps count? : 100, Cost: 1698.0
For episode num 1933  Steps count? : 100, Cost: 1698.0
For episode num 1934  Steps count? : 100, Cost: 1698.0
For episode num 1935  Steps count? : 100, Cost: 1698.0
For episode num 1936  Steps count? : 100, Cost: 1698.0
For episode num 1937  Steps count? : 100, Cost: 1698.0
For episode num 1938  Steps count? : 100, Cost: 1698.0
For episode num 1939  Steps count? : 100, Cost: 1698.0
For episode num 1940  Steps count? : 100, Cost: 1698.0
For episode num 1941  Steps count? : 100, Cost: 1698.0
For episode num 1942  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 46... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.006416358985006809 Actual: 0.006820641923695803
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.242884561419487      │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 46.0                    │
│ Train/Entropy                 │ 0.5651971101760864      │
│ Train/KL                      │ 0.00018985976930707693  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0018434524536133      │
│ Train/PolicyRatio/Min         │ 1.0018434524536133      │
│ Train/PolicyRatio/Max         │ 1.0018434524536133      │
│ Train/PolicyRatio/Std         │ 0.0013034614967182279   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.4258742034435272      │
│ TotalEnvSteps                 │ 94000.0                 │
│ Loss/Loss_pi                  │ -0.005114603787660599   │
│ Loss/Loss_pi/Delta            │ 0.001291442196816206    │
│ Value/Adv                     │ 2.4127959363795526e-07  │
│ Loss/Loss_reward_critic       │ 0.026556910946965218    │
│ Loss/Loss_reward_critic/Delta │ -0.0031262915581464767  │
│ Value/reward                  │ -1.3641561269760132     │
│ Loss/Loss_cost_critic         │ 0.0005295377923175693   │
│ Loss/Loss_cost_critic/Delta   │ -0.00017149088671430945 │
│ Value/cost                    │ 0.15766969323158264     │
│ Time/Total                    │ 129.32177734375         │
│ Time/Rollout                  │ 2.5703136920928955      │
│ Time/Update                   │ 1.1565883159637451      │
│ Time/Epoch                    │ 3.7269251346588135      │
│ Time/FPS                      │ 536.6356811523438       │
│ Misc/Alpha                    │ 3.1254162788391113      │
│ Misc/FinalStepNorm            │ 0.22905747592449188     │
│ Misc/gradient_norm            │ 0.30118611454963684     │
│ Misc/xHx                      │ 0.0020474442280828953   │
│ Misc/H_inv_g                  │ 0.07328862696886063     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.0014705423964187503   │
│ Misc/A                        │ 0.0020011253654956818   │
│ Misc/B                        │ -25687470080.0          │
│ Misc/q                        │ 0.0020474442280828953   │
│ Misc/r                        │ 1.061594844031788e-06   │
│ Misc/s                        │ 1.4330928443939683e-08  │
│ Misc/Lambda_star              │ 0.3199573755264282      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1943  Steps count? : 0, Cost: 1698.0
For episode num 1944  Steps count? : 100, Cost: 1698.0
For episode num 1945  Steps count? : 100, Cost: 1698.0
For episode num 1946  Steps count? : 100, Cost: 1698.0
For episode num 1947  Steps count? : 100, Cost: 1698.0
For episode num 1948  Steps count? : 100, Cost: 1698.0
For episode num 1949  Steps count? : 100, Cost: 1698.0
For episode num 1950  Steps count? : 100, Cost: 1698.0
For episode num 1951  Steps count? : 100, Cost: 1698.0
For episode num 1952  Steps count? : 100, Cost: 1698.0
For episode num 1953  Steps count? : 100, Cost: 1698.0
For episode num 1954  Steps count? : 100, Cost: 1698.0
For episode num 1955  Steps count? : 100, Cost: 1698.0
For episode num 1956  Steps count? : 100, Cost: 1698.0
For episode num 1957  Steps count? : 100, Cost: 1698.0
For episode num 1958  Steps count? : 100, Cost: 1698.0
For episode num 1959  Steps count? : 100, Cost: 1698.0
For episode num 1960  Steps count? : 100, Cost: 1698.0
For episode num 1961  Steps count? : 100, Cost: 1698.0
For episode num 1962  Steps count? : 100, Cost: 1698.0
For episode num 1963  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 47... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.010753425769507885 Actual: 0.011461520567536354
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.225246861577034      │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 47.0                    │
│ Train/Entropy                 │ 0.5730342268943787      │
│ Train/KL                      │ 0.00023024811525829136  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9989779591560364      │
│ Train/PolicyRatio/Min         │ 0.9989779591560364      │
│ Train/PolicyRatio/Max         │ 0.9989779591560364      │
│ Train/PolicyRatio/Std         │ 0.0007226779125630856   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.42917048931121826     │
│ TotalEnvSteps                 │ 96000.0                 │
│ Loss/Loss_pi                  │ -0.008595936931669712   │
│ Loss/Loss_pi/Delta            │ -0.0034813331440091133  │
│ Value/Adv                     │ 4.100799699813251e-08   │
│ Loss/Loss_reward_critic       │ 0.023981494829058647    │
│ Loss/Loss_reward_critic/Delta │ -0.0025754161179065704  │
│ Value/reward                  │ -1.2049270868301392     │
│ Loss/Loss_cost_critic         │ 0.0003999192558694631   │
│ Loss/Loss_cost_critic/Delta   │ -0.00012961853644810617 │
│ Value/cost                    │ 0.13508886098861694     │
│ Time/Total                    │ 133.14149475097656      │
│ Time/Rollout                  │ 2.504460096359253       │
│ Time/Update                   │ 1.29073166847229        │
│ Time/Epoch                    │ 3.7952144145965576      │
│ Time/FPS                      │ 526.9796752929688       │
│ Misc/Alpha                    │ 1.86391282081604        │
│ Misc/FinalStepNorm            │ 0.24316462874412537     │
│ Misc/gradient_norm            │ 0.7852346897125244      │
│ Misc/xHx                      │ 0.005756759084761143    │
│ Misc/H_inv_g                  │ 0.1304592341184616      │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.0004949078429490328   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.005756759084761143    │
│ Misc/r                        │ 1.4871718576614512e-06  │
│ Misc/s                        │ 3.5836416056866938e-09  │
│ Misc/Lambda_star              │ 0.5365057587623596      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 1964  Steps count? : 0, Cost: 1698.0
For episode num 1965  Steps count? : 100, Cost: 1698.0
For episode num 1966  Steps count? : 100, Cost: 1698.0
For episode num 1967  Steps count? : 100, Cost: 1698.0
For episode num 1968  Steps count? : 100, Cost: 1698.0
For episode num 1969  Steps count? : 100, Cost: 1698.0
For episode num 1970  Steps count? : 100, Cost: 1698.0
For episode num 1971  Steps count? : 100, Cost: 1698.0
For episode num 1972  Steps count? : 100, Cost: 1698.0
For episode num 1973  Steps count? : 100, Cost: 1698.0
For episode num 1974  Steps count? : 100, Cost: 1698.0
For episode num 1975  Steps count? : 100, Cost: 1698.0
For episode num 1976  Steps count? : 100, Cost: 1698.0
For episode num 1977  Steps count? : 100, Cost: 1698.0
For episode num 1978  Steps count? : 100, Cost: 1698.0
For episode num 1979  Steps count? : 100, Cost: 1698.0
For episode num 1980  Steps count? : 100, Cost: 1698.0
For episode num 1981  Steps count? : 100, Cost: 1698.0
For episode num 1982  Steps count? : 100, Cost: 1698.0
For episode num 1983  Steps count? : 100, Cost: 1698.0
For episode num 1984  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 48... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.008158763870596886 Actual: 0.008587896823883057
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.19225353002548218   │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 48.0                   │
│ Train/Entropy                 │ 0.5693001747131348     │
│ Train/KL                      │ 0.00020395526371430606 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0046361684799194     │
│ Train/PolicyRatio/Min         │ 1.0046361684799194     │
│ Train/PolicyRatio/Max         │ 1.0046361684799194     │
│ Train/PolicyRatio/Std         │ 0.003278181655332446   │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.4275699853897095     │
│ TotalEnvSteps                 │ 98000.0                │
│ Loss/Loss_pi                  │ -0.006443450693041086  │
│ Loss/Loss_pi/Delta            │ 0.002152486238628626   │
│ Value/Adv                     │ -5.531310875994677e-08 │
│ Loss/Loss_reward_critic       │ 0.021553389728069305   │
│ Loss/Loss_reward_critic/Delta │ -0.0024281051009893417 │
│ Value/reward                  │ -1.0612962245941162    │
│ Loss/Loss_cost_critic         │ 0.0003016194677911699  │
│ Loss/Loss_cost_critic/Delta   │ -9.82997880782932e-05  │
│ Value/cost                    │ 0.11580587923526764    │
│ Time/Total                    │ 136.046875             │
│ Time/Rollout                  │ 2.0512518882751465     │
│ Time/Update                   │ 0.8321282863616943     │
│ Time/Epoch                    │ 2.8833961486816406     │
│ Time/FPS                      │ 693.626708984375       │
│ Misc/Alpha                    │ 2.466099739074707      │
│ Misc/FinalStepNorm            │ 0.2688817083835602     │
│ Misc/gradient_norm            │ 0.47098228335380554    │
│ Misc/xHx                      │ 0.003288572421297431   │
│ Misc/H_inv_g                  │ 0.10903117060661316    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.0020925707649439573  │
│ Misc/A                        │ 0.0030880472622811794  │
│ Misc/B                        │ -29730918400.0         │
│ Misc/q                        │ 0.003288572421297431   │
│ Misc/r                        │ 2.05314813683799e-06   │
│ Misc/s                        │ 1.1021888290940751e-08 │
│ Misc/Lambda_star              │ 0.40549859404563904    │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 3.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 1985  Steps count? : 0, Cost: 1698.0
For episode num 1986  Steps count? : 100, Cost: 1698.0
For episode num 1987  Steps count? : 100, Cost: 1698.0
For episode num 1988  Steps count? : 100, Cost: 1698.0
For episode num 1989  Steps count? : 100, Cost: 1698.0
For episode num 1990  Steps count? : 100, Cost: 1698.0
For episode num 1991  Steps count? : 100, Cost: 1698.0
For episode num 1992  Steps count? : 100, Cost: 1698.0
For episode num 1993  Steps count? : 100, Cost: 1698.0
For episode num 1994  Steps count? : 100, Cost: 1698.0
For episode num 1995  Steps count? : 100, Cost: 1698.0
For episode num 1996  Steps count? : 100, Cost: 1698.0
For episode num 1997  Steps count? : 100, Cost: 1698.0
For episode num 1998  Steps count? : 100, Cost: 1698.0
For episode num 1999  Steps count? : 100, Cost: 1698.0
For episode num 2000  Steps count? : 100, Cost: 1698.0
For episode num 2001  Steps count? : 100, Cost: 1698.0
For episode num 2002  Steps count? : 100, Cost: 1698.0
For episode num 2003  Steps count? : 100, Cost: 1698.0
For episode num 2004  Steps count? : 100, Cost: 1698.0
For episode num 2005  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 49... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.021935854107141495 Actual: 0.022074708715081215
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.17859771847724915    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 49.0                    │
│ Train/Entropy                 │ 0.571942150592804       │
│ Train/KL                      │ 0.00027704183594323695  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9969814419746399      │
│ Train/PolicyRatio/Min         │ 0.9969814419746399      │
│ Train/PolicyRatio/Max         │ 0.9969814419746399      │
│ Train/PolicyRatio/Std         │ 0.0021344006527215242   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.42870208621025085     │
│ TotalEnvSteps                 │ 100000.0                │
│ Loss/Loss_pi                  │ -0.016565736383199692   │
│ Loss/Loss_pi/Delta            │ -0.010122285690158606   │
│ Value/Adv                     │ -1.4829635119895102e-07 │
│ Loss/Loss_reward_critic       │ 0.01932487078011036     │
│ Loss/Loss_reward_critic/Delta │ -0.0022285189479589462  │
│ Value/reward                  │ -0.9328985810279846     │
│ Loss/Loss_cost_critic         │ 0.0002273489662911743   │
│ Loss/Loss_cost_critic/Delta   │ -7.427050149999559e-05  │
│ Value/cost                    │ 0.09931672364473343     │
│ Time/Total                    │ 138.71084594726562      │
│ Time/Rollout                  │ 1.5397255420684814      │
│ Time/Update                   │ 1.1065077781677246      │
│ Time/Epoch                    │ 2.646254777908325       │
│ Time/FPS                      │ 755.7853393554688       │
│ Misc/Alpha                    │ 0.9124903678894043      │
│ Misc/FinalStepNorm            │ 0.09128550440073013     │
│ Misc/gradient_norm            │ 2.9338266849517822      │
│ Misc/xHx                      │ 0.024020012468099594    │
│ Misc/H_inv_g                  │ 0.10003995895385742     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.00436363136395812     │
│ Misc/A                        │ 0.010872923769056797    │
│ Misc/B                        │ -9898249216.0           │
│ Misc/q                        │ 0.024020012468099594    │
│ Misc/r                        │ -2.8812146410928108e-05 │
│ Misc/s                        │ 5.31424788619006e-08    │
│ Misc/Lambda_star              │ 1.0959019660949707      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2006  Steps count? : 0, Cost: 1698.0
For episode num 2007  Steps count? : 100, Cost: 1698.0
For episode num 2008  Steps count? : 100, Cost: 1698.0
For episode num 2009  Steps count? : 100, Cost: 1698.0
For episode num 2010  Steps count? : 100, Cost: 1698.0
For episode num 2011  Steps count? : 100, Cost: 1698.0
For episode num 2012  Steps count? : 100, Cost: 1698.0
For episode num 2013  Steps count? : 100, Cost: 1698.0
For episode num 2014  Steps count? : 100, Cost: 1698.0
For episode num 2015  Steps count? : 100, Cost: 1698.0
For episode num 2016  Steps count? : 100, Cost: 1698.0
For episode num 2017  Steps count? : 100, Cost: 1698.0
For episode num 2018  Steps count? : 100, Cost: 1698.0
For episode num 2019  Steps count? : 100, Cost: 1698.0
For episode num 2020  Steps count? : 100, Cost: 1698.0
For episode num 2021  Steps count? : 100, Cost: 1698.0
For episode num 2022  Steps count? : 100, Cost: 1698.0
For episode num 2023  Steps count? : 100, Cost: 1698.0
For episode num 2024  Steps count? : 100, Cost: 1698.0
For episode num 2025  Steps count? : 100, Cost: 1698.0
For episode num 2026  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 50... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.011110534891486168 Actual: 0.011105934157967567
INFO: violated KL constraint 0.010337715037167072 at step 1.
Expected Improvement: 0.011110534891486168 Actual: 0.008885023184120655
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.16791589558124542   │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 50.0                   │
│ Train/Entropy                 │ 0.5659626722335815     │
│ Train/KL                      │ 0.00019229413010179996 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0053364038467407     │
│ Train/PolicyRatio/Min         │ 1.0053364038467407     │
│ Train/PolicyRatio/Max         │ 1.0053364038467407     │
│ Train/PolicyRatio/Std         │ 0.0031593425665050745  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.4261494278907776     │
│ TotalEnvSteps                 │ 102000.0               │
│ Loss/Loss_pi                  │ -0.007552096154540777  │
│ Loss/Loss_pi/Delta            │ 0.009013640228658915   │
│ Value/Adv                     │ -2.241134566816072e-08 │
│ Loss/Loss_reward_critic       │ 0.017456823959946632   │
│ Loss/Loss_reward_critic/Delta │ -0.0018680468201637268 │
│ Value/reward                  │ -0.8244968056678772    │
│ Loss/Loss_cost_critic         │ 0.0001705064787529409  │
│ Loss/Loss_cost_critic/Delta   │ -5.68424875382334e-05  │
│ Value/cost                    │ 0.08518372476100922    │
│ Time/Total                    │ 142.61366271972656     │
│ Time/Rollout                  │ 2.5864758491516113     │
│ Time/Update                   │ 1.292532205581665      │
│ Time/Epoch                    │ 3.879027843475342      │
│ Time/FPS                      │ 515.5932006835938      │
│ Misc/Alpha                    │ 1.8041807413101196     │
│ Misc/FinalStepNorm            │ 0.05778331309556961    │
│ Misc/gradient_norm            │ 1.709313988685608      │
│ Misc/xHx                      │ 0.006144254934042692   │
│ Misc/H_inv_g                  │ 0.04003431275486946    │
│ Misc/AcceptanceStep           │ 2.0                    │
│ Misc/cost_gradient_norm       │ 0.00018837933021131903 │
│ Misc/A                        │ 0.0                    │
│ Misc/B                        │ 0.0                    │
│ Misc/q                        │ 0.006144254934042692   │
│ Misc/r                        │ -6.172293751660618e-07 │
│ Misc/s                        │ 7.761488673985184e-11  │
│ Misc/Lambda_star              │ 0.5542681813240051     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 4.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 2027  Steps count? : 0, Cost: 1698.0
For episode num 2028  Steps count? : 100, Cost: 1698.0
For episode num 2029  Steps count? : 100, Cost: 1698.0
For episode num 2030  Steps count? : 100, Cost: 1698.0
For episode num 2031  Steps count? : 100, Cost: 1698.0
For episode num 2032  Steps count? : 100, Cost: 1698.0
For episode num 2033  Steps count? : 100, Cost: 1698.0
For episode num 2034  Steps count? : 100, Cost: 1698.0
For episode num 2035  Steps count? : 100, Cost: 1698.0
For episode num 2036  Steps count? : 100, Cost: 1698.0
For episode num 2037  Steps count? : 100, Cost: 1698.0
For episode num 2038  Steps count? : 100, Cost: 1698.0
For episode num 2039  Steps count? : 100, Cost: 1698.0
For episode num 2040  Steps count? : 100, Cost: 1698.0
For episode num 2041  Steps count? : 100, Cost: 1698.0
For episode num 2042  Steps count? : 100, Cost: 1698.0
For episode num 2043  Steps count? : 100, Cost: 1698.0
For episode num 2044  Steps count? : 100, Cost: 1698.0
For episode num 2045  Steps count? : 100, Cost: 1698.0
For episode num 2046  Steps count? : 100, Cost: 1698.0
For episode num 2047  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 51... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.01687154918909073 Actual: 0.016234349459409714
INFO: violated KL constraint 0.010565434582531452 at step 1.
Expected Improvement: 0.01687154918909073 Actual: 0.013080256059765816
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.17077481746673584    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 51.0                    │
│ Train/Entropy                 │ 0.5408746004104614      │
│ Train/KL                      │ 0.00019547012925613672  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9997320175170898      │
│ Train/PolicyRatio/Min         │ 0.9997320175170898      │
│ Train/PolicyRatio/Max         │ 0.9997320175170898      │
│ Train/PolicyRatio/Std         │ 0.0001549639127915725   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.41562631726264954     │
│ TotalEnvSteps                 │ 104000.0                │
│ Loss/Loss_pi                  │ -0.011096524074673653   │
│ Loss/Loss_pi/Delta            │ -0.0035444279201328754  │
│ Value/Adv                     │ -2.5177001816700795e-07 │
│ Loss/Loss_reward_critic       │ 0.015802793204784393    │
│ Loss/Loss_reward_critic/Delta │ -0.001654030755162239   │
│ Value/reward                  │ -0.7306316494941711     │
│ Loss/Loss_cost_critic         │ 0.00012781799887306988  │
│ Loss/Loss_cost_critic/Delta   │ -4.268847987987101e-05  │
│ Value/cost                    │ 0.07300812751054764     │
│ Time/Total                    │ 145.1182861328125       │
│ Time/Rollout                  │ 1.581312656402588       │
│ Time/Update                   │ 0.905480146408081       │
│ Time/Epoch                    │ 2.486819267272949       │
│ Time/FPS                      │ 804.240478515625        │
│ Misc/Alpha                    │ 1.1870225667953491      │
│ Misc/FinalStepNorm            │ 0.06271959841251373     │
│ Misc/gradient_norm            │ 2.4263527393341064      │
│ Misc/xHx                      │ 0.014194224961102009    │
│ Misc/H_inv_g                  │ 0.06604717671871185     │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 0.0007713608792982996   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.014194224961102009    │
│ Misc/r                        │ -4.699937107943697e-06  │
│ Misc/s                        │ 3.039531737414336e-09   │
│ Misc/Lambda_star              │ 0.8424439430236816      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2048  Steps count? : 0, Cost: 1698.0
For episode num 2049  Steps count? : 100, Cost: 1698.0
For episode num 2050  Steps count? : 100, Cost: 1698.0
For episode num 2051  Steps count? : 100, Cost: 1698.0
For episode num 2052  Steps count? : 100, Cost: 1698.0
For episode num 2053  Steps count? : 100, Cost: 1698.0
For episode num 2054  Steps count? : 100, Cost: 1698.0
For episode num 2055  Steps count? : 100, Cost: 1698.0
For episode num 2056  Steps count? : 100, Cost: 1698.0
For episode num 2057  Steps count? : 100, Cost: 1698.0
For episode num 2058  Steps count? : 100, Cost: 1698.0
For episode num 2059  Steps count? : 100, Cost: 1698.0
For episode num 2060  Steps count? : 100, Cost: 1698.0
For episode num 2061  Steps count? : 100, Cost: 1698.0
For episode num 2062  Steps count? : 100, Cost: 1698.0
For episode num 2063  Steps count? : 100, Cost: 1698.0
For episode num 2064  Steps count? : 100, Cost: 1698.0
For episode num 2065  Steps count? : 100, Cost: 1698.0
For episode num 2066  Steps count? : 100, Cost: 1698.0
For episode num 2067  Steps count? : 100, Cost: 1698.0
For episode num 2068  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 52... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.016674868762493134 Actual: 0.016274135559797287
INFO: violated KL constraint 0.010653781704604626 at step 1.
Expected Improvement: 0.016674868762493134 Actual: 0.013093827292323112
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.1662638783454895     │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 52.0                    │
│ Train/Entropy                 │ 0.5114904642105103      │
│ Train/KL                      │ 0.00019675683870445937  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0024617910385132      │
│ Train/PolicyRatio/Min         │ 1.0024617910385132      │
│ Train/PolicyRatio/Max         │ 1.0024617910385132      │
│ Train/PolicyRatio/Std         │ 0.0014542187564074993   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.40359359979629517     │
│ TotalEnvSteps                 │ 106000.0                │
│ Loss/Loss_pi                  │ -0.01111083384603262    │
│ Loss/Loss_pi/Delta            │ -1.4309771358966827e-05 │
│ Value/Adv                     │ -1.2969971407983394e-07 │
│ Loss/Loss_reward_critic       │ 0.014346321113407612    │
│ Loss/Loss_reward_critic/Delta │ -0.0014564720913767815  │
│ Value/reward                  │ -0.650111198425293      │
│ Loss/Loss_cost_critic         │ 9.576169395586476e-05   │
│ Loss/Loss_cost_critic/Delta   │ -3.2056304917205125e-05 │
│ Value/cost                    │ 0.06259306520223618     │
│ Time/Total                    │ 147.58203125            │
│ Time/Rollout                  │ 1.6025762557983398      │
│ Time/Update                   │ 0.8416523933410645      │
│ Time/Epoch                    │ 2.4442567825317383      │
│ Time/FPS                      │ 818.2450561523438       │
│ Misc/Alpha                    │ 1.2009278535842896      │
│ Misc/FinalStepNorm            │ 0.06335902959108353     │
│ Misc/gradient_norm            │ 2.5198118686676025      │
│ Misc/xHx                      │ 0.013867426663637161    │
│ Misc/H_inv_g                  │ 0.065947987139225       │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 0.001041274517774582    │
│ Misc/A                        │ 0.012417766265571117    │
│ Misc/B                        │ -47696449536.0          │
│ Misc/q                        │ 0.013867426663637161    │
│ Misc/r                        │ 4.358430487627629e-06   │
│ Misc/s                        │ 3.1037008518808307e-09  │
│ Misc/Lambda_star              │ 0.8326894640922546      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2069  Steps count? : 0, Cost: 1698.0
For episode num 2070  Steps count? : 100, Cost: 1698.0
For episode num 2071  Steps count? : 100, Cost: 1698.0
For episode num 2072  Steps count? : 100, Cost: 1698.0
For episode num 2073  Steps count? : 100, Cost: 1698.0
For episode num 2074  Steps count? : 100, Cost: 1698.0
For episode num 2075  Steps count? : 100, Cost: 1698.0
For episode num 2076  Steps count? : 100, Cost: 1698.0
For episode num 2077  Steps count? : 100, Cost: 1698.0
For episode num 2078  Steps count? : 100, Cost: 1698.0
For episode num 2079  Steps count? : 100, Cost: 1698.0
For episode num 2080  Steps count? : 100, Cost: 1698.0
For episode num 2081  Steps count? : 100, Cost: 1698.0
For episode num 2082  Steps count? : 100, Cost: 1698.0
For episode num 2083  Steps count? : 100, Cost: 1698.0
For episode num 2084  Steps count? : 100, Cost: 1698.0
For episode num 2085  Steps count? : 100, Cost: 1698.0
For episode num 2086  Steps count? : 100, Cost: 1698.0
For episode num 2087  Steps count? : 100, Cost: 1698.0
For episode num 2088  Steps count? : 100, Cost: 1698.0
For episode num 2089  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 53... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.021688146516680717 Actual: 0.02179093100130558
INFO: violated KL constraint 0.010225526988506317 at step 1.
Expected Improvement: 0.021688146516680717 Actual: 0.017419705167412758
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.17295099794864655    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 53.0                    │
│ Train/Entropy                 │ 0.5009749531745911      │
│ Train/KL                      │ 0.00019124332175124437  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.001558542251587       │
│ Train/PolicyRatio/Min         │ 1.001558542251587       │
│ Train/PolicyRatio/Max         │ 1.001558542251587       │
│ Train/PolicyRatio/Std         │ 0.0009252414456568658   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3993331491947174      │
│ TotalEnvSteps                 │ 108000.0                │
│ Loss/Loss_pi                  │ -0.014810720458626747   │
│ Loss/Loss_pi/Delta            │ -0.0036998866125941277  │
│ Value/Adv                     │ 6.914138594993346e-08   │
│ Loss/Loss_reward_critic       │ 0.013037736527621746    │
│ Loss/Loss_reward_critic/Delta │ -0.0013085845857858658  │
│ Value/reward                  │ -0.5811176300048828     │
│ Loss/Loss_cost_critic         │ 7.17413640813902e-05    │
│ Loss/Loss_cost_critic/Delta   │ -2.4020329874474555e-05 │
│ Value/cost                    │ 0.05365898832678795     │
│ Time/Total                    │ 150.1811981201172       │
│ Time/Rollout                  │ 1.5758647918701172      │
│ Time/Update                   │ 1.0051872730255127      │
│ Time/Epoch                    │ 2.5810706615448         │
│ Time/FPS                      │ 774.8724365234375       │
│ Misc/Alpha                    │ 0.9225106239318848      │
│ Misc/FinalStepNorm            │ 0.03562768176198006     │
│ Misc/gradient_norm            │ 3.7838852405548096      │
│ Misc/xHx                      │ 0.023501038551330566    │
│ Misc/H_inv_g                  │ 0.04827544093132019     │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 0.000725808902643621    │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.023501038551330566    │
│ Misc/r                        │ -4.3810127863253e-06    │
│ Misc/s                        │ 9.301128756078469e-10   │
│ Misc/Lambda_star              │ 1.0839983224868774      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2090  Steps count? : 0, Cost: 1698.0
For episode num 2091  Steps count? : 100, Cost: 1698.0
For episode num 2092  Steps count? : 100, Cost: 1698.0
For episode num 2093  Steps count? : 100, Cost: 1698.0
For episode num 2094  Steps count? : 100, Cost: 1698.0
For episode num 2095  Steps count? : 100, Cost: 1698.0
For episode num 2096  Steps count? : 100, Cost: 1698.0
For episode num 2097  Steps count? : 100, Cost: 1698.0
For episode num 2098  Steps count? : 100, Cost: 1698.0
For episode num 2099  Steps count? : 100, Cost: 1698.0
For episode num 2100  Steps count? : 100, Cost: 1698.0
For episode num 2101  Steps count? : 100, Cost: 1698.0
For episode num 2102  Steps count? : 100, Cost: 1698.0
For episode num 2103  Steps count? : 100, Cost: 1698.0
For episode num 2104  Steps count? : 100, Cost: 1698.0
For episode num 2105  Steps count? : 100, Cost: 1698.0
For episode num 2106  Steps count? : 100, Cost: 1698.0
For episode num 2107  Steps count? : 100, Cost: 1698.0
For episode num 2108  Steps count? : 100, Cost: 1698.0
For episode num 2109  Steps count? : 100, Cost: 1698.0
For episode num 2110  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 54... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.024915315210819244 Actual: 0.02520315907895565
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.1819470375776291     │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 54.0                    │
│ Train/Entropy                 │ 0.5014148354530334      │
│ Train/KL                      │ 0.0002913440694101155   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.996345043182373       │
│ Train/PolicyRatio/Min         │ 0.996345043182373       │
│ Train/PolicyRatio/Max         │ 0.996345043182373       │
│ Train/PolicyRatio/Std         │ 0.0025844729971140623   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.39950740337371826     │
│ TotalEnvSteps                 │ 110000.0                │
│ Loss/Loss_pi                  │ -0.018904272466897964   │
│ Loss/Loss_pi/Delta            │ -0.004093552008271217   │
│ Value/Adv                     │ 1.3828277189986693e-07  │
│ Loss/Loss_reward_critic       │ 0.011852307245135307    │
│ Loss/Loss_reward_critic/Delta │ -0.0011854292824864388  │
│ Value/reward                  │ -0.5179361701011658     │
│ Loss/Loss_cost_critic         │ 5.3753054089611396e-05  │
│ Loss/Loss_cost_critic/Delta   │ -1.7988309991778806e-05 │
│ Value/cost                    │ 0.04603264480829239     │
│ Time/Total                    │ 152.6519317626953       │
│ Time/Rollout                  │ 1.5774304866790771      │
│ Time/Update                   │ 0.8750958442687988      │
│ Time/Epoch                    │ 2.452554702758789       │
│ Time/FPS                      │ 815.4766845703125       │
│ Misc/Alpha                    │ 0.8028423190116882      │
│ Misc/FinalStepNorm            │ 0.0736096128821373      │
│ Misc/gradient_norm            │ 4.346601486206055       │
│ Misc/xHx                      │ 0.031029116362333298    │
│ Misc/H_inv_g                  │ 0.09168627113103867     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.0011665393831208348   │
│ Misc/A                        │ 0.025670746341347694    │
│ Misc/B                        │ -49540476928.0          │
│ Misc/q                        │ 0.031029116362333298    │
│ Misc/r                        │ -8.221976713684853e-06  │
│ Misc/s                        │ 2.6159463484276557e-09  │
│ Misc/Lambda_star              │ 1.2455745935440063      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2111  Steps count? : 0, Cost: 1698.0
For episode num 2112  Steps count? : 100, Cost: 1698.0
For episode num 2113  Steps count? : 100, Cost: 1698.0
For episode num 2114  Steps count? : 100, Cost: 1698.0
For episode num 2115  Steps count? : 100, Cost: 1698.0
For episode num 2116  Steps count? : 100, Cost: 1698.0
For episode num 2117  Steps count? : 100, Cost: 1698.0
For episode num 2118  Steps count? : 100, Cost: 1698.0
For episode num 2119  Steps count? : 100, Cost: 1698.0
For episode num 2120  Steps count? : 100, Cost: 1698.0
For episode num 2121  Steps count? : 100, Cost: 1698.0
For episode num 2122  Steps count? : 100, Cost: 1698.0
For episode num 2123  Steps count? : 100, Cost: 1698.0
For episode num 2124  Steps count? : 100, Cost: 1698.0
For episode num 2125  Steps count? : 100, Cost: 1698.0
For episode num 2126  Steps count? : 100, Cost: 1698.0
For episode num 2127  Steps count? : 100, Cost: 1698.0
For episode num 2128  Steps count? : 100, Cost: 1698.0
For episode num 2129  Steps count? : 100, Cost: 1698.0
For episode num 2130  Steps count? : 100, Cost: 1698.0
For episode num 2131  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 55... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.02530084177851677 Actual: 0.0254367645829916
INFO: violated KL constraint 0.010024534538388252 at step 1.
Expected Improvement: 0.02530084177851677 Actual: 0.02033628709614277
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.18862731754779816    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 55.0                    │
│ Train/Entropy                 │ 0.5037668347358704      │
│ Train/KL                      │ 0.00018835718219634145  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0010312795639038      │
│ Train/PolicyRatio/Min         │ 1.0010312795639038      │
│ Train/PolicyRatio/Max         │ 1.0010312795639038      │
│ Train/PolicyRatio/Std         │ 0.0006099470774643123   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.4004480242729187      │
│ TotalEnvSteps                 │ 112000.0                │
│ Loss/Loss_pi                  │ -0.017289917916059494   │
│ Loss/Loss_pi/Delta            │ 0.0016143545508384705   │
│ Value/Adv                     │ -1.4686584393075464e-07 │
│ Loss/Loss_reward_critic       │ 0.010818573646247387    │
│ Loss/Loss_reward_critic/Delta │ -0.0010337335988879204  │
│ Value/reward                  │ -0.47128838300704956    │
│ Loss/Loss_cost_critic         │ 4.033983714180067e-05   │
│ Loss/Loss_cost_critic/Delta   │ -1.3413216947810724e-05 │
│ Value/cost                    │ 0.03949044644832611     │
│ Time/Total                    │ 156.66943359375         │
│ Time/Rollout                  │ 2.575937271118164       │
│ Time/Update                   │ 1.4163846969604492      │
│ Time/Epoch                    │ 3.9923453330993652      │
│ Time/FPS                      │ 500.95880126953125      │
│ Misc/Alpha                    │ 0.7909386157989502      │
│ Misc/FinalStepNorm            │ 0.029105294495821       │
│ Misc/gradient_norm            │ 4.471412658691406       │
│ Misc/xHx                      │ 0.03197012096643448     │
│ Misc/H_inv_g                  │ 0.04599803313612938     │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 0.0007563723484054208   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.03197012096643448     │
│ Misc/r                        │ -5.612569111690391e-06  │
│ Misc/s                        │ 1.3269674248306274e-09  │
│ Misc/Lambda_star              │ 1.2643206119537354      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2132  Steps count? : 0, Cost: 1698.0
For episode num 2133  Steps count? : 100, Cost: 1698.0
For episode num 2134  Steps count? : 100, Cost: 1698.0
For episode num 2135  Steps count? : 100, Cost: 1698.0
For episode num 2136  Steps count? : 100, Cost: 1698.0
For episode num 2137  Steps count? : 100, Cost: 1698.0
For episode num 2138  Steps count? : 100, Cost: 1698.0
For episode num 2139  Steps count? : 100, Cost: 1698.0
For episode num 2140  Steps count? : 100, Cost: 1698.0
For episode num 2141  Steps count? : 100, Cost: 1698.0
For episode num 2142  Steps count? : 100, Cost: 1698.0
For episode num 2143  Steps count? : 100, Cost: 1698.0
For episode num 2144  Steps count? : 100, Cost: 1698.0
For episode num 2145  Steps count? : 100, Cost: 1698.0
For episode num 2146  Steps count? : 100, Cost: 1698.0
For episode num 2147  Steps count? : 100, Cost: 1698.0
For episode num 2148  Steps count? : 100, Cost: 1698.0
For episode num 2149  Steps count? : 100, Cost: 1698.0
For episode num 2150  Steps count? : 100, Cost: 1698.0
For episode num 2151  Steps count? : 100, Cost: 1698.0
For episode num 2152  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 56... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.026536233723163605 Actual: 0.026467708870768547
INFO: violated KL constraint 0.010458646342158318 at step 1.
Expected Improvement: 0.026536233723163605 Actual: 0.021187810227274895
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.19032755494117737   │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 56.0                   │
│ Train/Entropy                 │ 0.4938106834888458     │
│ Train/KL                      │ 0.0001946702104760334  │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.9986945390701294     │
│ Train/PolicyRatio/Min         │ 0.9986945390701294     │
│ Train/PolicyRatio/Max         │ 0.9986945390701294     │
│ Train/PolicyRatio/Std         │ 0.0007706119795329869  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.3964880704879761     │
│ TotalEnvSteps                 │ 114000.0               │
│ Loss/Loss_pi                  │ -0.018005963414907455  │
│ Loss/Loss_pi/Delta            │ -0.0007160454988479614 │
│ Value/Adv                     │ 1.3542175736347417e-07 │
│ Loss/Loss_reward_critic       │ 0.009900394827127457   │
│ Loss/Loss_reward_critic/Delta │ -0.0009181788191199303 │
│ Value/reward                  │ -0.4292495548725128    │
│ Loss/Loss_cost_critic         │ 3.0260396670200862e-05 │
│ Loss/Loss_cost_critic/Delta   │ -1.007944047159981e-05 │
│ Value/cost                    │ 0.033860381692647934   │
│ Time/Total                    │ 159.36798095703125     │
│ Time/Rollout                  │ 1.7624850273132324     │
│ Time/Update                   │ 0.9109063148498535     │
│ Time/Epoch                    │ 2.67341685295105       │
│ Time/FPS                      │ 748.1065673828125      │
│ Misc/Alpha                    │ 0.7548751831054688     │
│ Misc/FinalStepNorm            │ 0.038434140384197235   │
│ Misc/gradient_norm            │ 4.726289749145508      │
│ Misc/xHx                      │ 0.035097770392894745   │
│ Misc/H_inv_g                  │ 0.0636432021856308     │
│ Misc/AcceptanceStep           │ 2.0                    │
│ Misc/cost_gradient_norm       │ 0.0005989142809994519  │
│ Misc/A                        │ 0.0                    │
│ Misc/B                        │ 0.0                    │
│ Misc/q                        │ 0.035097770392894745   │
│ Misc/r                        │ 4.151994744461263e-06  │
│ Misc/s                        │ 5.600220842616466e-10  │
│ Misc/Lambda_star              │ 1.3247222900390625     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 4.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 2153  Steps count? : 0, Cost: 1698.0
For episode num 2154  Steps count? : 100, Cost: 1698.0
For episode num 2155  Steps count? : 100, Cost: 1698.0
For episode num 2156  Steps count? : 100, Cost: 1698.0
For episode num 2157  Steps count? : 100, Cost: 1698.0
For episode num 2158  Steps count? : 100, Cost: 1698.0
For episode num 2159  Steps count? : 100, Cost: 1698.0
For episode num 2160  Steps count? : 100, Cost: 1698.0
For episode num 2161  Steps count? : 100, Cost: 1698.0
For episode num 2162  Steps count? : 100, Cost: 1698.0
For episode num 2163  Steps count? : 100, Cost: 1698.0
For episode num 2164  Steps count? : 100, Cost: 1698.0
For episode num 2165  Steps count? : 100, Cost: 1698.0
For episode num 2166  Steps count? : 100, Cost: 1698.0
For episode num 2167  Steps count? : 100, Cost: 1698.0
For episode num 2168  Steps count? : 100, Cost: 1698.0
For episode num 2169  Steps count? : 100, Cost: 1698.0
For episode num 2170  Steps count? : 100, Cost: 1698.0
For episode num 2171  Steps count? : 100, Cost: 1698.0
For episode num 2172  Steps count? : 100, Cost: 1698.0
For episode num 2173  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 57... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.03073311224579811 Actual: 0.0299982950091362
INFO: violated KL constraint 0.01029776781797409 at step 1.
Expected Improvement: 0.03073311224579811 Actual: 0.024108223617076874
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.18959258496761322   │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 57.0                   │
│ Train/Entropy                 │ 0.4819503128528595     │
│ Train/KL                      │ 0.0001925372052937746  │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.9992851614952087     │
│ Train/PolicyRatio/Min         │ 0.9992851614952087     │
│ Train/PolicyRatio/Max         │ 0.9992851614952087     │
│ Train/PolicyRatio/Std         │ 0.00041878808406181633 │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.39181220531463623    │
│ TotalEnvSteps                 │ 116000.0               │
│ Loss/Loss_pi                  │ -0.020464789122343063  │
│ Loss/Loss_pi/Delta            │ -0.002458825707435608  │
│ Value/Adv                     │ 4.220008875677195e-08  │
│ Loss/Loss_reward_critic       │ 0.009047009982168674   │
│ Loss/Loss_reward_critic/Delta │ -0.0008533848449587822 │
│ Value/reward                  │ -0.39187565445899963   │
│ Loss/Loss_cost_critic         │ 2.2677711967844516e-05 │
│ Loss/Loss_cost_critic/Delta   │ -7.582684702356346e-06 │
│ Value/cost                    │ 0.029034603387117386   │
│ Time/Total                    │ 161.90350341796875     │
│ Time/Rollout                  │ 1.5905029773712158     │
│ Time/Update                   │ 0.9267816543579102     │
│ Time/Epoch                    │ 2.517301321029663      │
│ Time/FPS                      │ 794.501953125          │
│ Misc/Alpha                    │ 0.6513257622718811     │
│ Misc/FinalStepNorm            │ 0.02375144325196743    │
│ Misc/gradient_norm            │ 5.530384063720703      │
│ Misc/xHx                      │ 0.047144751995801926   │
│ Misc/H_inv_g                  │ 0.04558289051055908    │
│ Misc/AcceptanceStep           │ 2.0                    │
│ Misc/cost_gradient_norm       │ 6.082528852857649e-05  │
│ Misc/A                        │ 0.0                    │
│ Misc/B                        │ 0.0                    │
│ Misc/q                        │ 0.047144751995801926   │
│ Misc/r                        │ 7.940366231196094e-08  │
│ Misc/s                        │ 1.546869021762376e-10  │
│ Misc/Lambda_star              │ 1.535330057144165      │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 4.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 2174  Steps count? : 0, Cost: 1698.0
For episode num 2175  Steps count? : 100, Cost: 1698.0
For episode num 2176  Steps count? : 100, Cost: 1698.0
For episode num 2177  Steps count? : 100, Cost: 1698.0
For episode num 2178  Steps count? : 100, Cost: 1698.0
For episode num 2179  Steps count? : 100, Cost: 1698.0
For episode num 2180  Steps count? : 100, Cost: 1698.0
For episode num 2181  Steps count? : 100, Cost: 1698.0
For episode num 2182  Steps count? : 100, Cost: 1698.0
For episode num 2183  Steps count? : 100, Cost: 1698.0
For episode num 2184  Steps count? : 100, Cost: 1698.0
For episode num 2185  Steps count? : 100, Cost: 1698.0
For episode num 2186  Steps count? : 100, Cost: 1698.0
For episode num 2187  Steps count? : 100, Cost: 1698.0
For episode num 2188  Steps count? : 100, Cost: 1698.0
For episode num 2189  Steps count? : 100, Cost: 1698.0
For episode num 2190  Steps count? : 100, Cost: 1698.0
For episode num 2191  Steps count? : 100, Cost: 1698.0
For episode num 2192  Steps count? : 100, Cost: 1698.0
For episode num 2193  Steps count? : 100, Cost: 1698.0
For episode num 2194  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 58... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.034340985119342804 Actual: 0.03468960523605347
INFO: violated KL constraint 0.010170339606702328 at step 1.
Expected Improvement: 0.034340985119342804 Actual: 0.027695925906300545
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.19205009937286377    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 58.0                    │
│ Train/Entropy                 │ 0.4779006242752075      │
│ Train/KL                      │ 0.00019054170115850866  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.999866247177124       │
│ Train/PolicyRatio/Min         │ 0.999866247177124       │
│ Train/PolicyRatio/Max         │ 0.999866247177124       │
│ Train/PolicyRatio/Std         │ 7.758742867736146e-05   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.39022284746170044     │
│ TotalEnvSteps                 │ 118000.0                │
│ Loss/Loss_pi                  │ -0.023556318134069443   │
│ Loss/Loss_pi/Delta            │ -0.0030915290117263794  │
│ Value/Adv                     │ -1.8787383737617347e-07 │
│ Loss/Loss_reward_critic       │ 0.008285904303193092    │
│ Loss/Loss_reward_critic/Delta │ -0.0007611056789755821  │
│ Value/reward                  │ -0.3612835109233856     │
│ Loss/Loss_cost_critic         │ 1.695972605375573e-05   │
│ Loss/Loss_cost_critic/Delta   │ -5.717985914088786e-06  │
│ Value/cost                    │ 0.024888616055250168    │
│ Time/Total                    │ 164.39923095703125      │
│ Time/Rollout                  │ 1.5741362571716309      │
│ Time/Update                   │ 0.9032831192016602      │
│ Time/Epoch                    │ 2.47743558883667        │
│ Time/FPS                      │ 807.2866821289062       │
│ Misc/Alpha                    │ 0.5818831920623779      │
│ Misc/FinalStepNorm            │ 0.029778538271784782    │
│ Misc/gradient_norm            │ 6.404718399047852       │
│ Misc/xHx                      │ 0.05906881392002106     │
│ Misc/H_inv_g                  │ 0.06397019326686859     │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 0.0011183185270056129   │
│ Misc/A                        │ 0.05067876726388931     │
│ Misc/B                        │ -53234790400.0          │
│ Misc/q                        │ 0.05906881392002106     │
│ Misc/r                        │ -9.924860023602378e-06  │
│ Misc/s                        │ 1.7404426788658611e-09  │
│ Misc/Lambda_star              │ 1.7185579538345337      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 3.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2195  Steps count? : 0, Cost: 1698.0
For episode num 2196  Steps count? : 100, Cost: 1698.0
For episode num 2197  Steps count? : 100, Cost: 1698.0
For episode num 2198  Steps count? : 100, Cost: 1698.0
For episode num 2199  Steps count? : 100, Cost: 1698.0
For episode num 2200  Steps count? : 100, Cost: 1698.0
For episode num 2201  Steps count? : 100, Cost: 1698.0
For episode num 2202  Steps count? : 100, Cost: 1698.0
For episode num 2203  Steps count? : 100, Cost: 1698.0
For episode num 2204  Steps count? : 100, Cost: 1698.0
For episode num 2205  Steps count? : 100, Cost: 1698.0
For episode num 2206  Steps count? : 100, Cost: 1698.0
For episode num 2207  Steps count? : 100, Cost: 1698.0
For episode num 2208  Steps count? : 100, Cost: 1698.0
For episode num 2209  Steps count? : 100, Cost: 1698.0
For episode num 2210  Steps count? : 100, Cost: 1698.0
For episode num 2211  Steps count? : 100, Cost: 1698.0
For episode num 2212  Steps count? : 100, Cost: 1698.0
For episode num 2213  Steps count? : 100, Cost: 1698.0
For episode num 2214  Steps count? : 100, Cost: 1698.0
For episode num 2215  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 59... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.03244633600115776 Actual: 0.03233645111322403
INFO: violated KL constraint 0.010348058305680752 at step 1.
Expected Improvement: 0.03244633600115776 Actual: 0.02589174546301365
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.19010987877845764    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 59.0                    │
│ Train/Entropy                 │ 0.4690846800804138      │
│ Train/KL                      │ 0.0001931726437760517   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0037510395050049      │
│ Train/PolicyRatio/Min         │ 1.0037510395050049      │
│ Train/PolicyRatio/Max         │ 1.0037510395050049      │
│ Train/PolicyRatio/Std         │ 0.002212291117757559    │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3868023157119751      │
│ TotalEnvSteps                 │ 120000.0                │
│ Loss/Loss_pi                  │ -0.02200213260948658    │
│ Loss/Loss_pi/Delta            │ 0.0015541855245828629   │
│ Value/Adv                     │ -2.7656554379973386e-07 │
│ Loss/Loss_reward_critic       │ 0.007592848502099514    │
│ Loss/Loss_reward_critic/Delta │ -0.0006930558010935783  │
│ Value/reward                  │ -0.335999459028244      │
│ Loss/Loss_cost_critic         │ 1.2633876394829713e-05  │
│ Loss/Loss_cost_critic/Delta   │ -4.325849658926018e-06  │
│ Value/cost                    │ 0.021340394392609596    │
│ Time/Total                    │ 166.90040588378906      │
│ Time/Rollout                  │ 1.573887586593628       │
│ Time/Update                   │ 0.9093835353851318      │
│ Time/Epoch                    │ 2.483285903930664       │
│ Time/FPS                      │ 805.3848876953125       │
│ Misc/Alpha                    │ 0.6166423559188843      │
│ Misc/FinalStepNorm            │ 0.03079124353826046     │
│ Misc/gradient_norm            │ 6.0935492515563965      │
│ Misc/xHx                      │ 0.05259726569056511     │
│ Misc/H_inv_g                  │ 0.06241713836789131     │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 0.00012780647375620902  │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.05259726569056511     │
│ Misc/r                        │ -1.05398180494376e-06   │
│ Misc/s                        │ 2.5851933441178865e-11  │
│ Misc/Lambda_star              │ 1.62168550491333        │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2216  Steps count? : 0, Cost: 1698.0
For episode num 2217  Steps count? : 100, Cost: 1698.0
For episode num 2218  Steps count? : 100, Cost: 1698.0
For episode num 2219  Steps count? : 100, Cost: 1698.0
For episode num 2220  Steps count? : 100, Cost: 1698.0
For episode num 2221  Steps count? : 100, Cost: 1698.0
For episode num 2222  Steps count? : 100, Cost: 1698.0
For episode num 2223  Steps count? : 100, Cost: 1698.0
For episode num 2224  Steps count? : 100, Cost: 1698.0
For episode num 2225  Steps count? : 100, Cost: 1698.0
For episode num 2226  Steps count? : 100, Cost: 1698.0
For episode num 2227  Steps count? : 100, Cost: 1698.0
For episode num 2228  Steps count? : 100, Cost: 1698.0
For episode num 2229  Steps count? : 100, Cost: 1698.0
For episode num 2230  Steps count? : 100, Cost: 1698.0
For episode num 2231  Steps count? : 100, Cost: 1698.0
For episode num 2232  Steps count? : 100, Cost: 1698.0
For episode num 2233  Steps count? : 100, Cost: 1698.0
For episode num 2234  Steps count? : 100, Cost: 1698.0
For episode num 2235  Steps count? : 100, Cost: 1698.0
For episode num 2236  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 60... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.03758053854107857 Actual: 0.038079891353845596
INFO: violated KL constraint 0.010240106843411922 at step 1.
Expected Improvement: 0.03758053854107857 Actual: 0.030382351949810982
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.19314491748809814   │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 60.0                   │
│ Train/Entropy                 │ 0.4643126130104065     │
│ Train/KL                      │ 0.00019114110909868032 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.9991528391838074     │
│ Train/PolicyRatio/Min         │ 0.9991528391838074     │
│ Train/PolicyRatio/Max         │ 0.9991528391838074     │
│ Train/PolicyRatio/Std         │ 0.0004972382448613644  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.3849566876888275     │
│ TotalEnvSteps                 │ 122000.0               │
│ Loss/Loss_pi                  │ -0.02584567666053772   │
│ Loss/Loss_pi/Delta            │ -0.00384354405105114   │
│ Value/Adv                     │ 1.244545018153076e-07  │
│ Loss/Loss_reward_critic       │ 0.006964580621570349   │
│ Loss/Loss_reward_critic/Delta │ -0.0006282678805291653 │
│ Value/reward                  │ -0.3123151659965515    │
│ Loss/Loss_cost_critic         │ 9.34051968215499e-06   │
│ Loss/Loss_cost_critic/Delta   │ -3.293356712674722e-06 │
│ Value/cost                    │ 0.018301066011190414   │
│ Time/Total                    │ 169.39822387695312     │
│ Time/Rollout                  │ 1.5820610523223877     │
│ Time/Update                   │ 0.8978798389434814     │
│ Time/Epoch                    │ 2.47995662689209       │
│ Time/FPS                      │ 806.4660034179688      │
│ Misc/Alpha                    │ 0.5328750610351562     │
│ Misc/FinalStepNorm            │ 0.04848335683345795    │
│ Misc/gradient_norm            │ 7.16579008102417       │
│ Misc/xHx                      │ 0.07043348252773285    │
│ Misc/H_inv_g                  │ 0.11373057216405869    │
│ Misc/AcceptanceStep           │ 2.0                    │
│ Misc/cost_gradient_norm       │ 0.00015372285270132124 │
│ Misc/A                        │ 0.0                    │
│ Misc/B                        │ 0.0                    │
│ Misc/q                        │ 0.07043348252773285    │
│ Misc/r                        │ -1.437314722352312e-06 │
│ Misc/s                        │ 3.125598391218176e-11  │
│ Misc/Lambda_star              │ 1.8766125440597534     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 4.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 2237  Steps count? : 0, Cost: 1698.0
For episode num 2238  Steps count? : 100, Cost: 1698.0
For episode num 2239  Steps count? : 100, Cost: 1698.0
For episode num 2240  Steps count? : 100, Cost: 1698.0
For episode num 2241  Steps count? : 100, Cost: 1698.0
For episode num 2242  Steps count? : 100, Cost: 1698.0
For episode num 2243  Steps count? : 100, Cost: 1698.0
For episode num 2244  Steps count? : 100, Cost: 1698.0
For episode num 2245  Steps count? : 100, Cost: 1698.0
For episode num 2246  Steps count? : 100, Cost: 1698.0
For episode num 2247  Steps count? : 100, Cost: 1698.0
For episode num 2248  Steps count? : 100, Cost: 1698.0
For episode num 2249  Steps count? : 100, Cost: 1698.0
For episode num 2250  Steps count? : 100, Cost: 1698.0
For episode num 2251  Steps count? : 100, Cost: 1698.0
For episode num 2252  Steps count? : 100, Cost: 1698.0
For episode num 2253  Steps count? : 100, Cost: 1698.0
For episode num 2254  Steps count? : 100, Cost: 1698.0
For episode num 2255  Steps count? : 100, Cost: 1698.0
For episode num 2256  Steps count? : 100, Cost: 1698.0
For episode num 2257  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 61... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.037627629935741425 Actual: 0.037351034581661224
INFO: violated KL constraint 0.01019839197397232 at step 1.
Expected Improvement: 0.037627629935741425 Actual: 0.02993098646402359
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.1997024416923523    │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 61.0                   │
│ Train/Entropy                 │ 0.4557042717933655     │
│ Train/KL                      │ 0.00019086399697698653 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.9985554814338684     │
│ Train/PolicyRatio/Min         │ 0.9985554814338684     │
│ Train/PolicyRatio/Max         │ 0.9985554814338684     │
│ Train/PolicyRatio/Std         │ 0.0008576807449571788  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.38166072964668274    │
│ TotalEnvSteps                 │ 124000.0               │
│ Loss/Loss_pi                  │ -0.025428831577301025  │
│ Loss/Loss_pi/Delta            │ 0.00041684508323669434 │
│ Value/Adv                     │ 5.686283088834898e-08  │
│ Loss/Loss_reward_critic       │ 0.0063988082110881805  │
│ Loss/Loss_reward_critic/Delta │ -0.0005657724104821682 │
│ Value/reward                  │ -0.29542168974876404   │
│ Loss/Loss_cost_critic         │ 6.850705631222809e-06  │
│ Loss/Loss_cost_critic/Delta   │ -2.489814050932182e-06 │
│ Value/cost                    │ 0.01569053716957569    │
│ Time/Total                    │ 171.92161560058594     │
│ Time/Rollout                  │ 1.6095869541168213     │
│ Time/Update                   │ 0.8956177234649658     │
│ Time/Epoch                    │ 2.505220651626587      │
│ Time/FPS                      │ 798.3330078125         │
│ Misc/Alpha                    │ 0.5319901704788208     │
│ Misc/FinalStepNorm            │ 0.03432541340589523    │
│ Misc/gradient_norm            │ 7.318786144256592      │
│ Misc/xHx                      │ 0.07066798210144043    │
│ Misc/H_inv_g                  │ 0.08065331727266312    │
│ Misc/AcceptanceStep           │ 2.0                    │
│ Misc/cost_gradient_norm       │ 8.913935744203627e-05  │
│ Misc/A                        │ 0.0                    │
│ Misc/B                        │ 0.0                    │
│ Misc/q                        │ 0.07066798210144043    │
│ Misc/r                        │ 7.930108267828473e-07  │
│ Misc/s                        │ 1.5611761883249642e-11 │
│ Misc/Lambda_star              │ 1.879733920097351      │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 4.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 2258  Steps count? : 0, Cost: 1698.0
For episode num 2259  Steps count? : 100, Cost: 1698.0
For episode num 2260  Steps count? : 100, Cost: 1698.0
For episode num 2261  Steps count? : 100, Cost: 1698.0
For episode num 2262  Steps count? : 100, Cost: 1698.0
For episode num 2263  Steps count? : 100, Cost: 1698.0
For episode num 2264  Steps count? : 100, Cost: 1698.0
For episode num 2265  Steps count? : 100, Cost: 1698.0
For episode num 2266  Steps count? : 100, Cost: 1698.0
For episode num 2267  Steps count? : 100, Cost: 1698.0
For episode num 2268  Steps count? : 100, Cost: 1698.0
For episode num 2269  Steps count? : 100, Cost: 1698.0
For episode num 2270  Steps count? : 100, Cost: 1698.0
For episode num 2271  Steps count? : 100, Cost: 1698.0
For episode num 2272  Steps count? : 100, Cost: 1698.0
For episode num 2273  Steps count? : 100, Cost: 1698.0
For episode num 2274  Steps count? : 100, Cost: 1698.0
For episode num 2275  Steps count? : 100, Cost: 1698.0
For episode num 2276  Steps count? : 100, Cost: 1698.0
For episode num 2277  Steps count? : 100, Cost: 1698.0
For episode num 2278  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 62... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.03785019740462303 Actual: 0.03821339085698128
INFO: violated KL constraint 0.010012597776949406 at step 1.
Expected Improvement: 0.03785019740462303 Actual: 0.03050667978823185
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.20397815108299255    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 62.0                    │
│ Train/Entropy                 │ 0.4554453194141388      │
│ Train/KL                      │ 0.00018834158254321665  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0015571117401123      │
│ Train/PolicyRatio/Min         │ 1.0015571117401123      │
│ Train/PolicyRatio/Max         │ 1.0015571117401123      │
│ Train/PolicyRatio/Std         │ 0.0009241097723133862   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3815578818321228      │
│ TotalEnvSteps                 │ 126000.0                │
│ Loss/Loss_pi                  │ -0.02594675123691559    │
│ Loss/Loss_pi/Delta            │ -0.000517919659614563   │
│ Value/Adv                     │ 1.0466575872669637e-07  │
│ Loss/Loss_reward_critic       │ 0.005884801037609577    │
│ Loss/Loss_reward_critic/Delta │ -0.0005140071734786034  │
│ Value/reward                  │ -0.28131672739982605    │
│ Loss/Loss_cost_critic         │ 4.965535026713042e-06   │
│ Loss/Loss_cost_critic/Delta   │ -1.8851706045097671e-06 │
│ Value/cost                    │ 0.013459144160151482    │
│ Time/Total                    │ 174.40589904785156      │
│ Time/Rollout                  │ 1.5754129886627197      │
│ Time/Update                   │ 0.8913393020629883      │
│ Time/Epoch                    │ 2.4667656421661377      │
│ Time/FPS                      │ 810.7785034179688       │
│ Misc/Alpha                    │ 0.528415858745575       │
│ Misc/FinalStepNorm            │ 0.016701247543096542    │
│ Misc/gradient_norm            │ 7.602537631988525       │
│ Misc/xHx                      │ 0.07162724435329437     │
│ Misc/H_inv_g                  │ 0.039507824927568436    │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 2.7847083401866257e-05  │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.07162724435329437     │
│ Misc/r                        │ 2.1692812879336998e-07  │
│ Misc/s                        │ 1.4379547899623368e-12  │
│ Misc/Lambda_star              │ 1.892448902130127       │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2279  Steps count? : 0, Cost: 1698.0
For episode num 2280  Steps count? : 100, Cost: 1698.0
For episode num 2281  Steps count? : 100, Cost: 1698.0
For episode num 2282  Steps count? : 100, Cost: 1698.0
For episode num 2283  Steps count? : 100, Cost: 1698.0
For episode num 2284  Steps count? : 100, Cost: 1698.0
For episode num 2285  Steps count? : 100, Cost: 1698.0
For episode num 2286  Steps count? : 100, Cost: 1698.0
For episode num 2287  Steps count? : 100, Cost: 1698.0
For episode num 2288  Steps count? : 100, Cost: 1698.0
For episode num 2289  Steps count? : 100, Cost: 1698.0
For episode num 2290  Steps count? : 100, Cost: 1698.0
For episode num 2291  Steps count? : 100, Cost: 1698.0
For episode num 2292  Steps count? : 100, Cost: 1698.0
For episode num 2293  Steps count? : 100, Cost: 1698.0
For episode num 2294  Steps count? : 100, Cost: 1698.0
For episode num 2295  Steps count? : 100, Cost: 1698.0
For episode num 2296  Steps count? : 100, Cost: 1698.0
For episode num 2297  Steps count? : 100, Cost: 1698.0
For episode num 2298  Steps count? : 100, Cost: 1698.0
For episode num 2299  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 63... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.03958648443222046 Actual: 0.03966230899095535
INFO: violated KL constraint 0.010347639210522175 at step 1.
Expected Improvement: 0.03958648443222046 Actual: 0.03172137588262558
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.21006987988948822    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 63.0                    │
│ Train/Entropy                 │ 0.4499390721321106      │
│ Train/KL                      │ 0.00019314968085382134  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9986265897750854      │
│ Train/PolicyRatio/Min         │ 0.9986265897750854      │
│ Train/PolicyRatio/Max         │ 0.9986265897750854      │
│ Train/PolicyRatio/Std         │ 0.0008104161825031042   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3794647753238678      │
│ TotalEnvSteps                 │ 128000.0                │
│ Loss/Loss_pi                  │ -0.026965662837028503   │
│ Loss/Loss_pi/Delta            │ -0.001018911600112915   │
│ Value/Adv                     │ -1.1885165918101848e-07 │
│ Loss/Loss_reward_critic       │ 0.005418859422206879    │
│ Loss/Loss_reward_critic/Delta │ -0.0004659416154026985  │
│ Value/reward                  │ -0.2702311873435974     │
│ Loss/Loss_cost_critic         │ 3.5511184250935912e-06  │
│ Loss/Loss_cost_critic/Delta   │ -1.4144166016194504e-06 │
│ Value/cost                    │ 0.011537536978721619    │
│ Time/Total                    │ 176.89942932128906      │
│ Time/Rollout                  │ 1.5829079151153564      │
│ Time/Update                   │ 0.8930304050445557      │
│ Time/Epoch                    │ 2.475954055786133       │
│ Time/FPS                      │ 807.7696533203125       │
│ Misc/Alpha                    │ 0.5050048232078552      │
│ Misc/FinalStepNorm            │ 0.03217071294784546     │
│ Misc/gradient_norm            │ 7.885158061981201       │
│ Misc/xHx                      │ 0.0784221664071083      │
│ Misc/H_inv_g                  │ 0.07962971180677414     │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 0.0006367220194078982   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.0784221664071083      │
│ Misc/r                        │ -6.155785740702413e-06  │
│ Misc/s                        │ 4.985494794773615e-10   │
│ Misc/Lambda_star              │ 1.9801790714263916      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2300  Steps count? : 0, Cost: 1698.0
For episode num 2301  Steps count? : 100, Cost: 1698.0
For episode num 2302  Steps count? : 100, Cost: 1698.0
For episode num 2303  Steps count? : 100, Cost: 1698.0
For episode num 2304  Steps count? : 100, Cost: 1698.0
For episode num 2305  Steps count? : 100, Cost: 1698.0
For episode num 2306  Steps count? : 100, Cost: 1698.0
For episode num 2307  Steps count? : 100, Cost: 1698.0
For episode num 2308  Steps count? : 100, Cost: 1698.0
For episode num 2309  Steps count? : 100, Cost: 1698.0
For episode num 2310  Steps count? : 100, Cost: 1698.0
For episode num 2311  Steps count? : 100, Cost: 1698.0
For episode num 2312  Steps count? : 100, Cost: 1698.0
For episode num 2313  Steps count? : 100, Cost: 1698.0
For episode num 2314  Steps count? : 100, Cost: 1698.0
For episode num 2315  Steps count? : 100, Cost: 1698.0
For episode num 2316  Steps count? : 100, Cost: 1698.0
For episode num 2317  Steps count? : 100, Cost: 1698.0
For episode num 2318  Steps count? : 100, Cost: 1698.0
For episode num 2319  Steps count? : 100, Cost: 1698.0
For episode num 2320  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 64... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.0408947728574276 Actual: 0.04116617143154144
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.21618923544883728    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 64.0                    │
│ Train/Entropy                 │ 0.4517469108104706      │
│ Train/KL                      │ 0.0002929000183939934   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9984802603721619      │
│ Train/PolicyRatio/Min         │ 0.9984802603721619      │
│ Train/PolicyRatio/Max         │ 0.9984802603721619      │
│ Train/PolicyRatio/Std         │ 0.0010746322805061936   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.38015010952949524     │
│ TotalEnvSteps                 │ 130000.0                │
│ Loss/Loss_pi                  │ -0.030874930322170258   │
│ Loss/Loss_pi/Delta            │ -0.003909267485141754   │
│ Value/Adv                     │ -1.8119811429073707e-08 │
│ Loss/Loss_reward_critic       │ 0.005001507233828306    │
│ Loss/Loss_reward_critic/Delta │ -0.00041735218837857246 │
│ Value/reward                  │ -0.2604658901691437     │
│ Loss/Loss_cost_critic         │ 2.5014387574628927e-06  │
│ Loss/Loss_cost_critic/Delta   │ -1.0496796676306985e-06 │
│ Value/cost                    │ 0.00989505834877491     │
│ Time/Total                    │ 179.3901824951172       │
│ Time/Rollout                  │ 1.584749460220337       │
│ Time/Update                   │ 0.888054609298706       │
│ Time/Epoch                    │ 2.4728198051452637      │
│ Time/FPS                      │ 808.7935180664062       │
│ Misc/Alpha                    │ 0.4890776574611664      │
│ Misc/FinalStepNorm            │ 0.020657550543546677    │
│ Misc/gradient_norm            │ 8.321014404296875       │
│ Misc/xHx                      │ 0.08361309766769409     │
│ Misc/H_inv_g                  │ 0.0422377772629261      │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.0002466662845108658   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.08361309766769409     │
│ Misc/r                        │ -2.4316987037309445e-06 │
│ Misc/s                        │ 7.226212683786315e-11   │
│ Misc/Lambda_star              │ 2.0446650981903076      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2321  Steps count? : 0, Cost: 1698.0
For episode num 2322  Steps count? : 100, Cost: 1698.0
For episode num 2323  Steps count? : 100, Cost: 1698.0
For episode num 2324  Steps count? : 100, Cost: 1698.0
For episode num 2325  Steps count? : 100, Cost: 1698.0
For episode num 2326  Steps count? : 100, Cost: 1698.0
For episode num 2327  Steps count? : 100, Cost: 1698.0
For episode num 2328  Steps count? : 100, Cost: 1698.0
For episode num 2329  Steps count? : 100, Cost: 1698.0
For episode num 2330  Steps count? : 100, Cost: 1698.0
For episode num 2331  Steps count? : 100, Cost: 1698.0
For episode num 2332  Steps count? : 100, Cost: 1698.0
For episode num 2333  Steps count? : 100, Cost: 1698.0
For episode num 2334  Steps count? : 100, Cost: 1698.0
For episode num 2335  Steps count? : 100, Cost: 1698.0
For episode num 2336  Steps count? : 100, Cost: 1698.0
For episode num 2337  Steps count? : 100, Cost: 1698.0
For episode num 2338  Steps count? : 100, Cost: 1698.0
For episode num 2339  Steps count? : 100, Cost: 1698.0
For episode num 2340  Steps count? : 100, Cost: 1698.0
For episode num 2341  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 65... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.03846004232764244 Actual: 0.0388919822871685
INFO: violated KL constraint 0.010138439945876598 at step 1.
Expected Improvement: 0.03846004232764244 Actual: 0.031044092029333115
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.2132745236158371     │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 65.0                    │
│ Train/Entropy                 │ 0.4534873068332672      │
│ Train/KL                      │ 0.00019016978330910206  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0017074346542358      │
│ Train/PolicyRatio/Min         │ 1.0017074346542358      │
│ Train/PolicyRatio/Max         │ 1.0017074346542358      │
│ Train/PolicyRatio/Std         │ 0.0010100409854203463   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.38081130385398865     │
│ TotalEnvSteps                 │ 132000.0                │
│ Loss/Loss_pi                  │ -0.026404758915305138   │
│ Loss/Loss_pi/Delta            │ 0.00447017140686512     │
│ Value/Adv                     │ 7.021426995379443e-08   │
│ Loss/Loss_reward_critic       │ 0.0046150474809110165   │
│ Loss/Loss_reward_critic/Delta │ -0.00038645975291728973 │
│ Value/reward                  │ -0.2541378438472748     │
│ Loss/Loss_cost_critic         │ 1.7315892364422325e-06  │
│ Loss/Loss_cost_critic/Delta   │ -7.698495210206602e-07  │
│ Value/cost                    │ 0.008478371426463127    │
│ Time/Total                    │ 181.92926025390625      │
│ Time/Rollout                  │ 1.6038873195648193      │
│ Time/Update                   │ 0.9176287651062012      │
│ Time/Epoch                    │ 2.521540403366089       │
│ Time/FPS                      │ 793.166259765625        │
│ Misc/Alpha                    │ 0.5208372473716736      │
│ Misc/FinalStepNorm            │ 0.022910738363862038    │
│ Misc/gradient_norm            │ 7.811129093170166       │
│ Misc/xHx                      │ 0.07372688502073288     │
│ Misc/H_inv_g                  │ 0.05498535931110382     │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 0.00026525260182097554  │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.07372688502073288     │
│ Misc/r                        │ 2.448791292408714e-06   │
│ Misc/s                        │ 9.408295670310096e-11   │
│ Misc/Lambda_star              │ 1.9199855327606201      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2342  Steps count? : 0, Cost: 1698.0
For episode num 2343  Steps count? : 100, Cost: 1698.0
For episode num 2344  Steps count? : 100, Cost: 1698.0
For episode num 2345  Steps count? : 100, Cost: 1698.0
For episode num 2346  Steps count? : 100, Cost: 1698.0
For episode num 2347  Steps count? : 100, Cost: 1698.0
For episode num 2348  Steps count? : 100, Cost: 1698.0
For episode num 2349  Steps count? : 100, Cost: 1698.0
For episode num 2350  Steps count? : 100, Cost: 1698.0
For episode num 2351  Steps count? : 100, Cost: 1698.0
For episode num 2352  Steps count? : 100, Cost: 1698.0
For episode num 2353  Steps count? : 100, Cost: 1698.0
For episode num 2354  Steps count? : 100, Cost: 1698.0
For episode num 2355  Steps count? : 100, Cost: 1698.0
For episode num 2356  Steps count? : 100, Cost: 1698.0
For episode num 2357  Steps count? : 100, Cost: 1698.0
For episode num 2358  Steps count? : 100, Cost: 1698.0
For episode num 2359  Steps count? : 100, Cost: 1698.0
For episode num 2360  Steps count? : 100, Cost: 1698.0
For episode num 2361  Steps count? : 100, Cost: 1698.0
For episode num 2362  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 66... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.04061732441186905 Actual: 0.039778128266334534
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.21174722909927368   │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 66.0                   │
│ Train/Entropy                 │ 0.4511309564113617     │
│ Train/KL                      │ 0.0002841190726030618  │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.9990624785423279     │
│ Train/PolicyRatio/Min         │ 0.9990624785423279     │
│ Train/PolicyRatio/Max         │ 0.9990624785423279     │
│ Train/PolicyRatio/Std         │ 0.0006629418348893523  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.37991559505462646    │
│ TotalEnvSteps                 │ 134000.0               │
│ Loss/Loss_pi                  │ -0.02983342483639717   │
│ Loss/Loss_pi/Delta            │ -0.0034286659210920334 │
│ Value/Adv                     │ -4.255771557382104e-08 │
│ Loss/Loss_reward_critic       │ 0.0042541660368442535  │
│ Loss/Loss_reward_critic/Delta │ -0.0003608814440667629 │
│ Value/reward                  │ -0.24724876880645752   │
│ Loss/Loss_cost_critic         │ 1.1753703574868268e-06 │
│ Loss/Loss_cost_critic/Delta   │ -5.562188789554057e-07 │
│ Value/cost                    │ 0.007265517488121986   │
│ Time/Total                    │ 184.48434448242188     │
│ Time/Rollout                  │ 1.5940988063812256     │
│ Time/Update                   │ 0.9425990581512451     │
│ Time/Epoch                    │ 2.536714553833008      │
│ Time/FPS                      │ 788.421630859375       │
│ Misc/Alpha                    │ 0.4922853410243988     │
│ Misc/FinalStepNorm            │ 0.049828022718429565   │
│ Misc/gradient_norm            │ 8.146159172058105      │
│ Misc/xHx                      │ 0.08252701163291931    │
│ Misc/H_inv_g                  │ 0.10121776163578033    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.00018781529797706753 │
│ Misc/A                        │ 0.0                    │
│ Misc/B                        │ 0.0                    │
│ Misc/q                        │ 0.08252701163291931    │
│ Misc/r                        │ 1.7912113889906323e-06 │
│ Misc/s                        │ 4.380797113956447e-11  │
│ Misc/Lambda_star              │ 2.0313422679901123     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 4.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 2363  Steps count? : 0, Cost: 1698.0
For episode num 2364  Steps count? : 100, Cost: 1698.0
For episode num 2365  Steps count? : 100, Cost: 1698.0
For episode num 2366  Steps count? : 100, Cost: 1698.0
For episode num 2367  Steps count? : 100, Cost: 1698.0
For episode num 2368  Steps count? : 100, Cost: 1698.0
For episode num 2369  Steps count? : 100, Cost: 1698.0
For episode num 2370  Steps count? : 100, Cost: 1698.0
For episode num 2371  Steps count? : 100, Cost: 1698.0
For episode num 2372  Steps count? : 100, Cost: 1698.0
For episode num 2373  Steps count? : 100, Cost: 1698.0
For episode num 2374  Steps count? : 100, Cost: 1698.0
For episode num 2375  Steps count? : 100, Cost: 1698.0
For episode num 2376  Steps count? : 100, Cost: 1698.0
For episode num 2377  Steps count? : 100, Cost: 1698.0
For episode num 2378  Steps count? : 100, Cost: 1698.0
For episode num 2379  Steps count? : 100, Cost: 1698.0
For episode num 2380  Steps count? : 100, Cost: 1698.0
For episode num 2381  Steps count? : 100, Cost: 1698.0
For episode num 2382  Steps count? : 100, Cost: 1698.0
For episode num 2383  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 67... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.03719363361597061 Actual: 0.03761868178844452
INFO: violated KL constraint 0.010404633358120918 at step 1.
Expected Improvement: 0.03719363361597061 Actual: 0.030033433809876442
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.21961592137813568   │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 67.0                   │
│ Train/Entropy                 │ 0.44349348545074463    │
│ Train/KL                      │ 0.00019391368550714105 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.9940915107727051     │
│ Train/PolicyRatio/Min         │ 0.9940915107727051     │
│ Train/PolicyRatio/Max         │ 0.9940915107727051     │
│ Train/PolicyRatio/Std         │ 0.0034953178837895393  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.3770272731781006     │
│ TotalEnvSteps                 │ 136000.0               │
│ Loss/Loss_pi                  │ -0.025543753057718277  │
│ Loss/Loss_pi/Delta            │ 0.004289671778678894   │
│ Value/Adv                     │ 7.629394893626795e-09  │
│ Loss/Loss_reward_critic       │ 0.003926691133528948   │
│ Loss/Loss_reward_critic/Delta │ -0.0003274749033153057 │
│ Value/reward                  │ -0.24073803424835205   │
│ Loss/Loss_cost_critic         │ 7.840625926291978e-07  │
│ Loss/Loss_cost_critic/Delta   │ -3.913077648576291e-07 │
│ Value/cost                    │ 0.006233562715351582   │
│ Time/Total                    │ 187.04490661621094     │
│ Time/Rollout                  │ 1.5931336879730225     │
│ Time/Update                   │ 0.9497165679931641     │
│ Time/Epoch                    │ 2.5428667068481445     │
│ Time/FPS                      │ 786.5140991210938      │
│ Misc/Alpha                    │ 0.5378952622413635     │
│ Misc/FinalStepNorm            │ 0.03649160638451576    │
│ Misc/gradient_norm            │ 7.6214799880981445     │
│ Misc/xHx                      │ 0.06912490725517273    │
│ Misc/H_inv_g                  │ 0.08480183780193329    │
│ Misc/AcceptanceStep           │ 2.0                    │
│ Misc/cost_gradient_norm       │ 5.834161856910214e-05  │
│ Misc/A                        │ 0.0                    │
│ Misc/B                        │ 0.0                    │
│ Misc/q                        │ 0.06912490725517273    │
│ Misc/r                        │ 4.702133651335316e-07  │
│ Misc/s                        │ 3.667445687416393e-12  │
│ Misc/Lambda_star              │ 1.859097957611084      │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 4.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 2384  Steps count? : 0, Cost: 1698.0
For episode num 2385  Steps count? : 100, Cost: 1698.0
For episode num 2386  Steps count? : 100, Cost: 1698.0
For episode num 2387  Steps count? : 100, Cost: 1698.0
For episode num 2388  Steps count? : 100, Cost: 1698.0
For episode num 2389  Steps count? : 100, Cost: 1698.0
For episode num 2390  Steps count? : 100, Cost: 1698.0
For episode num 2391  Steps count? : 100, Cost: 1698.0
For episode num 2392  Steps count? : 100, Cost: 1698.0
For episode num 2393  Steps count? : 100, Cost: 1698.0
For episode num 2394  Steps count? : 100, Cost: 1698.0
For episode num 2395  Steps count? : 100, Cost: 1698.0
For episode num 2396  Steps count? : 100, Cost: 1698.0
For episode num 2397  Steps count? : 100, Cost: 1698.0
For episode num 2398  Steps count? : 100, Cost: 1698.0
For episode num 2399  Steps count? : 100, Cost: 1698.0
For episode num 2400  Steps count? : 100, Cost: 1698.0
For episode num 2401  Steps count? : 100, Cost: 1698.0
For episode num 2402  Steps count? : 100, Cost: 1698.0
For episode num 2403  Steps count? : 100, Cost: 1698.0
For episode num 2404  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 68... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.037478357553482056 Actual: 0.03788066655397415
INFO: violated KL constraint 0.010234731249511242 at step 1.
Expected Improvement: 0.037478357553482056 Actual: 0.03024471551179886
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.2191338688135147     │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 68.0                    │
│ Train/Entropy                 │ 0.44050922989845276     │
│ Train/KL                      │ 0.0001912486768560484   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9995657801628113      │
│ Train/PolicyRatio/Min         │ 0.9995657801628113      │
│ Train/PolicyRatio/Max         │ 0.9995657801628113      │
│ Train/PolicyRatio/Std         │ 0.000262647052295506    │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3759011924266815      │
│ TotalEnvSteps                 │ 138000.0                │
│ Loss/Loss_pi                  │ -0.02572302520275116    │
│ Loss/Loss_pi/Delta            │ -0.0001792721450328827  │
│ Value/Adv                     │ 3.8444994743258576e-08  │
│ Loss/Loss_reward_critic       │ 0.0036408838350325823   │
│ Loss/Loss_reward_critic/Delta │ -0.00028580729849636555 │
│ Value/reward                  │ -0.23713287711143494    │
│ Loss/Loss_cost_critic         │ 5.108308869239409e-07   │
│ Loss/Loss_cost_critic/Delta   │ -2.732317057052569e-07  │
│ Value/cost                    │ 0.005343177355825901    │
│ Time/Total                    │ 190.40179443359375      │
│ Time/Rollout                  │ 1.8442816734313965      │
│ Time/Update                   │ 1.4937455654144287      │
│ Time/Epoch                    │ 3.338047981262207       │
│ Time/FPS                      │ 599.15283203125         │
│ Misc/Alpha                    │ 0.5339443683624268      │
│ Misc/FinalStepNorm            │ 0.041711047291755676    │
│ Misc/gradient_norm            │ 7.804633140563965       │
│ Misc/xHx                      │ 0.07015165686607361     │
│ Misc/H_inv_g                  │ 0.09764838963747025     │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 2.4179407773772255e-05  │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.07015165686607361     │
│ Misc/r                        │ -1.6414675485521002e-07 │
│ Misc/s                        │ 5.33883430086507e-13    │
│ Misc/Lambda_star              │ 1.872854232788086       │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2405  Steps count? : 0, Cost: 1698.0
For episode num 2406  Steps count? : 100, Cost: 1698.0
For episode num 2407  Steps count? : 100, Cost: 1698.0
For episode num 2408  Steps count? : 100, Cost: 1698.0
For episode num 2409  Steps count? : 100, Cost: 1698.0
For episode num 2410  Steps count? : 100, Cost: 1698.0
For episode num 2411  Steps count? : 100, Cost: 1698.0
For episode num 2412  Steps count? : 100, Cost: 1698.0
For episode num 2413  Steps count? : 100, Cost: 1698.0
For episode num 2414  Steps count? : 100, Cost: 1698.0
For episode num 2415  Steps count? : 100, Cost: 1698.0
For episode num 2416  Steps count? : 100, Cost: 1698.0
For episode num 2417  Steps count? : 100, Cost: 1698.0
For episode num 2418  Steps count? : 100, Cost: 1698.0
For episode num 2419  Steps count? : 100, Cost: 1698.0
For episode num 2420  Steps count? : 100, Cost: 1698.0
For episode num 2421  Steps count? : 100, Cost: 1698.0
For episode num 2422  Steps count? : 100, Cost: 1698.0
For episode num 2423  Steps count? : 100, Cost: 1698.0
For episode num 2424  Steps count? : 100, Cost: 1698.0
For episode num 2425  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 69... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.038135506212711334 Actual: 0.03730716556310654
INFO: violated KL constraint 0.010665087960660458 at step 1.
Expected Improvement: 0.038135506212711334 Actual: 0.029979001730680466
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.21850402653217316    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 69.0                    │
│ Train/Entropy                 │ 0.42341354489326477     │
│ Train/KL                      │ 0.00019788651843555272  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9992602467536926      │
│ Train/PolicyRatio/Min         │ 0.9992602467536926      │
│ Train/PolicyRatio/Max         │ 0.9992602467536926      │
│ Train/PolicyRatio/Std         │ 0.00043146434472873807  │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3695475459098816      │
│ TotalEnvSteps                 │ 140000.0                │
│ Loss/Loss_pi                  │ -0.02544872835278511    │
│ Loss/Loss_pi/Delta            │ 0.0002742968499660492   │
│ Value/Adv                     │ 2.074241578498004e-08   │
│ Loss/Loss_reward_critic       │ 0.0033593401312828064   │
│ Loss/Loss_reward_critic/Delta │ -0.0002815437037497759  │
│ Value/reward                  │ -0.23280157148838043    │
│ Loss/Loss_cost_critic         │ 3.263756127580564e-07   │
│ Loss/Loss_cost_critic/Delta   │ -1.8445527416588448e-07 │
│ Value/cost                    │ 0.004589397925883532    │
│ Time/Total                    │ 193.36077880859375      │
│ Time/Rollout                  │ 1.918858289718628       │
│ Time/Update                   │ 1.0173876285552979      │
│ Time/Epoch                    │ 2.936262845993042       │
│ Time/FPS                      │ 681.1381225585938       │
│ Misc/Alpha                    │ 0.5245289206504822      │
│ Misc/FinalStepNorm            │ 0.03347254544496536     │
│ Misc/gradient_norm            │ 7.846267223358154       │
│ Misc/xHx                      │ 0.07269273698329926     │
│ Misc/H_inv_g                  │ 0.0797680914402008      │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 0.00017222229507751763  │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.07269273698329926     │
│ Misc/r                        │ 1.4356835436046822e-06  │
│ Misc/s                        │ 3.152370378622926e-11   │
│ Misc/Lambda_star              │ 1.9064725637435913      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2426  Steps count? : 0, Cost: 1698.0
For episode num 2427  Steps count? : 100, Cost: 1698.0
For episode num 2428  Steps count? : 100, Cost: 1698.0
For episode num 2429  Steps count? : 100, Cost: 1698.0
For episode num 2430  Steps count? : 100, Cost: 1698.0
For episode num 2431  Steps count? : 100, Cost: 1698.0
For episode num 2432  Steps count? : 100, Cost: 1698.0
For episode num 2433  Steps count? : 100, Cost: 1698.0
For episode num 2434  Steps count? : 100, Cost: 1698.0
For episode num 2435  Steps count? : 100, Cost: 1698.0
For episode num 2436  Steps count? : 100, Cost: 1698.0
For episode num 2437  Steps count? : 100, Cost: 1698.0
For episode num 2438  Steps count? : 100, Cost: 1698.0
For episode num 2439  Steps count? : 100, Cost: 1698.0
For episode num 2440  Steps count? : 100, Cost: 1698.0
For episode num 2441  Steps count? : 100, Cost: 1698.0
For episode num 2442  Steps count? : 100, Cost: 1698.0
For episode num 2443  Steps count? : 100, Cost: 1698.0
For episode num 2444  Steps count? : 100, Cost: 1698.0
For episode num 2445  Steps count? : 100, Cost: 1698.0
For episode num 2446  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 70... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.04422450065612793 Actual: 0.04441589489579201
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.2177538126707077    │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 70.0                   │
│ Train/Entropy                 │ 0.4232141077518463     │
│ Train/KL                      │ 0.0002897129161283374  │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0014375448226929     │
│ Train/PolicyRatio/Min         │ 1.0014375448226929     │
│ Train/PolicyRatio/Max         │ 1.0014375448226929     │
│ Train/PolicyRatio/Std         │ 0.0010164134437218308  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.3694569766521454     │
│ TotalEnvSteps                 │ 142000.0               │
│ Loss/Loss_pi                  │ -0.03331179916858673   │
│ Loss/Loss_pi/Delta            │ -0.00786307081580162   │
│ Value/Adv                     │ 7.128715395765539e-08  │
│ Loss/Loss_reward_critic       │ 0.0031111789867281914  │
│ Loss/Loss_reward_critic/Delta │ -0.000248161144554615  │
│ Value/reward                  │ -0.23000384867191315   │
│ Loss/Loss_cost_critic         │ 2.0508652198714117e-07 │
│ Loss/Loss_cost_critic/Delta   │ -1.212890907709152e-07 │
│ Value/cost                    │ 0.00393741624429822    │
│ Time/Total                    │ 197.51412963867188     │
│ Time/Rollout                  │ 2.6205215454101562     │
│ Time/Update                   │ 1.5094316005706787     │
│ Time/Epoch                    │ 4.129975318908691      │
│ Time/FPS                      │ 484.2645568847656      │
│ Misc/Alpha                    │ 0.4524184465408325     │
│ Misc/FinalStepNorm            │ 0.016884107142686844   │
│ Misc/gradient_norm            │ 9.641519546508789      │
│ Misc/xHx                      │ 0.09771232306957245    │
│ Misc/H_inv_g                  │ 0.037319671362638474   │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 0.00017548205505590886 │
│ Misc/A                        │ 0.0                    │
│ Misc/B                        │ 0.0                    │
│ Misc/q                        │ 0.09771232306957245    │
│ Misc/r                        │ 1.7115789887611754e-06 │
│ Misc/s                        │ 3.1161795366330125e-11 │
│ Misc/Lambda_star              │ 2.2103431224823        │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 4.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 2447  Steps count? : 0, Cost: 1698.0
For episode num 2448  Steps count? : 100, Cost: 1698.0
For episode num 2449  Steps count? : 100, Cost: 1698.0
For episode num 2450  Steps count? : 100, Cost: 1698.0
For episode num 2451  Steps count? : 100, Cost: 1698.0
For episode num 2452  Steps count? : 100, Cost: 1698.0
For episode num 2453  Steps count? : 100, Cost: 1698.0
For episode num 2454  Steps count? : 100, Cost: 1698.0
For episode num 2455  Steps count? : 100, Cost: 1698.0
For episode num 2456  Steps count? : 100, Cost: 1698.0
For episode num 2457  Steps count? : 100, Cost: 1698.0
For episode num 2458  Steps count? : 100, Cost: 1698.0
For episode num 2459  Steps count? : 100, Cost: 1698.0
For episode num 2460  Steps count? : 100, Cost: 1698.0
For episode num 2461  Steps count? : 100, Cost: 1698.0
For episode num 2462  Steps count? : 100, Cost: 1698.0
For episode num 2463  Steps count? : 100, Cost: 1698.0
For episode num 2464  Steps count? : 100, Cost: 1698.0
For episode num 2465  Steps count? : 100, Cost: 1698.0
For episode num 2466  Steps count? : 100, Cost: 1698.0
For episode num 2467  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 71... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.04031791538000107 Actual: 0.04021195322275162
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.2185603380203247     │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 71.0                    │
│ Train/Entropy                 │ 0.4278656542301178      │
│ Train/KL                      │ 0.00028732666396535933  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9984338879585266      │
│ Train/PolicyRatio/Min         │ 0.9984338879585266      │
│ Train/PolicyRatio/Max         │ 0.9984338879585266      │
│ Train/PolicyRatio/Std         │ 0.001107422518543899    │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3711789846420288      │
│ TotalEnvSteps                 │ 144000.0                │
│ Loss/Loss_pi                  │ -0.030158843845129013   │
│ Loss/Loss_pi/Delta            │ 0.003152955323457718    │
│ Value/Adv                     │ -1.0848045128852846e-08 │
│ Loss/Loss_reward_critic       │ 0.0028906736988574266   │
│ Loss/Loss_reward_critic/Delta │ -0.00022050528787076473 │
│ Value/reward                  │ -0.2279772013425827     │
│ Loss/Loss_cost_critic         │ 1.2604400012605765e-07  │
│ Loss/Loss_cost_critic/Delta   │ -7.904252186108351e-08  │
│ Value/cost                    │ 0.003378770314157009    │
│ Time/Total                    │ 200.33750915527344      │
│ Time/Rollout                  │ 1.7994651794433594      │
│ Time/Update                   │ 1.0006508827209473      │
│ Time/Epoch                    │ 2.8001320362091064      │
│ Time/FPS                      │ 714.252197265625        │
│ Misc/Alpha                    │ 0.4958570897579193      │
│ Misc/FinalStepNorm            │ 0.05059105530381203     │
│ Misc/gradient_norm            │ 8.61922836303711        │
│ Misc/xHx                      │ 0.08134238421916962     │
│ Misc/H_inv_g                  │ 0.10202749073505402     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 6.804789154557511e-05   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.08134238421916962     │
│ Misc/r                        │ 5.76725597056793e-07    │
│ Misc/s                        │ 4.604211976955153e-12   │
│ Misc/Lambda_star              │ 2.016710042953491       │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2468  Steps count? : 0, Cost: 1698.0
For episode num 2469  Steps count? : 100, Cost: 1698.0
For episode num 2470  Steps count? : 100, Cost: 1698.0
For episode num 2471  Steps count? : 100, Cost: 1698.0
For episode num 2472  Steps count? : 100, Cost: 1698.0
For episode num 2473  Steps count? : 100, Cost: 1698.0
For episode num 2474  Steps count? : 100, Cost: 1698.0
For episode num 2475  Steps count? : 100, Cost: 1698.0
For episode num 2476  Steps count? : 100, Cost: 1698.0
For episode num 2477  Steps count? : 100, Cost: 1698.0
For episode num 2478  Steps count? : 100, Cost: 1698.0
For episode num 2479  Steps count? : 100, Cost: 1698.0
For episode num 2480  Steps count? : 100, Cost: 1698.0
For episode num 2481  Steps count? : 100, Cost: 1698.0
For episode num 2482  Steps count? : 100, Cost: 1698.0
For episode num 2483  Steps count? : 100, Cost: 1698.0
For episode num 2484  Steps count? : 100, Cost: 1698.0
For episode num 2485  Steps count? : 100, Cost: 1698.0
For episode num 2486  Steps count? : 100, Cost: 1698.0
For episode num 2487  Steps count? : 100, Cost: 1698.0
For episode num 2488  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 72... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.03921033814549446 Actual: 0.0383201465010643
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.22056645154953003    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 72.0                    │
│ Train/Entropy                 │ 0.42550793290138245     │
│ Train/KL                      │ 0.00028586675762198865  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9971181750297546      │
│ Train/PolicyRatio/Min         │ 0.9971181750297546      │
│ Train/PolicyRatio/Max         │ 0.9971181750297546      │
│ Train/PolicyRatio/Std         │ 0.0020377719774842262   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3703054189682007      │
│ TotalEnvSteps                 │ 146000.0                │
│ Loss/Loss_pi                  │ -0.028740227222442627   │
│ Loss/Loss_pi/Delta            │ 0.001418616622686386    │
│ Value/Adv                     │ 1.0013580187262505e-08  │
│ Loss/Loss_reward_critic       │ 0.0026750070974230766   │
│ Loss/Loss_reward_critic/Delta │ -0.00021566660143435001 │
│ Value/reward                  │ -0.22541660070419312    │
│ Loss/Loss_cost_critic         │ 7.704945659270379e-08   │
│ Loss/Loss_cost_critic/Delta   │ -4.899454353335386e-08  │
│ Value/cost                    │ 0.0029030069708824158   │
│ Time/Total                    │ 202.96096801757812      │
│ Time/Rollout                  │ 1.5992083549499512      │
│ Time/Update                   │ 1.0060703754425049      │
│ Time/Epoch                    │ 2.6052935123443604      │
│ Time/FPS                      │ 767.6680908203125       │
│ Misc/Alpha                    │ 0.5100565552711487      │
│ Misc/FinalStepNorm            │ 0.04128625616431236     │
│ Misc/gradient_norm            │ 8.559903144836426       │
│ Misc/xHx                      │ 0.0768764466047287      │
│ Misc/H_inv_g                  │ 0.0809444710612297      │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 0.00012580784095916897  │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.0768764466047287      │
│ Misc/r                        │ -1.0797376717164298e-06 │
│ Misc/s                        │ 1.5936781408432132e-11  │
│ Misc/Lambda_star              │ 1.9605668783187866      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2489  Steps count? : 0, Cost: 1698.0
For episode num 2490  Steps count? : 100, Cost: 1698.0
For episode num 2491  Steps count? : 100, Cost: 1698.0
For episode num 2492  Steps count? : 100, Cost: 1698.0
For episode num 2493  Steps count? : 100, Cost: 1698.0
For episode num 2494  Steps count? : 100, Cost: 1698.0
For episode num 2495  Steps count? : 100, Cost: 1698.0
For episode num 2496  Steps count? : 100, Cost: 1698.0
For episode num 2497  Steps count? : 100, Cost: 1698.0
For episode num 2498  Steps count? : 100, Cost: 1698.0
For episode num 2499  Steps count? : 100, Cost: 1698.0
For episode num 2500  Steps count? : 100, Cost: 1698.0
For episode num 2501  Steps count? : 100, Cost: 1698.0
For episode num 2502  Steps count? : 100, Cost: 1698.0
For episode num 2503  Steps count? : 100, Cost: 1698.0
For episode num 2504  Steps count? : 100, Cost: 1698.0
For episode num 2505  Steps count? : 100, Cost: 1698.0
For episode num 2506  Steps count? : 100, Cost: 1698.0
For episode num 2507  Steps count? : 100, Cost: 1698.0
For episode num 2508  Steps count? : 100, Cost: 1698.0
For episode num 2509  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 73... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.03819749504327774 Actual: 0.038848355412483215
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.21603086590766907    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 73.0                    │
│ Train/Entropy                 │ 0.42907190322875977     │
│ Train/KL                      │ 0.00029198735137470067  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0016165971755981      │
│ Train/PolicyRatio/Min         │ 1.0016165971755981      │
│ Train/PolicyRatio/Max         │ 1.0016165971755981      │
│ Train/PolicyRatio/Std         │ 0.0011431348975747824   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.371628999710083       │
│ TotalEnvSteps                 │ 148000.0                │
│ Loss/Loss_pi                  │ -0.029136210680007935   │
│ Loss/Loss_pi/Delta            │ -0.0003959834575653076  │
│ Value/Adv                     │ 2.3365020140886372e-08  │
│ Loss/Loss_reward_critic       │ 0.002473684260621667    │
│ Loss/Loss_reward_critic/Delta │ -0.00020132283680140972 │
│ Value/reward                  │ -0.22313162684440613    │
│ Loss/Loss_cost_critic         │ 4.646633300353642e-08   │
│ Loss/Loss_cost_critic/Delta   │ -3.058312358916737e-08  │
│ Value/cost                    │ 0.0024932767264544964   │
│ Time/Total                    │ 205.88375854492188      │
│ Time/Rollout                  │ 1.6013307571411133      │
│ Time/Update                   │ 1.3030779361724854      │
│ Time/Epoch                    │ 2.9044249057769775      │
│ Time/FPS                      │ 688.6046752929688       │
│ Misc/Alpha                    │ 0.5234333872795105      │
│ Misc/FinalStepNorm            │ 0.023581603541970253    │
│ Misc/gradient_norm            │ 8.336539268493652       │
│ Misc/xHx                      │ 0.07299736142158508     │
│ Misc/H_inv_g                  │ 0.04505177214741707     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 7.911051216069609e-05   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.07299736142158508     │
│ Misc/r                        │ 6.40315079181164e-07    │
│ Misc/s                        │ 6.086678903949316e-12   │
│ Misc/Lambda_star              │ 1.910462737083435       │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2510  Steps count? : 0, Cost: 1698.0
For episode num 2511  Steps count? : 100, Cost: 1698.0
For episode num 2512  Steps count? : 100, Cost: 1698.0
For episode num 2513  Steps count? : 100, Cost: 1698.0
For episode num 2514  Steps count? : 100, Cost: 1698.0
For episode num 2515  Steps count? : 100, Cost: 1698.0
For episode num 2516  Steps count? : 100, Cost: 1698.0
For episode num 2517  Steps count? : 100, Cost: 1698.0
For episode num 2518  Steps count? : 100, Cost: 1698.0
For episode num 2519  Steps count? : 100, Cost: 1698.0
For episode num 2520  Steps count? : 100, Cost: 1698.0
For episode num 2521  Steps count? : 100, Cost: 1698.0
For episode num 2522  Steps count? : 100, Cost: 1698.0
For episode num 2523  Steps count? : 100, Cost: 1698.0
For episode num 2524  Steps count? : 100, Cost: 1698.0
For episode num 2525  Steps count? : 100, Cost: 1698.0
For episode num 2526  Steps count? : 100, Cost: 1698.0
For episode num 2527  Steps count? : 100, Cost: 1698.0
For episode num 2528  Steps count? : 100, Cost: 1698.0
For episode num 2529  Steps count? : 100, Cost: 1698.0
For episode num 2530  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 74... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.03868155926465988 Actual: 0.03817959874868393
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.2044873833656311     │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 74.0                    │
│ Train/Entropy                 │ 0.42733097076416016     │
│ Train/KL                      │ 0.0002939131809398532   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0005611181259155      │
│ Train/PolicyRatio/Min         │ 1.0005611181259155      │
│ Train/PolicyRatio/Max         │ 1.0005611181259155      │
│ Train/PolicyRatio/Std         │ 0.0003967423108406365   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.37098169326782227     │
│ TotalEnvSteps                 │ 150000.0                │
│ Loss/Loss_pi                  │ -0.028634589165449142   │
│ Loss/Loss_pi/Delta            │ 0.0005016215145587921   │
│ Value/Adv                     │ -1.4781951662712345e-08 │
│ Loss/Loss_reward_critic       │ 0.0022904910147190094   │
│ Loss/Loss_reward_critic/Delta │ -0.0001831932459026575  │
│ Value/reward                  │ -0.21976812183856964    │
│ Loss/Loss_cost_critic         │ 2.8173101185302585e-08  │
│ Loss/Loss_cost_critic/Delta   │ -1.8293231818233835e-08 │
│ Value/cost                    │ 0.0021387471351772547   │
│ Time/Total                    │ 209.40345764160156      │
│ Time/Rollout                  │ 2.1783430576324463      │
│ Time/Update                   │ 1.3189942836761475      │
│ Time/Epoch                    │ 3.4973554611206055      │
│ Time/FPS                      │ 571.86083984375         │
│ Misc/Alpha                    │ 0.5171899795532227      │
│ Misc/FinalStepNorm            │ 0.02443188987672329     │
│ Misc/gradient_norm            │ 8.429101943969727       │
│ Misc/xHx                      │ 0.07477040588855743     │
│ Misc/H_inv_g                  │ 0.04723968729376793     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 7.899094634922221e-05   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.07477040588855743     │
│ Misc/r                        │ 6.5917822666961e-07     │
│ Misc/s                        │ 6.235526851805506e-12   │
│ Misc/Lambda_star              │ 1.9335254430770874      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2531  Steps count? : 0, Cost: 1698.0
For episode num 2532  Steps count? : 100, Cost: 1698.0
For episode num 2533  Steps count? : 100, Cost: 1698.0
For episode num 2534  Steps count? : 100, Cost: 1698.0
For episode num 2535  Steps count? : 100, Cost: 1698.0
For episode num 2536  Steps count? : 100, Cost: 1698.0
For episode num 2537  Steps count? : 100, Cost: 1698.0
For episode num 2538  Steps count? : 100, Cost: 1698.0
For episode num 2539  Steps count? : 100, Cost: 1698.0
For episode num 2540  Steps count? : 100, Cost: 1698.0
For episode num 2541  Steps count? : 100, Cost: 1698.0
For episode num 2542  Steps count? : 100, Cost: 1698.0
For episode num 2543  Steps count? : 100, Cost: 1698.0
For episode num 2544  Steps count? : 100, Cost: 1698.0
For episode num 2545  Steps count? : 100, Cost: 1698.0
For episode num 2546  Steps count? : 100, Cost: 1698.0
For episode num 2547  Steps count? : 100, Cost: 1698.0
For episode num 2548  Steps count? : 100, Cost: 1698.0
For episode num 2549  Steps count? : 100, Cost: 1698.0
For episode num 2550  Steps count? : 100, Cost: 1698.0
For episode num 2551  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 75... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.041994012892246246 Actual: 0.042070914059877396
INFO: violated KL constraint 0.010182886384427547 at step 1.
Expected Improvement: 0.041994012892246246 Actual: 0.03364662826061249
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.2036658525466919     │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 75.0                    │
│ Train/Entropy                 │ 0.4223569929599762      │
│ Train/KL                      │ 0.00019091459398623556  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.996344268321991       │
│ Train/PolicyRatio/Min         │ 0.996344268321991       │
│ Train/PolicyRatio/Max         │ 0.996344268321991       │
│ Train/PolicyRatio/Std         │ 0.0021507188212126493   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3691396415233612      │
│ TotalEnvSteps                 │ 152000.0                │
│ Loss/Loss_pi                  │ -0.02860218659043312    │
│ Loss/Loss_pi/Delta            │ 3.240257501602173e-05   │
│ Value/Adv                     │ -6.377697037152075e-09  │
│ Loss/Loss_reward_critic       │ 0.0021356905344873667   │
│ Loss/Loss_reward_critic/Delta │ -0.00015480048023164272 │
│ Value/reward                  │ -0.2164129912853241     │
│ Loss/Loss_cost_critic         │ 1.7322840051292587e-08  │
│ Loss/Loss_cost_critic/Delta   │ -1.0850261134009997e-08 │
│ Value/cost                    │ 0.0018400570843368769   │
│ Time/Total                    │ 213.06808471679688      │
│ Time/Rollout                  │ 2.6216681003570557      │
│ Time/Update                   │ 1.0205349922180176      │
│ Time/Epoch                    │ 3.642218828201294       │
│ Time/FPS                      │ 549.1159057617188       │
│ Misc/Alpha                    │ 0.4759935438632965      │
│ Misc/FinalStepNorm            │ 0.015864986926317215    │
│ Misc/gradient_norm            │ 9.174099922180176       │
│ Misc/xHx                      │ 0.08827298879623413     │
│ Misc/H_inv_g                  │ 0.04166282340884209     │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 1.9218881789129227e-05  │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.08827298879623413     │
│ Misc/r                        │ -1.3128550335750333e-07 │
│ Misc/s                        │ 2.7810067616818035e-13  │
│ Misc/Lambda_star              │ 2.1008689403533936      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2552  Steps count? : 0, Cost: 1698.0
For episode num 2553  Steps count? : 100, Cost: 1698.0
For episode num 2554  Steps count? : 100, Cost: 1698.0
For episode num 2555  Steps count? : 100, Cost: 1698.0
For episode num 2556  Steps count? : 100, Cost: 1698.0
For episode num 2557  Steps count? : 100, Cost: 1698.0
For episode num 2558  Steps count? : 100, Cost: 1698.0
For episode num 2559  Steps count? : 100, Cost: 1698.0
For episode num 2560  Steps count? : 100, Cost: 1698.0
For episode num 2561  Steps count? : 100, Cost: 1698.0
For episode num 2562  Steps count? : 100, Cost: 1698.0
For episode num 2563  Steps count? : 100, Cost: 1698.0
For episode num 2564  Steps count? : 100, Cost: 1698.0
For episode num 2565  Steps count? : 100, Cost: 1698.0
For episode num 2566  Steps count? : 100, Cost: 1698.0
For episode num 2567  Steps count? : 100, Cost: 1698.0
For episode num 2568  Steps count? : 100, Cost: 1698.0
For episode num 2569  Steps count? : 100, Cost: 1698.0
For episode num 2570  Steps count? : 100, Cost: 1698.0
For episode num 2571  Steps count? : 100, Cost: 1698.0
For episode num 2572  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 76... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.041027896106243134 Actual: 0.04111899062991142
INFO: violated KL constraint 0.010136811062693596 at step 1.
Expected Improvement: 0.041027896106243134 Actual: 0.03288213536143303
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.20589523017406464    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 76.0                    │
│ Train/Entropy                 │ 0.41710421442985535     │
│ Train/KL                      │ 0.00018960358283948153  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9969422221183777      │
│ Train/PolicyRatio/Min         │ 0.9969422221183777      │
│ Train/PolicyRatio/Max         │ 0.9969422221183777      │
│ Train/PolicyRatio/Std         │ 0.0018070051446557045   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3672066330909729      │
│ TotalEnvSteps                 │ 154000.0                │
│ Loss/Loss_pi                  │ -0.027953093871474266   │
│ Loss/Loss_pi/Delta            │ 0.0006490927189588547   │
│ Value/Adv                     │ 2.9325486039510906e-08  │
│ Loss/Loss_reward_critic       │ 0.001964448019862175    │
│ Loss/Loss_reward_critic/Delta │ -0.0001712425146251917  │
│ Value/reward                  │ -0.2150779366493225     │
│ Loss/Loss_cost_critic         │ 1.0973528752344919e-08  │
│ Loss/Loss_cost_critic/Delta   │ -6.3493112989476685e-09 │
│ Value/cost                    │ 0.0015763513511046767   │
│ Time/Total                    │ 215.71336364746094      │
│ Time/Rollout                  │ 1.5959572792053223      │
│ Time/Update                   │ 1.0313491821289062      │
│ Time/Epoch                    │ 2.6273202896118164      │
│ Time/FPS                      │ 761.2320556640625       │
│ Misc/Alpha                    │ 0.4873432219028473      │
│ Misc/FinalStepNorm            │ 0.04813680797815323     │
│ Misc/gradient_norm            │ 8.842710494995117       │
│ Misc/xHx                      │ 0.08420930802822113     │
│ Misc/H_inv_g                  │ 0.12346743047237396     │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 2.4149161617970094e-05  │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.08420930802822113     │
│ Misc/r                        │ 1.6988703066544986e-07  │
│ Misc/s                        │ 4.68098162228342e-13    │
│ Misc/Lambda_star              │ 2.0519418716430664      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2573  Steps count? : 0, Cost: 1698.0
For episode num 2574  Steps count? : 100, Cost: 1698.0
For episode num 2575  Steps count? : 100, Cost: 1698.0
For episode num 2576  Steps count? : 100, Cost: 1698.0
For episode num 2577  Steps count? : 100, Cost: 1698.0
For episode num 2578  Steps count? : 100, Cost: 1698.0
For episode num 2579  Steps count? : 100, Cost: 1698.0
For episode num 2580  Steps count? : 100, Cost: 1698.0
For episode num 2581  Steps count? : 100, Cost: 1698.0
For episode num 2582  Steps count? : 100, Cost: 1698.0
For episode num 2583  Steps count? : 100, Cost: 1698.0
For episode num 2584  Steps count? : 100, Cost: 1698.0
For episode num 2585  Steps count? : 100, Cost: 1698.0
For episode num 2586  Steps count? : 100, Cost: 1698.0
For episode num 2587  Steps count? : 100, Cost: 1698.0
For episode num 2588  Steps count? : 100, Cost: 1698.0
For episode num 2589  Steps count? : 100, Cost: 1698.0
For episode num 2590  Steps count? : 100, Cost: 1698.0
For episode num 2591  Steps count? : 100, Cost: 1698.0
For episode num 2592  Steps count? : 100, Cost: 1698.0
For episode num 2593  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 77... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.04229862242937088 Actual: 0.042055368423461914
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.2043817937374115     │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 77.0                    │
│ Train/Entropy                 │ 0.41440606117248535     │
│ Train/KL                      │ 0.0002930451591964811   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0017493963241577      │
│ Train/PolicyRatio/Min         │ 1.0017493963241577      │
│ Train/PolicyRatio/Max         │ 1.0017493963241577      │
│ Train/PolicyRatio/Std         │ 0.0012369257165119052   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3662160634994507      │
│ TotalEnvSteps                 │ 156000.0                │
│ Loss/Loss_pi                  │ -0.031541451811790466   │
│ Loss/Loss_pi/Delta            │ -0.0035883579403162003  │
│ Value/Adv                     │ -2.956390332542469e-08  │
│ Loss/Loss_reward_critic       │ 0.001819543307647109    │
│ Loss/Loss_reward_critic/Delta │ -0.00014490471221506596 │
│ Value/reward                  │ -0.21328260004520416    │
│ Loss/Loss_cost_critic         │ 7.094183462896808e-09   │
│ Loss/Loss_cost_critic/Delta   │ -3.879345289448111e-09  │
│ Value/cost                    │ 0.00134936417452991     │
│ Time/Total                    │ 218.34225463867188      │
│ Time/Rollout                  │ 1.5899505615234375      │
│ Time/Update                   │ 1.0212607383728027      │
│ Time/Epoch                    │ 2.6112279891967773      │
│ Time/FPS                      │ 765.9234008789062       │
│ Misc/Alpha                    │ 0.4725402295589447      │
│ Misc/FinalStepNorm            │ 0.012737266719341278    │
│ Misc/gradient_norm            │ 9.492461204528809       │
│ Misc/xHx                      │ 0.08956789970397949     │
│ Misc/H_inv_g                  │ 0.026954883709549904    │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 1.237305059476057e-05   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.08956789970397949     │
│ Misc/r                        │ 6.816895847805426e-08   │
│ Misc/s                        │ 9.075827247942772e-14   │
│ Misc/Lambda_star              │ 2.1162219047546387      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2594  Steps count? : 0, Cost: 1698.0
For episode num 2595  Steps count? : 100, Cost: 1698.0
For episode num 2596  Steps count? : 100, Cost: 1698.0
For episode num 2597  Steps count? : 100, Cost: 1698.0
For episode num 2598  Steps count? : 100, Cost: 1698.0
For episode num 2599  Steps count? : 100, Cost: 1698.0
For episode num 2600  Steps count? : 100, Cost: 1698.0
For episode num 2601  Steps count? : 100, Cost: 1698.0
For episode num 2602  Steps count? : 100, Cost: 1698.0
For episode num 2603  Steps count? : 100, Cost: 1698.0
For episode num 2604  Steps count? : 100, Cost: 1698.0
For episode num 2605  Steps count? : 100, Cost: 1698.0
For episode num 2606  Steps count? : 100, Cost: 1698.0
For episode num 2607  Steps count? : 100, Cost: 1698.0
For episode num 2608  Steps count? : 100, Cost: 1698.0
For episode num 2609  Steps count? : 100, Cost: 1698.0
For episode num 2610  Steps count? : 100, Cost: 1698.0
For episode num 2611  Steps count? : 100, Cost: 1698.0
For episode num 2612  Steps count? : 100, Cost: 1698.0
For episode num 2613  Steps count? : 100, Cost: 1698.0
For episode num 2614  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 78... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.042200855910778046 Actual: 0.04269833490252495
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.2031484991312027     │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 78.0                    │
│ Train/Entropy                 │ 0.41788434982299805     │
│ Train/KL                      │ 0.00029175891540944576  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9951381087303162      │
│ Train/PolicyRatio/Min         │ 0.9951381087303162      │
│ Train/PolicyRatio/Max         │ 0.9951381087303162      │
│ Train/PolicyRatio/Std         │ 0.003437890438362956    │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.36749351024627686     │
│ TotalEnvSteps                 │ 158000.0                │
│ Loss/Loss_pi                  │ -0.03202376142144203    │
│ Loss/Loss_pi/Delta            │ -0.00048230960965156555 │
│ Value/Adv                     │ -2.312660285497259e-08  │
│ Loss/Loss_reward_critic       │ 0.0016817020950838923   │
│ Loss/Loss_reward_critic/Delta │ -0.00013784121256321669 │
│ Value/reward                  │ -0.2095181792974472     │
│ Loss/Loss_cost_critic         │ 4.6752517413040096e-09  │
│ Loss/Loss_cost_critic/Delta   │ -2.4189317215927986e-09 │
│ Value/cost                    │ 0.0011562301078811288   │
│ Time/Total                    │ 220.9878387451172       │
│ Time/Rollout                  │ 1.5939879417419434      │
│ Time/Update                   │ 1.0339844226837158      │
│ Time/Epoch                    │ 2.6279942989349365      │
│ Time/FPS                      │ 761.036865234375        │
│ Misc/Alpha                    │ 0.47445276379585266     │
│ Misc/FinalStepNorm            │ 0.036365024745464325    │
│ Misc/gradient_norm            │ 9.300676345825195       │
│ Misc/xHx                      │ 0.08884724974632263     │
│ Misc/H_inv_g                  │ 0.07664623856544495     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 1.1875237760250457e-05  │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.08884724974632263     │
│ Misc/r                        │ -6.242309780191135e-08  │
│ Misc/s                        │ 8.035351626699966e-14   │
│ Misc/Lambda_star              │ 2.1076912879943848      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2615  Steps count? : 0, Cost: 1698.0
For episode num 2616  Steps count? : 100, Cost: 1698.0
For episode num 2617  Steps count? : 100, Cost: 1698.0
For episode num 2618  Steps count? : 100, Cost: 1698.0
For episode num 2619  Steps count? : 100, Cost: 1698.0
For episode num 2620  Steps count? : 100, Cost: 1698.0
For episode num 2621  Steps count? : 100, Cost: 1698.0
For episode num 2622  Steps count? : 100, Cost: 1698.0
For episode num 2623  Steps count? : 100, Cost: 1698.0
For episode num 2624  Steps count? : 100, Cost: 1698.0
For episode num 2625  Steps count? : 100, Cost: 1698.0
For episode num 2626  Steps count? : 100, Cost: 1698.0
For episode num 2627  Steps count? : 100, Cost: 1698.0
For episode num 2628  Steps count? : 100, Cost: 1698.0
For episode num 2629  Steps count? : 100, Cost: 1698.0
For episode num 2630  Steps count? : 100, Cost: 1698.0
For episode num 2631  Steps count? : 100, Cost: 1698.0
For episode num 2632  Steps count? : 100, Cost: 1698.0
For episode num 2633  Steps count? : 100, Cost: 1698.0
For episode num 2634  Steps count? : 100, Cost: 1698.0
For episode num 2635  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 79... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.03840078413486481 Actual: 0.03891971707344055
INFO: violated KL constraint 0.0102419164031744 at step 1.
Expected Improvement: 0.03840078413486481 Actual: 0.0310550257563591
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.20007385313510895    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 79.0                    │
│ Train/Entropy                 │ 0.4182061553001404      │
│ Train/KL                      │ 0.00019168878498021513  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9964461326599121      │
│ Train/PolicyRatio/Min         │ 0.9964461326599121      │
│ Train/PolicyRatio/Max         │ 0.9964461326599121      │
│ Train/PolicyRatio/Std         │ 0.0021067396737635136   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3676103353500366      │
│ TotalEnvSteps                 │ 160000.0                │
│ Loss/Loss_pi                  │ -0.026416921988129616   │
│ Loss/Loss_pi/Delta            │ 0.005606839433312416    │
│ Value/Adv                     │ -1.7046929201569583e-08 │
│ Loss/Loss_reward_critic       │ 0.0015416693640872836   │
│ Loss/Loss_reward_critic/Delta │ -0.00014003273099660873 │
│ Value/reward                  │ -0.20708690583705902    │
│ Loss/Loss_cost_critic         │ 3.2533842286852632e-09  │
│ Loss/Loss_cost_critic/Delta   │ -1.4218675126187463e-09 │
│ Value/cost                    │ 0.0009957627626135945   │
│ Time/Total                    │ 223.62557983398438      │
│ Time/Rollout                  │ 1.5945186614990234      │
│ Time/Update                   │ 1.0243604183197021      │
│ Time/Epoch                    │ 2.618893623352051       │
│ Time/FPS                      │ 763.6814575195312       │
│ Misc/Alpha                    │ 0.5204752683639526      │
│ Misc/FinalStepNorm            │ 0.02596835047006607     │
│ Misc/gradient_norm            │ 8.566394805908203       │
│ Misc/xHx                      │ 0.07382946461439133     │
│ Misc/H_inv_g                  │ 0.06236691772937775     │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 2.8215588372404454e-06  │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.07382946461439133     │
│ Misc/r                        │ -5.024999860658852e-10  │
│ Misc/s                        │ 9.449442384845035e-16   │
│ Misc/Lambda_star              │ 1.921320915222168       │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2636  Steps count? : 0, Cost: 1698.0
For episode num 2637  Steps count? : 100, Cost: 1698.0
For episode num 2638  Steps count? : 100, Cost: 1698.0
For episode num 2639  Steps count? : 100, Cost: 1698.0
For episode num 2640  Steps count? : 100, Cost: 1698.0
For episode num 2641  Steps count? : 100, Cost: 1698.0
For episode num 2642  Steps count? : 100, Cost: 1698.0
For episode num 2643  Steps count? : 100, Cost: 1698.0
For episode num 2644  Steps count? : 100, Cost: 1698.0
For episode num 2645  Steps count? : 100, Cost: 1698.0
For episode num 2646  Steps count? : 100, Cost: 1698.0
For episode num 2647  Steps count? : 100, Cost: 1698.0
For episode num 2648  Steps count? : 100, Cost: 1698.0
For episode num 2649  Steps count? : 100, Cost: 1698.0
For episode num 2650  Steps count? : 100, Cost: 1698.0
For episode num 2651  Steps count? : 100, Cost: 1698.0
For episode num 2652  Steps count? : 100, Cost: 1698.0
For episode num 2653  Steps count? : 100, Cost: 1698.0
For episode num 2654  Steps count? : 100, Cost: 1698.0
For episode num 2655  Steps count? : 100, Cost: 1698.0
For episode num 2656  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 80... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.04291733354330063 Actual: 0.04316314682364464
INFO: violated KL constraint 0.010366751812398434 at step 1.
Expected Improvement: 0.04291733354330063 Actual: 0.03449515998363495
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.19283494353294373    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 80.0                    │
│ Train/Entropy                 │ 0.4125237464904785      │
│ Train/KL                      │ 0.0001934766478370875   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0001542568206787      │
│ Train/PolicyRatio/Min         │ 1.0001542568206787      │
│ Train/PolicyRatio/Max         │ 1.0001542568206787      │
│ Train/PolicyRatio/Std         │ 0.00010045810631709173  │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3655288815498352      │
│ TotalEnvSteps                 │ 162000.0                │
│ Loss/Loss_pi                  │ -0.029329786077141762   │
│ Loss/Loss_pi/Delta            │ -0.002912864089012146   │
│ Value/Adv                     │ 3.5643576978827696e-08  │
│ Loss/Loss_reward_critic       │ 0.00142297416459769     │
│ Loss/Loss_reward_critic/Delta │ -0.0001186951994895935  │
│ Value/reward                  │ -0.20423492789268494    │
│ Loss/Loss_cost_critic         │ 2.3427833006905985e-09  │
│ Loss/Loss_cost_critic/Delta   │ -9.106009279946647e-10  │
│ Value/cost                    │ 0.0008551282808184624   │
│ Time/Total                    │ 226.27760314941406      │
│ Time/Rollout                  │ 1.5950756072998047      │
│ Time/Update                   │ 1.0388400554656982      │
│ Time/Epoch                    │ 2.6339292526245117      │
│ Time/FPS                      │ 759.322021484375        │
│ Misc/Alpha                    │ 0.4656652808189392      │
│ Misc/FinalStepNorm            │ 0.030706174671649933    │
│ Misc/gradient_norm            │ 9.58568000793457        │
│ Misc/xHx                      │ 0.09223213791847229     │
│ Misc/H_inv_g                  │ 0.0824255496263504      │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 3.895847839885391e-05   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.09223213791847229     │
│ Misc/r                        │ -3.1318674587055284e-07 │
│ Misc/s                        │ 1.2734559500654496e-12  │
│ Misc/Lambda_star              │ 2.147465229034424       │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2657  Steps count? : 0, Cost: 1698.0
For episode num 2658  Steps count? : 100, Cost: 1698.0
For episode num 2659  Steps count? : 100, Cost: 1698.0
For episode num 2660  Steps count? : 100, Cost: 1698.0
For episode num 2661  Steps count? : 100, Cost: 1698.0
For episode num 2662  Steps count? : 100, Cost: 1698.0
For episode num 2663  Steps count? : 100, Cost: 1698.0
For episode num 2664  Steps count? : 100, Cost: 1698.0
For episode num 2665  Steps count? : 100, Cost: 1698.0
For episode num 2666  Steps count? : 100, Cost: 1698.0
For episode num 2667  Steps count? : 100, Cost: 1698.0
For episode num 2668  Steps count? : 100, Cost: 1698.0
For episode num 2669  Steps count? : 100, Cost: 1698.0
For episode num 2670  Steps count? : 100, Cost: 1698.0
For episode num 2671  Steps count? : 100, Cost: 1698.0
For episode num 2672  Steps count? : 100, Cost: 1698.0
For episode num 2673  Steps count? : 100, Cost: 1698.0
For episode num 2674  Steps count? : 100, Cost: 1698.0
For episode num 2675  Steps count? : 100, Cost: 1698.0
For episode num 2676  Steps count? : 100, Cost: 1698.0
For episode num 2677  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 81... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.03736034780740738 Actual: 0.03788740187883377
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.18348060548305511    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 81.0                    │
│ Train/Entropy                 │ 0.41669926047325134     │
│ Train/KL                      │ 0.0002918945683632046   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0033663511276245      │
│ Train/PolicyRatio/Min         │ 1.0033663511276245      │
│ Train/PolicyRatio/Max         │ 1.0033663511276245      │
│ Train/PolicyRatio/Std         │ 0.0023803417570888996   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.36705923080444336     │
│ TotalEnvSteps                 │ 164000.0                │
│ Loss/Loss_pi                  │ -0.028415480628609657   │
│ Loss/Loss_pi/Delta            │ 0.0009143054485321045   │
│ Value/Adv                     │ -2.1457672971791908e-08 │
│ Loss/Loss_reward_critic       │ 0.001302744960412383    │
│ Loss/Loss_reward_critic/Delta │ -0.00012022920418530703 │
│ Value/reward                  │ -0.20021691918373108    │
│ Loss/Loss_cost_critic         │ 1.6562328175595553e-09  │
│ Loss/Loss_cost_critic/Delta   │ -6.865504831310432e-10  │
│ Value/cost                    │ 0.0007314442773349583   │
│ Time/Total                    │ 229.05209350585938      │
│ Time/Rollout                  │ 1.5978131294250488      │
│ Time/Update                   │ 1.1589937210083008      │
│ Time/Epoch                    │ 2.7568235397338867      │
│ Time/FPS                      │ 725.472900390625        │
│ Misc/Alpha                    │ 0.5354676246643066      │
│ Misc/FinalStepNorm            │ 0.0466788075864315      │
│ Misc/gradient_norm            │ 8.350279808044434       │
│ Misc/xHx                      │ 0.06975309550762177     │
│ Misc/H_inv_g                  │ 0.08717391639947891     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 2.5476863811491057e-05  │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.06975309550762177     │
│ Misc/r                        │ 1.5954617538227467e-07  │
│ Misc/s                        │ 4.886594601183347e-13   │
│ Misc/Lambda_star              │ 1.8675265312194824      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2678  Steps count? : 0, Cost: 1698.0
For episode num 2679  Steps count? : 100, Cost: 1698.0
For episode num 2680  Steps count? : 100, Cost: 1698.0
For episode num 2681  Steps count? : 100, Cost: 1698.0
For episode num 2682  Steps count? : 100, Cost: 1698.0
For episode num 2683  Steps count? : 100, Cost: 1698.0
For episode num 2684  Steps count? : 100, Cost: 1698.0
For episode num 2685  Steps count? : 100, Cost: 1698.0
For episode num 2686  Steps count? : 100, Cost: 1698.0
For episode num 2687  Steps count? : 100, Cost: 1698.0
For episode num 2688  Steps count? : 100, Cost: 1698.0
For episode num 2689  Steps count? : 100, Cost: 1698.0
For episode num 2690  Steps count? : 100, Cost: 1698.0
For episode num 2691  Steps count? : 100, Cost: 1698.0
For episode num 2692  Steps count? : 100, Cost: 1698.0
For episode num 2693  Steps count? : 100, Cost: 1698.0
For episode num 2694  Steps count? : 100, Cost: 1698.0
For episode num 2695  Steps count? : 100, Cost: 1698.0
For episode num 2696  Steps count? : 100, Cost: 1698.0
For episode num 2697  Steps count? : 100, Cost: 1698.0
For episode num 2698  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 82... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.03949322551488876 Actual: 0.03980487212538719
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.1808977872133255     │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 82.0                    │
│ Train/Entropy                 │ 0.4234252870082855      │
│ Train/KL                      │ 0.0002916274534072727   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0024133920669556      │
│ Train/PolicyRatio/Min         │ 1.0024133920669556      │
│ Train/PolicyRatio/Max         │ 1.0024133920669556      │
│ Train/PolicyRatio/Std         │ 0.001706553972326219    │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3695352077484131      │
│ TotalEnvSteps                 │ 166000.0                │
│ Loss/Loss_pi                  │ -0.029853643849492073   │
│ Loss/Loss_pi/Delta            │ -0.0014381632208824158  │
│ Value/Adv                     │ -1.3828277189986693e-08 │
│ Loss/Loss_reward_critic       │ 0.0011910494649782777   │
│ Loss/Loss_reward_critic/Delta │ -0.0001116954954341054  │
│ Value/reward                  │ -0.19497068226337433    │
│ Loss/Loss_cost_critic         │ 1.1964546020593048e-09  │
│ Loss/Loss_cost_critic/Delta   │ -4.5977821550025055e-10 │
│ Value/cost                    │ 0.0006238922360353172   │
│ Time/Total                    │ 232.69825744628906      │
│ Time/Rollout                  │ 2.312267541885376       │
│ Time/Update                   │ 1.3117156028747559      │
│ Time/Epoch                    │ 3.6240017414093018      │
│ Time/FPS                      │ 551.8762817382812       │
│ Misc/Alpha                    │ 0.5070371031761169      │
│ Misc/FinalStepNorm            │ 0.020921997725963593    │
│ Misc/gradient_norm            │ 8.983707427978516       │
│ Misc/xHx                      │ 0.07779479026794434     │
│ Misc/H_inv_g                  │ 0.04126325249671936     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 1.757850441208575e-05   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.07779479026794434     │
│ Misc/r                        │ 1.060410212971874e-07   │
│ Misc/s                        │ 2.084987456123233e-13   │
│ Misc/Lambda_star              │ 1.9722422361373901      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2699  Steps count? : 0, Cost: 1698.0
For episode num 2700  Steps count? : 100, Cost: 1698.0
For episode num 2701  Steps count? : 100, Cost: 1698.0
For episode num 2702  Steps count? : 100, Cost: 1698.0
For episode num 2703  Steps count? : 100, Cost: 1698.0
For episode num 2704  Steps count? : 100, Cost: 1698.0
For episode num 2705  Steps count? : 100, Cost: 1698.0
For episode num 2706  Steps count? : 100, Cost: 1698.0
For episode num 2707  Steps count? : 100, Cost: 1698.0
For episode num 2708  Steps count? : 100, Cost: 1698.0
For episode num 2709  Steps count? : 100, Cost: 1698.0
For episode num 2710  Steps count? : 100, Cost: 1698.0
For episode num 2711  Steps count? : 100, Cost: 1698.0
For episode num 2712  Steps count? : 100, Cost: 1698.0
For episode num 2713  Steps count? : 100, Cost: 1698.0
For episode num 2714  Steps count? : 100, Cost: 1698.0
For episode num 2715  Steps count? : 100, Cost: 1698.0
For episode num 2716  Steps count? : 100, Cost: 1698.0
For episode num 2717  Steps count? : 100, Cost: 1698.0
For episode num 2718  Steps count? : 100, Cost: 1698.0
For episode num 2719  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 83... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.04235093295574188 Actual: 0.04239504411816597
INFO: violated KL constraint 0.010083992034196854 at step 1.
Expected Improvement: 0.04235093295574188 Actual: 0.03390660509467125
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.17749282717704773    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 83.0                    │
│ Train/Entropy                 │ 0.4249957501888275      │
│ Train/KL                      │ 0.00018937498680315912  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0005141496658325      │
│ Train/PolicyRatio/Min         │ 1.0005141496658325      │
│ Train/PolicyRatio/Max         │ 1.0005141496658325      │
│ Train/PolicyRatio/Std         │ 0.00030710676219314337  │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3701145648956299      │
│ TotalEnvSteps                 │ 168000.0                │
│ Loss/Loss_pi                  │ -0.028822939842939377   │
│ Loss/Loss_pi/Delta            │ 0.0010307040065526962   │
│ Value/Adv                     │ -4.261731945121028e-08  │
│ Loss/Loss_reward_critic       │ 0.0010912282159551978   │
│ Loss/Loss_reward_critic/Delta │ -9.982124902307987e-05  │
│ Value/reward                  │ -0.19187042117118835    │
│ Loss/Loss_cost_critic         │ 8.820644215035145e-10   │
│ Loss/Loss_cost_critic/Delta   │ -3.1439018055579027e-10 │
│ Value/cost                    │ 0.0005351672298274934   │
│ Time/Total                    │ 235.363037109375        │
│ Time/Rollout                  │ 1.6094813346862793      │
│ Time/Update                   │ 1.0322678089141846      │
│ Time/Epoch                    │ 2.64176344871521        │
│ Time/FPS                      │ 757.0702514648438       │
│ Misc/Alpha                    │ 0.47243809700012207     │
│ Misc/FinalStepNorm            │ 0.02041725069284439     │
│ Misc/gradient_norm            │ 9.627720832824707       │
│ Misc/xHx                      │ 0.0896066352725029      │
│ Misc/H_inv_g                  │ 0.0540209636092186      │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 6.782923264836427e-06   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.0896066352725029      │
│ Misc/r                        │ 2.2603940053045335e-08  │
│ Misc/s                        │ 1.6319380607065712e-14  │
│ Misc/Lambda_star              │ 2.1166794300079346      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2720  Steps count? : 0, Cost: 1698.0
For episode num 2721  Steps count? : 100, Cost: 1698.0
For episode num 2722  Steps count? : 100, Cost: 1698.0
For episode num 2723  Steps count? : 100, Cost: 1698.0
For episode num 2724  Steps count? : 100, Cost: 1698.0
For episode num 2725  Steps count? : 100, Cost: 1698.0
For episode num 2726  Steps count? : 100, Cost: 1698.0
For episode num 2727  Steps count? : 100, Cost: 1698.0
For episode num 2728  Steps count? : 100, Cost: 1698.0
For episode num 2729  Steps count? : 100, Cost: 1698.0
For episode num 2730  Steps count? : 100, Cost: 1698.0
For episode num 2731  Steps count? : 100, Cost: 1698.0
For episode num 2732  Steps count? : 100, Cost: 1698.0
For episode num 2733  Steps count? : 100, Cost: 1698.0
For episode num 2734  Steps count? : 100, Cost: 1698.0
For episode num 2735  Steps count? : 100, Cost: 1698.0
For episode num 2736  Steps count? : 100, Cost: 1698.0
For episode num 2737  Steps count? : 100, Cost: 1698.0
For episode num 2738  Steps count? : 100, Cost: 1698.0
For episode num 2739  Steps count? : 100, Cost: 1698.0
For episode num 2740  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 84... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.043390460312366486 Actual: 0.04270068556070328
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.17680177092552185    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 84.0                    │
│ Train/Entropy                 │ 0.4203841984272003      │
│ Train/KL                      │ 0.00029150856425985694  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9977079033851624      │
│ Train/PolicyRatio/Min         │ 0.9977079033851624      │
│ Train/PolicyRatio/Max         │ 0.9977079033851624      │
│ Train/PolicyRatio/Std         │ 0.0016207430744543672   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3684134781360626      │
│ TotalEnvSteps                 │ 170000.0                │
│ Loss/Loss_pi                  │ -0.03202551230788231    │
│ Loss/Loss_pi/Delta            │ -0.003202572464942932   │
│ Value/Adv                     │ -1.3828277189986693e-08 │
│ Loss/Loss_reward_critic       │ 0.0009936229325830936   │
│ Loss/Loss_reward_critic/Delta │ -9.760528337210417e-05  │
│ Value/reward                  │ -0.18848510086536407    │
│ Loss/Loss_cost_critic         │ 6.414249131836414e-10   │
│ Loss/Loss_cost_critic/Delta   │ -2.4063950831987313e-10 │
│ Value/cost                    │ 0.00045822947868146     │
│ Time/Total                    │ 238.00390625            │
│ Time/Rollout                  │ 1.6006569862365723      │
│ Time/Update                   │ 1.0226471424102783      │
│ Time/Epoch                    │ 2.6233181953430176      │
│ Time/FPS                      │ 762.3934936523438       │
│ Misc/Alpha                    │ 0.4608660936355591      │
│ Misc/FinalStepNorm            │ 0.029468804597854614    │
│ Misc/gradient_norm            │ 9.911520004272461       │
│ Misc/xHx                      │ 0.09416303038597107     │
│ Misc/H_inv_g                  │ 0.06394223123788834     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 1.059015266946517e-05   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.09416303038597107     │
│ Misc/r                        │ -5.338013409073028e-08  │
│ Misc/s                        │ 5.766533897524212e-14   │
│ Misc/Lambda_star              │ 2.169827699661255       │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2741  Steps count? : 0, Cost: 1698.0
For episode num 2742  Steps count? : 100, Cost: 1698.0
For episode num 2743  Steps count? : 100, Cost: 1698.0
For episode num 2744  Steps count? : 100, Cost: 1698.0
For episode num 2745  Steps count? : 100, Cost: 1698.0
For episode num 2746  Steps count? : 100, Cost: 1698.0
For episode num 2747  Steps count? : 100, Cost: 1698.0
For episode num 2748  Steps count? : 100, Cost: 1698.0
For episode num 2749  Steps count? : 100, Cost: 1698.0
For episode num 2750  Steps count? : 100, Cost: 1698.0
For episode num 2751  Steps count? : 100, Cost: 1698.0
For episode num 2752  Steps count? : 100, Cost: 1698.0
For episode num 2753  Steps count? : 100, Cost: 1698.0
For episode num 2754  Steps count? : 100, Cost: 1698.0
For episode num 2755  Steps count? : 100, Cost: 1698.0
For episode num 2756  Steps count? : 100, Cost: 1698.0
For episode num 2757  Steps count? : 100, Cost: 1698.0
For episode num 2758  Steps count? : 100, Cost: 1698.0
For episode num 2759  Steps count? : 100, Cost: 1698.0
For episode num 2760  Steps count? : 100, Cost: 1698.0
For episode num 2761  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 85... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.040012191981077194 Actual: 0.03943832963705063
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.17532506585121155   │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 85.0                   │
│ Train/Entropy                 │ 0.4206507205963135     │
│ Train/KL                      │ 0.00027903742738999426 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0001214742660522     │
│ Train/PolicyRatio/Min         │ 1.0001214742660522     │
│ Train/PolicyRatio/Max         │ 1.0001214742660522     │
│ Train/PolicyRatio/Std         │ 8.59233841765672e-05   │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.3685103952884674     │
│ TotalEnvSteps                 │ 172000.0               │
│ Loss/Loss_pi                  │ -0.029578737914562225  │
│ Loss/Loss_pi/Delta            │ 0.0024467743933200836  │
│ Value/Adv                     │ 3.767013723177115e-08  │
│ Loss/Loss_reward_critic       │ 0.0009032198577187955  │
│ Loss/Loss_reward_critic/Delta │ -9.04030748642981e-05  │
│ Value/reward                  │ -0.18679440021514893   │
│ Loss/Loss_cost_critic         │ 4.807744757862054e-10  │
│ Loss/Loss_cost_critic/Delta   │ -1.60650437397436e-10  │
│ Value/cost                    │ 0.0003946778306271881  │
│ Time/Total                    │ 240.6420135498047      │
│ Time/Rollout                  │ 1.6066412925720215     │
│ Time/Update                   │ 1.0135464668273926     │
│ Time/Epoch                    │ 2.620201826095581      │
│ Time/FPS                      │ 763.3002319335938      │
│ Misc/Alpha                    │ 0.5003771185874939     │
│ Misc/FinalStepNorm            │ 0.039483290165662766   │
│ Misc/gradient_norm            │ 9.124670028686523      │
│ Misc/xHx                      │ 0.07987945526838303    │
│ Misc/H_inv_g                  │ 0.0789070725440979     │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 3.140216722385958e-05  │
│ Misc/A                        │ 0.0                    │
│ Misc/B                        │ 0.0                    │
│ Misc/q                        │ 0.07987945526838303    │
│ Misc/r                        │ 2.2524861265083018e-07 │
│ Misc/s                        │ 7.76543868270807e-13   │
│ Misc/Lambda_star              │ 1.99849271774292       │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 4.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 2762  Steps count? : 0, Cost: 1698.0
For episode num 2763  Steps count? : 100, Cost: 1698.0
For episode num 2764  Steps count? : 100, Cost: 1698.0
For episode num 2765  Steps count? : 100, Cost: 1698.0
For episode num 2766  Steps count? : 100, Cost: 1698.0
For episode num 2767  Steps count? : 100, Cost: 1698.0
For episode num 2768  Steps count? : 100, Cost: 1698.0
For episode num 2769  Steps count? : 100, Cost: 1698.0
For episode num 2770  Steps count? : 100, Cost: 1698.0
For episode num 2771  Steps count? : 100, Cost: 1698.0
For episode num 2772  Steps count? : 100, Cost: 1698.0
For episode num 2773  Steps count? : 100, Cost: 1698.0
For episode num 2774  Steps count? : 100, Cost: 1698.0
For episode num 2775  Steps count? : 100, Cost: 1698.0
For episode num 2776  Steps count? : 100, Cost: 1698.0
For episode num 2777  Steps count? : 100, Cost: 1698.0
For episode num 2778  Steps count? : 100, Cost: 1698.0
For episode num 2779  Steps count? : 100, Cost: 1698.0
For episode num 2780  Steps count? : 100, Cost: 1698.0
For episode num 2781  Steps count? : 100, Cost: 1698.0
For episode num 2782  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 86... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.04215142875909805 Actual: 0.04138088598847389
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.17278622090816498    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 86.0                    │
│ Train/Entropy                 │ 0.41767704486846924     │
│ Train/KL                      │ 0.0002908782917074859   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9995797276496887      │
│ Train/PolicyRatio/Min         │ 0.9995797276496887      │
│ Train/PolicyRatio/Max         │ 0.9995797276496887      │
│ Train/PolicyRatio/Std         │ 0.00029713529511354864  │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3674173355102539      │
│ TotalEnvSteps                 │ 174000.0                │
│ Loss/Loss_pi                  │ -0.0310356467962265     │
│ Loss/Loss_pi/Delta            │ -0.0014569088816642761  │
│ Value/Adv                     │ -2.5272369086337676e-08 │
│ Loss/Loss_reward_critic       │ 0.0008243051706813276   │
│ Loss/Loss_reward_critic/Delta │ -7.891468703746796e-05  │
│ Value/reward                  │ -0.18373747169971466    │
│ Loss/Loss_cost_critic         │ 3.499933676209821e-10   │
│ Loss/Loss_cost_critic/Delta   │ -1.3078110816522326e-10 │
│ Value/cost                    │ 0.0003366796299815178   │
│ Time/Total                    │ 243.26177978515625      │
│ Time/Rollout                  │ 1.5997605323791504      │
│ Time/Update                   │ 1.002303123474121       │
│ Time/Epoch                    │ 2.602079391479492       │
│ Time/FPS                      │ 768.6162719726562       │
│ Misc/Alpha                    │ 0.47495847940444946     │
│ Misc/FinalStepNorm            │ 0.034915804862976074    │
│ Misc/gradient_norm            │ 9.444169044494629       │
│ Misc/xHx                      │ 0.0886581540107727      │
│ Misc/H_inv_g                  │ 0.07351338863372803     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 1.5021824992800248e-06  │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.0886581540107727      │
│ Misc/r                        │ 4.4110559649368497e-10  │
│ Misc/s                        │ 7.293543290200154e-17   │
│ Misc/Lambda_star              │ 2.105447292327881       │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2783  Steps count? : 0, Cost: 1698.0
For episode num 2784  Steps count? : 100, Cost: 1698.0
For episode num 2785  Steps count? : 100, Cost: 1698.0
For episode num 2786  Steps count? : 100, Cost: 1698.0
For episode num 2787  Steps count? : 100, Cost: 1698.0
For episode num 2788  Steps count? : 100, Cost: 1698.0
For episode num 2789  Steps count? : 100, Cost: 1698.0
For episode num 2790  Steps count? : 100, Cost: 1698.0
For episode num 2791  Steps count? : 100, Cost: 1698.0
For episode num 2792  Steps count? : 100, Cost: 1698.0
For episode num 2793  Steps count? : 100, Cost: 1698.0
For episode num 2794  Steps count? : 100, Cost: 1698.0
For episode num 2795  Steps count? : 100, Cost: 1698.0
For episode num 2796  Steps count? : 100, Cost: 1698.0
For episode num 2797  Steps count? : 100, Cost: 1698.0
For episode num 2798  Steps count? : 100, Cost: 1698.0
For episode num 2799  Steps count? : 100, Cost: 1698.0
For episode num 2800  Steps count? : 100, Cost: 1698.0
For episode num 2801  Steps count? : 100, Cost: 1698.0
For episode num 2802  Steps count? : 100, Cost: 1698.0
For episode num 2803  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 87... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.038774579763412476 Actual: 0.037565164268016815
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.16456876695156097   │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 87.0                   │
│ Train/Entropy                 │ 0.40861964225769043    │
│ Train/KL                      │ 0.00028737628599628806 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0003708600997925     │
│ Train/PolicyRatio/Min         │ 1.0003708600997925     │
│ Train/PolicyRatio/Max         │ 1.0003708600997925     │
│ Train/PolicyRatio/Std         │ 0.00026220959261991084 │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.36410728096961975    │
│ TotalEnvSteps                 │ 176000.0               │
│ Loss/Loss_pi                  │ -0.028173930943012238  │
│ Loss/Loss_pi/Delta            │ 0.002861715853214264   │
│ Value/Adv                     │ 5.197524899358541e-08  │
│ Loss/Loss_reward_critic       │ 0.0007325956830754876  │
│ Loss/Loss_reward_critic/Delta │ -9.170948760583997e-05 │
│ Value/reward                  │ -0.179974764585495     │
│ Loss/Loss_cost_critic         │ 2.5433591344103945e-10 │
│ Loss/Loss_cost_critic/Delta   │ -9.565745417994265e-11 │
│ Value/cost                    │ 0.0002890937321353704  │
│ Time/Total                    │ 245.86936950683594     │
│ Time/Rollout                  │ 1.6003286838531494     │
│ Time/Update                   │ 0.9895925521850586     │
│ Time/Epoch                    │ 2.5899388790130615     │
│ Time/FPS                      │ 772.2192993164062      │
│ Misc/Alpha                    │ 0.5160397887229919     │
│ Misc/FinalStepNorm            │ 0.06529867649078369    │
│ Misc/gradient_norm            │ 8.767090797424316      │
│ Misc/xHx                      │ 0.07510408014059067    │
│ Misc/H_inv_g                  │ 0.12653803825378418    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 1.3347194908419624e-05 │
│ Misc/A                        │ 0.0                    │
│ Misc/B                        │ 0.0                    │
│ Misc/q                        │ 0.07510408014059067    │
│ Misc/r                        │ -6.781142047884714e-08 │
│ Misc/s                        │ 1.0330640151916953e-13 │
│ Misc/Lambda_star              │ 1.9378350973129272     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 4.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 2804  Steps count? : 0, Cost: 1698.0
For episode num 2805  Steps count? : 100, Cost: 1698.0
For episode num 2806  Steps count? : 100, Cost: 1698.0
For episode num 2807  Steps count? : 100, Cost: 1698.0
For episode num 2808  Steps count? : 100, Cost: 1698.0
For episode num 2809  Steps count? : 100, Cost: 1698.0
For episode num 2810  Steps count? : 100, Cost: 1698.0
For episode num 2811  Steps count? : 100, Cost: 1698.0
For episode num 2812  Steps count? : 100, Cost: 1698.0
For episode num 2813  Steps count? : 100, Cost: 1698.0
For episode num 2814  Steps count? : 100, Cost: 1698.0
For episode num 2815  Steps count? : 100, Cost: 1698.0
For episode num 2816  Steps count? : 100, Cost: 1698.0
For episode num 2817  Steps count? : 100, Cost: 1698.0
For episode num 2818  Steps count? : 100, Cost: 1698.0
For episode num 2819  Steps count? : 100, Cost: 1698.0
For episode num 2820  Steps count? : 100, Cost: 1698.0
For episode num 2821  Steps count? : 100, Cost: 1698.0
For episode num 2822  Steps count? : 100, Cost: 1698.0
For episode num 2823  Steps count? : 100, Cost: 1698.0
For episode num 2824  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 88... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.04068293794989586 Actual: 0.039656344801187515
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.15740607678890228   │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 88.0                   │
│ Train/Entropy                 │ 0.3992570638656616     │
│ Train/KL                      │ 0.0002870057651307434  │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0008935928344727     │
│ Train/PolicyRatio/Min         │ 1.0008935928344727     │
│ Train/PolicyRatio/Max         │ 1.0008935928344727     │
│ Train/PolicyRatio/Std         │ 0.0006318093510344625  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.3607129752635956     │
│ TotalEnvSteps                 │ 178000.0               │
│ Loss/Loss_pi                  │ -0.029742198064923286  │
│ Loss/Loss_pi/Delta            │ -0.0015682671219110489 │
│ Value/Adv                     │ -7.450580596923828e-08 │
│ Loss/Loss_reward_critic       │ 0.0006578181637451053  │
│ Loss/Loss_reward_critic/Delta │ -7.477751933038235e-05 │
│ Value/reward                  │ -0.17538656294345856   │
│ Loss/Loss_cost_critic         │ 1.8755892672306373e-10 │
│ Loss/Loss_cost_critic/Delta   │ -6.677698671797572e-11 │
│ Value/cost                    │ 0.0002480808470863849  │
│ Time/Total                    │ 248.45559692382812     │
│ Time/Rollout                  │ 1.5949935913085938     │
│ Time/Update                   │ 0.9734690189361572     │
│ Time/Epoch                    │ 2.5684854984283447     │
│ Time/FPS                      │ 778.6715087890625      │
│ Misc/Alpha                    │ 0.49175015091896057    │
│ Misc/FinalStepNorm            │ 0.026892777532339096   │
│ Misc/gradient_norm            │ 9.352519989013672      │
│ Misc/xHx                      │ 0.0827067494392395     │
│ Misc/H_inv_g                  │ 0.05468789115548134    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 4.060923856741283e-06  │
│ Misc/A                        │ 0.0                    │
│ Misc/B                        │ 0.0                    │
│ Misc/q                        │ 0.0827067494392395     │
│ Misc/r                        │ 6.81161882454262e-09   │
│ Misc/s                        │ 2.961712109963337e-15  │
│ Misc/Lambda_star              │ 2.033553123474121      │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 4.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 2825  Steps count? : 0, Cost: 1698.0
For episode num 2826  Steps count? : 100, Cost: 1698.0
For episode num 2827  Steps count? : 100, Cost: 1698.0
For episode num 2828  Steps count? : 100, Cost: 1698.0
For episode num 2829  Steps count? : 100, Cost: 1698.0
For episode num 2830  Steps count? : 100, Cost: 1698.0
For episode num 2831  Steps count? : 100, Cost: 1698.0
For episode num 2832  Steps count? : 100, Cost: 1698.0
For episode num 2833  Steps count? : 100, Cost: 1698.0
For episode num 2834  Steps count? : 100, Cost: 1698.0
For episode num 2835  Steps count? : 100, Cost: 1698.0
For episode num 2836  Steps count? : 100, Cost: 1698.0
For episode num 2837  Steps count? : 100, Cost: 1698.0
For episode num 2838  Steps count? : 100, Cost: 1698.0
For episode num 2839  Steps count? : 100, Cost: 1698.0
For episode num 2840  Steps count? : 100, Cost: 1698.0
For episode num 2841  Steps count? : 100, Cost: 1698.0
For episode num 2842  Steps count? : 100, Cost: 1698.0
For episode num 2843  Steps count? : 100, Cost: 1698.0
For episode num 2844  Steps count? : 100, Cost: 1698.0
For episode num 2845  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 89... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.04430904611945152 Actual: 0.04157372564077377
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.15278133749961853   │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 89.0                   │
│ Train/Entropy                 │ 0.3919931948184967     │
│ Train/KL                      │ 0.00026251072995364666 │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.996431827545166      │
│ Train/PolicyRatio/Min         │ 0.996431827545166      │
│ Train/PolicyRatio/Max         │ 0.996431827545166      │
│ Train/PolicyRatio/Std         │ 0.002523107221350074   │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.35810086131095886    │
│ TotalEnvSteps                 │ 180000.0               │
│ Loss/Loss_pi                  │ -0.03118034452199936   │
│ Loss/Loss_pi/Delta            │ -0.0014381464570760727 │
│ Value/Adv                     │ 4.327297276063291e-08  │
│ Loss/Loss_reward_critic       │ 0.0005876373616047204  │
│ Loss/Loss_reward_critic/Delta │ -7.018080214038491e-05 │
│ Value/reward                  │ -0.17047229409217834   │
│ Loss/Loss_cost_critic         │ 1.4184382834958598e-10 │
│ Loss/Loss_cost_critic/Delta   │ -4.571509837347776e-11 │
│ Value/cost                    │ 0.00021284697868395597 │
│ Time/Total                    │ 251.03321838378906     │
│ Time/Rollout                  │ 1.599078893661499      │
│ Time/Update                   │ 0.9593148231506348     │
│ Time/Epoch                    │ 2.5584099292755127     │
│ Time/FPS                      │ 781.7357788085938      │
│ Misc/Alpha                    │ 0.4521273076534271     │
│ Misc/FinalStepNorm            │ 0.06338980793952942    │
│ Misc/gradient_norm            │ 9.948541641235352      │
│ Misc/xHx                      │ 0.09783820062875748    │
│ Misc/H_inv_g                  │ 0.14020344614982605    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 2.9600732887047343e-06 │
│ Misc/A                        │ 0.0                    │
│ Misc/B                        │ 0.0                    │
│ Misc/q                        │ 0.09783820062875748    │
│ Misc/r                        │ -3.089083655538616e-09 │
│ Misc/s                        │ 9.740596196178308e-16  │
│ Misc/Lambda_star              │ 2.211766481399536      │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 4.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 2846  Steps count? : 0, Cost: 1698.0
For episode num 2847  Steps count? : 100, Cost: 1698.0
For episode num 2848  Steps count? : 100, Cost: 1698.0
For episode num 2849  Steps count? : 100, Cost: 1698.0
For episode num 2850  Steps count? : 100, Cost: 1698.0
For episode num 2851  Steps count? : 100, Cost: 1698.0
For episode num 2852  Steps count? : 100, Cost: 1698.0
For episode num 2853  Steps count? : 100, Cost: 1698.0
For episode num 2854  Steps count? : 100, Cost: 1698.0
For episode num 2855  Steps count? : 100, Cost: 1698.0
For episode num 2856  Steps count? : 100, Cost: 1698.0
For episode num 2857  Steps count? : 100, Cost: 1698.0
For episode num 2858  Steps count? : 100, Cost: 1698.0
For episode num 2859  Steps count? : 100, Cost: 1698.0
For episode num 2860  Steps count? : 100, Cost: 1698.0
For episode num 2861  Steps count? : 100, Cost: 1698.0
For episode num 2862  Steps count? : 100, Cost: 1698.0
For episode num 2863  Steps count? : 100, Cost: 1698.0
For episode num 2864  Steps count? : 100, Cost: 1698.0
For episode num 2865  Steps count? : 100, Cost: 1698.0
For episode num 2866  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 90... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.037077367305755615 Actual: 0.03666312247514725
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.1466027796268463    │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 90.0                   │
│ Train/Entropy                 │ 0.38663995265960693    │
│ Train/KL                      │ 0.0002799658686853945  │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 1.0002797842025757     │
│ Train/PolicyRatio/Min         │ 1.0002797842025757     │
│ Train/PolicyRatio/Max         │ 1.0002797842025757     │
│ Train/PolicyRatio/Std         │ 0.00019775304826907814 │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.35618820786476135    │
│ TotalEnvSteps                 │ 182000.0               │
│ Loss/Loss_pi                  │ -0.027497410774230957  │
│ Loss/Loss_pi/Delta            │ 0.003682933747768402   │
│ Value/Adv                     │ 6.055832102447312e-08  │
│ Loss/Loss_reward_critic       │ 0.0005168372881598771  │
│ Loss/Loss_reward_critic/Delta │ -7.080007344484329e-05 │
│ Value/reward                  │ -0.16664178669452667   │
│ Loss/Loss_cost_critic         │ 1.0271517664195429e-10 │
│ Loss/Loss_cost_critic/Delta   │ -3.912865170763169e-11 │
│ Value/cost                    │ 0.00018318141519557685 │
│ Time/Total                    │ 253.6377410888672      │
│ Time/Rollout                  │ 1.5888431072235107     │
│ Time/Update                   │ 0.9977197647094727     │
│ Time/Epoch                    │ 2.5865888595581055     │
│ Time/FPS                      │ 773.2194213867188      │
│ Misc/Alpha                    │ 0.5395746231079102     │
│ Misc/FinalStepNorm            │ 0.0933511033654213     │
│ Misc/gradient_norm            │ 8.507683753967285      │
│ Misc/xHx                      │ 0.06869527697563171    │
│ Misc/H_inv_g                  │ 0.17300869524478912    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 4.1098707015407854e-07 │
│ Misc/A                        │ 0.0                    │
│ Misc/B                        │ 0.0                    │
│ Misc/q                        │ 0.06869527697563171    │
│ Misc/r                        │ 1.668182464994311e-13  │
│ Misc/s                        │ 4.2795509894471557e-19 │
│ Misc/Lambda_star              │ 1.8533117771148682     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 4.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 2867  Steps count? : 0, Cost: 1698.0
For episode num 2868  Steps count? : 100, Cost: 1698.0
For episode num 2869  Steps count? : 100, Cost: 1698.0
For episode num 2870  Steps count? : 100, Cost: 1698.0
For episode num 2871  Steps count? : 100, Cost: 1698.0
For episode num 2872  Steps count? : 100, Cost: 1698.0
For episode num 2873  Steps count? : 100, Cost: 1698.0
For episode num 2874  Steps count? : 100, Cost: 1698.0
For episode num 2875  Steps count? : 100, Cost: 1698.0
For episode num 2876  Steps count? : 100, Cost: 1698.0
For episode num 2877  Steps count? : 100, Cost: 1698.0
For episode num 2878  Steps count? : 100, Cost: 1698.0
For episode num 2879  Steps count? : 100, Cost: 1698.0
For episode num 2880  Steps count? : 100, Cost: 1698.0
For episode num 2881  Steps count? : 100, Cost: 1698.0
For episode num 2882  Steps count? : 100, Cost: 1698.0
For episode num 2883  Steps count? : 100, Cost: 1698.0
For episode num 2884  Steps count? : 100, Cost: 1698.0
For episode num 2885  Steps count? : 100, Cost: 1698.0
For episode num 2886  Steps count? : 100, Cost: 1698.0
For episode num 2887  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 91... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.039809659123420715 Actual: 0.03911522403359413
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.1372908651828766     │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 91.0                    │
│ Train/Entropy                 │ 0.3897925913333893      │
│ Train/KL                      │ 0.00027257041074335575  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0025051832199097      │
│ Train/PolicyRatio/Min         │ 1.0025051832199097      │
│ Train/PolicyRatio/Max         │ 1.0025051832199097      │
│ Train/PolicyRatio/Std         │ 0.001771403942257166    │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.35731399059295654     │
│ TotalEnvSteps                 │ 184000.0                │
│ Loss/Loss_pi                  │ -0.029336392879486084   │
│ Loss/Loss_pi/Delta            │ -0.001838982105255127   │
│ Value/Adv                     │ -1.0013580187262505e-08 │
│ Loss/Loss_reward_critic       │ 0.00046121914056129754  │
│ Loss/Loss_reward_critic/Delta │ -5.5618147598579526e-05 │
│ Value/reward                  │ -0.16187021136283875    │
│ Loss/Loss_cost_critic         │ 7.50290801709852e-11    │
│ Loss/Loss_cost_critic/Delta   │ -2.768609647096909e-11  │
│ Value/cost                    │ 0.00015770524623803794  │
│ Time/Total                    │ 257.2065734863281       │
│ Time/Rollout                  │ 2.2928051948547363      │
│ Time/Update                   │ 1.2525722980499268      │
│ Time/Epoch                    │ 3.545396327972412       │
│ Time/FPS                      │ 564.1119995117188       │
│ Misc/Alpha                    │ 0.5022489428520203      │
│ Misc/FinalStepNorm            │ 0.0372382290661335      │
│ Misc/gradient_norm            │ 9.452851295471191       │
│ Misc/xHx                      │ 0.0792851597070694      │
│ Misc/H_inv_g                  │ 0.07414296269416809     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 6.908549494255567e-06   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.0792851597070694      │
│ Misc/r                        │ 2.191387693528668e-08   │
│ Misc/s                        │ 1.6017521781586802e-14  │
│ Misc/Lambda_star              │ 1.991044521331787       │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2888  Steps count? : 0, Cost: 1698.0
For episode num 2889  Steps count? : 100, Cost: 1698.0
For episode num 2890  Steps count? : 100, Cost: 1698.0
For episode num 2891  Steps count? : 100, Cost: 1698.0
For episode num 2892  Steps count? : 100, Cost: 1698.0
For episode num 2893  Steps count? : 100, Cost: 1698.0
For episode num 2894  Steps count? : 100, Cost: 1698.0
For episode num 2895  Steps count? : 100, Cost: 1698.0
For episode num 2896  Steps count? : 100, Cost: 1698.0
For episode num 2897  Steps count? : 100, Cost: 1698.0
For episode num 2898  Steps count? : 100, Cost: 1698.0
For episode num 2899  Steps count? : 100, Cost: 1698.0
For episode num 2900  Steps count? : 100, Cost: 1698.0
For episode num 2901  Steps count? : 100, Cost: 1698.0
For episode num 2902  Steps count? : 100, Cost: 1698.0
For episode num 2903  Steps count? : 100, Cost: 1698.0
For episode num 2904  Steps count? : 100, Cost: 1698.0
For episode num 2905  Steps count? : 100, Cost: 1698.0
For episode num 2906  Steps count? : 100, Cost: 1698.0
For episode num 2907  Steps count? : 100, Cost: 1698.0
For episode num 2908  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 92... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Expected Improvement: 0.047238245606422424 Actual: 0.04208679497241974
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.13593445718288422    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 92.0                    │
│ Train/Entropy                 │ 0.3902643620967865      │
│ Train/KL                      │ 0.00023306775256060064  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.996729850769043       │
│ Train/PolicyRatio/Min         │ 0.996729850769043       │
│ Train/PolicyRatio/Max         │ 0.996729850769043       │
│ Train/PolicyRatio/Std         │ 0.002312316559255123    │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.35748091340065        │
│ TotalEnvSteps                 │ 186000.0                │
│ Loss/Loss_pi                  │ -0.03156512603163719    │
│ Loss/Loss_pi/Delta            │ -0.002228733152151108   │
│ Value/Adv                     │ -3.0994415922691587e-09 │
│ Loss/Loss_reward_critic       │ 0.0004011077689938247   │
│ Loss/Loss_reward_critic/Delta │ -6.0111371567472816e-05 │
│ Value/reward                  │ -0.15641677379608154    │
│ Loss/Loss_cost_critic         │ 5.570838929047639e-11   │
│ Loss/Loss_cost_critic/Delta   │ -1.9320690880508806e-11 │
│ Value/cost                    │ 0.00013581360690295696  │
│ Time/Total                    │ 260.4153747558594       │
│ Time/Rollout                  │ 2.2472498416900635      │
│ Time/Update                   │ 0.9393410682678223      │
│ Time/Epoch                    │ 3.1866073608398438      │
│ Time/FPS                      │ 627.6268920898438       │
│ Misc/Alpha                    │ 0.4234920144081116      │
│ Misc/FinalStepNorm            │ 0.07100996375083923     │
│ Misc/gradient_norm            │ 10.729105949401855      │
│ Misc/xHx                      │ 0.11151659488677979     │
│ Misc/H_inv_g                  │ 0.16767723858356476     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 4.636537482838321e-07   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.11151659488677979     │
│ Misc/r                        │ -1.5493582111725246e-11 │
│ Misc/s                        │ 6.897404178742794e-19   │
│ Misc/Lambda_star              │ 2.3613195419311523      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2909  Steps count? : 0, Cost: 1698.0
For episode num 2910  Steps count? : 100, Cost: 1698.0
For episode num 2911  Steps count? : 100, Cost: 1698.0
For episode num 2912  Steps count? : 100, Cost: 1698.0
For episode num 2913  Steps count? : 100, Cost: 1698.0
For episode num 2914  Steps count? : 100, Cost: 1698.0
For episode num 2915  Steps count? : 100, Cost: 1698.0
For episode num 2916  Steps count? : 100, Cost: 1698.0
For episode num 2917  Steps count? : 100, Cost: 1698.0
For episode num 2918  Steps count? : 100, Cost: 1698.0
For episode num 2919  Steps count? : 100, Cost: 1698.0
For episode num 2920  Steps count? : 100, Cost: 1698.0
For episode num 2921  Steps count? : 100, Cost: 1698.0
For episode num 2922  Steps count? : 100, Cost: 1698.0
For episode num 2923  Steps count? : 100, Cost: 1698.0
For episode num 2924  Steps count? : 100, Cost: 1698.0
For episode num 2925  Steps count? : 100, Cost: 1698.0
For episode num 2926  Steps count? : 100, Cost: 1698.0
For episode num 2927  Steps count? : 100, Cost: 1698.0
For episode num 2928  Steps count? : 100, Cost: 1698.0
For episode num 2929  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 93... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.0363970547914505 Actual: 0.03578400984406471
INFO: violated KL constraint 0.010360313579440117 at step 1.
Expected Improvement: 0.0363970547914505 Actual: 0.02871657907962799
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.134613499045372      │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 93.0                    │
│ Train/Entropy                 │ 0.3775911331176758      │
│ Train/KL                      │ 0.00019331711519043893  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9993855953216553      │
│ Train/PolicyRatio/Min         │ 0.9993855953216553      │
│ Train/PolicyRatio/Max         │ 0.9993855953216553      │
│ Train/PolicyRatio/Std         │ 0.0003699071239680052   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.35298725962638855     │
│ TotalEnvSteps                 │ 188000.0                │
│ Loss/Loss_pi                  │ -0.024386748671531677   │
│ Loss/Loss_pi/Delta            │ 0.0071783773601055145   │
│ Value/Adv                     │ 1.835823049134433e-08   │
│ Loss/Loss_reward_critic       │ 0.000347561901435256    │
│ Loss/Loss_reward_critic/Delta │ -5.3545867558568716e-05 │
│ Value/reward                  │ -0.15196488797664642    │
│ Loss/Loss_cost_critic         │ 4.6463655839490414e-11  │
│ Loss/Loss_cost_critic/Delta   │ -9.244733450985976e-12  │
│ Value/cost                    │ 0.00011712498962879181  │
│ Time/Total                    │ 262.9463195800781       │
│ Time/Rollout                  │ 1.5688965320587158      │
│ Time/Update                   │ 0.9441919326782227      │
│ Time/Epoch                    │ 2.5131025314331055      │
│ Time/FPS                      │ 795.8294067382812       │
│ Misc/Alpha                    │ 0.550990879535675       │
│ Misc/FinalStepNorm            │ 0.0328843779861927      │
│ Misc/gradient_norm            │ 8.578011512756348       │
│ Misc/xHx                      │ 0.06587810814380646     │
│ Misc/H_inv_g                  │ 0.07460280507802963     │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 4.762168373417808e-06   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.06587810814380646     │
│ Misc/r                        │ 8.83606254831193e-09    │
│ Misc/s                        │ 4.916842029145186e-15   │
│ Misc/Lambda_star              │ 1.8149120807647705      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2930  Steps count? : 0, Cost: 1698.0
For episode num 2931  Steps count? : 100, Cost: 1698.0
For episode num 2932  Steps count? : 100, Cost: 1698.0
For episode num 2933  Steps count? : 100, Cost: 1698.0
For episode num 2934  Steps count? : 100, Cost: 1698.0
For episode num 2935  Steps count? : 100, Cost: 1698.0
For episode num 2936  Steps count? : 100, Cost: 1698.0
For episode num 2937  Steps count? : 100, Cost: 1698.0
For episode num 2938  Steps count? : 100, Cost: 1698.0
For episode num 2939  Steps count? : 100, Cost: 1698.0
For episode num 2940  Steps count? : 100, Cost: 1698.0
For episode num 2941  Steps count? : 100, Cost: 1698.0
For episode num 2942  Steps count? : 100, Cost: 1698.0
For episode num 2943  Steps count? : 100, Cost: 1698.0
For episode num 2944  Steps count? : 100, Cost: 1698.0
For episode num 2945  Steps count? : 100, Cost: 1698.0
For episode num 2946  Steps count? : 100, Cost: 1698.0
For episode num 2947  Steps count? : 100, Cost: 1698.0
For episode num 2948  Steps count? : 100, Cost: 1698.0
For episode num 2949  Steps count? : 100, Cost: 1698.0
For episode num 2950  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 94... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.04177037626504898 Actual: 0.0411866158246994
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.12889982759952545    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 94.0                    │
│ Train/Entropy                 │ 0.37007907032966614     │
│ Train/KL                      │ 0.00029147398890927434  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9969646334648132      │
│ Train/PolicyRatio/Min         │ 0.9969646334648132      │
│ Train/PolicyRatio/Max         │ 0.9969646334648132      │
│ Train/PolicyRatio/Std         │ 0.0021463704761117697   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3503390848636627      │
│ TotalEnvSteps                 │ 190000.0                │
│ Loss/Loss_pi                  │ -0.030889954417943954   │
│ Loss/Loss_pi/Delta            │ -0.006503205746412277   │
│ Value/Adv                     │ -4.76837147544984e-09   │
│ Loss/Loss_reward_critic       │ 0.0003005571779794991   │
│ Loss/Loss_reward_critic/Delta │ -4.70047234557569e-05   │
│ Value/reward                  │ -0.1470276266336441     │
│ Loss/Loss_cost_critic         │ 3.322225827773195e-11   │
│ Loss/Loss_cost_critic/Delta   │ -1.3241397561758461e-11 │
│ Value/cost                    │ 0.00010078550258185714  │
│ Time/Total                    │ 265.4634094238281       │
│ Time/Rollout                  │ 1.5728833675384521      │
│ Time/Update                   │ 0.925830602645874       │
│ Time/Epoch                    │ 2.49873423576355        │
│ Time/FPS                      │ 800.4054565429688       │
│ Misc/Alpha                    │ 0.4797220826148987      │
│ Misc/FinalStepNorm            │ 0.04570009186863899     │
│ Misc/gradient_norm            │ 10.247857093811035      │
│ Misc/xHx                      │ 0.08690614998340607     │
│ Misc/H_inv_g                  │ 0.09526366740465164     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 1.358287590846885e-06   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.08690614998340607     │
│ Misc/r                        │ -3.644457513551913e-10  │
│ Misc/s                        │ 4.863121214396848e-17   │
│ Misc/Lambda_star              │ 2.084540367126465       │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2951  Steps count? : 0, Cost: 1698.0
For episode num 2952  Steps count? : 100, Cost: 1698.0
For episode num 2953  Steps count? : 100, Cost: 1698.0
For episode num 2954  Steps count? : 100, Cost: 1698.0
For episode num 2955  Steps count? : 100, Cost: 1698.0
For episode num 2956  Steps count? : 100, Cost: 1698.0
For episode num 2957  Steps count? : 100, Cost: 1698.0
For episode num 2958  Steps count? : 100, Cost: 1698.0
For episode num 2959  Steps count? : 100, Cost: 1698.0
For episode num 2960  Steps count? : 100, Cost: 1698.0
For episode num 2961  Steps count? : 100, Cost: 1698.0
For episode num 2962  Steps count? : 100, Cost: 1698.0
For episode num 2963  Steps count? : 100, Cost: 1698.0
For episode num 2964  Steps count? : 100, Cost: 1698.0
For episode num 2965  Steps count? : 100, Cost: 1698.0
For episode num 2966  Steps count? : 100, Cost: 1698.0
For episode num 2967  Steps count? : 100, Cost: 1698.0
For episode num 2968  Steps count? : 100, Cost: 1698.0
For episode num 2969  Steps count? : 100, Cost: 1698.0
For episode num 2970  Steps count? : 100, Cost: 1698.0
For episode num 2971  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 95... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.041597723960876465 Actual: 0.04174615070223808
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.12338706105947495    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 95.0                    │
│ Train/Entropy                 │ 0.3641226291656494      │
│ Train/KL                      │ 0.0002849605807568878   │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0011266469955444      │
│ Train/PolicyRatio/Min         │ 1.0011266469955444      │
│ Train/PolicyRatio/Max         │ 1.0011266469955444      │
│ Train/PolicyRatio/Std         │ 0.000796687847469002    │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3482576310634613      │
│ TotalEnvSteps                 │ 192000.0                │
│ Loss/Loss_pi                  │ -0.0313095785677433     │
│ Loss/Loss_pi/Delta            │ -0.0004196241497993469  │
│ Value/Adv                     │ -3.6358834165639564e-08 │
│ Loss/Loss_reward_critic       │ 0.00025954630109481514  │
│ Loss/Loss_reward_critic/Delta │ -4.101087688468397e-05  │
│ Value/reward                  │ -0.14318904280662537    │
│ Loss/Loss_cost_critic         │ 2.3502078755344158e-11  │
│ Loss/Loss_cost_critic/Delta   │ -9.720179522387795e-12  │
│ Value/cost                    │ 8.642931061331183e-05   │
│ Time/Total                    │ 267.96624755859375      │
│ Time/Rollout                  │ 1.5697779655456543      │
│ Time/Update                   │ 0.9151785373687744      │
│ Time/Epoch                    │ 2.4849722385406494      │
│ Time/FPS                      │ 804.8381958007812       │
│ Misc/Alpha                    │ 0.48102861642837524     │
│ Misc/FinalStepNorm            │ 0.07611427456140518     │
│ Misc/gradient_norm            │ 10.105287551879883      │
│ Misc/xHx                      │ 0.08643469214439392     │
│ Misc/H_inv_g                  │ 0.15823233127593994     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 1.56063526901562e-06    │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.08643469214439392     │
│ Misc/r                        │ 5.383850587570294e-10   │
│ Misc/s                        │ 8.32130990734137e-17    │
│ Misc/Lambda_star              │ 2.078878402709961       │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 2972  Steps count? : 0, Cost: 1698.0
For episode num 2973  Steps count? : 100, Cost: 1698.0
For episode num 2974  Steps count? : 100, Cost: 1698.0
For episode num 2975  Steps count? : 100, Cost: 1698.0
For episode num 2976  Steps count? : 100, Cost: 1698.0
For episode num 2977  Steps count? : 100, Cost: 1698.0
For episode num 2978  Steps count? : 100, Cost: 1698.0
For episode num 2979  Steps count? : 100, Cost: 1698.0
For episode num 2980  Steps count? : 100, Cost: 1698.0
For episode num 2981  Steps count? : 100, Cost: 1698.0
For episode num 2982  Steps count? : 100, Cost: 1698.0
For episode num 2983  Steps count? : 100, Cost: 1698.0
For episode num 2984  Steps count? : 100, Cost: 1698.0
For episode num 2985  Steps count? : 100, Cost: 1698.0
For episode num 2986  Steps count? : 100, Cost: 1698.0
For episode num 2987  Steps count? : 100, Cost: 1698.0
For episode num 2988  Steps count? : 100, Cost: 1698.0
For episode num 2989  Steps count? : 100, Cost: 1698.0
For episode num 2990  Steps count? : 100, Cost: 1698.0
For episode num 2991  Steps count? : 100, Cost: 1698.0
For episode num 2992  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 96... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.04190485551953316 Actual: 0.04183264076709747
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.11769814789295197   │
│ Metrics/EpCost                │ 0.0                    │
│ Metrics/EpLen                 │ 100.0                  │
│ Train/Epoch                   │ 96.0                   │
│ Train/Entropy                 │ 0.36352452635765076    │
│ Train/KL                      │ 0.0002912452910095453  │
│ Train/StopIter                │ 8.0                    │
│ Train/PolicyRatio/Mean        │ 0.9999170303344727     │
│ Train/PolicyRatio/Min         │ 0.9999170303344727     │
│ Train/PolicyRatio/Max         │ 0.9999170303344727     │
│ Train/PolicyRatio/Std         │ 5.869652522960678e-05  │
│ Train/LR                      │ 0.0                    │
│ Train/PolicyStd               │ 0.3480484187602997     │
│ TotalEnvSteps                 │ 194000.0               │
│ Loss/Loss_pi                  │ -0.03137456253170967   │
│ Loss/Loss_pi/Delta            │ -6.498396396636963e-05 │
│ Value/Adv                     │ 8.130073325673948e-08  │
│ Loss/Loss_reward_critic       │ 0.00022053963039070368 │
│ Loss/Loss_reward_critic/Delta │ -3.900667070411146e-05 │
│ Value/reward                  │ -0.13821573555469513   │
│ Loss/Loss_cost_critic         │ 1.6779144046408234e-11 │
│ Loss/Loss_cost_critic/Delta   │ -6.722934708935924e-12 │
│ Value/cost                    │ 7.432424172293395e-05  │
│ Time/Total                    │ 270.4680480957031      │
│ Time/Rollout                  │ 1.5749855041503906     │
│ Time/Update                   │ 0.9095478057861328     │
│ Time/Epoch                    │ 2.484550714492798      │
│ Time/FPS                      │ 804.9749145507812      │
│ Misc/Alpha                    │ 0.4774302542209625     │
│ Misc/FinalStepNorm            │ 0.040106598287820816   │
│ Misc/gradient_norm            │ 10.860573768615723     │
│ Misc/xHx                      │ 0.08774250745773315    │
│ Misc/H_inv_g                  │ 0.08400513976812363    │
│ Misc/AcceptanceStep           │ 1.0                    │
│ Misc/cost_gradient_norm       │ 2.3510371249813034e-07 │
│ Misc/A                        │ 0.0                    │
│ Misc/B                        │ 0.0                    │
│ Misc/q                        │ 0.08774250745773315    │
│ Misc/r                        │ 1.848082667801898e-12  │
│ Misc/s                        │ 4.576892434362764e-20  │
│ Misc/Lambda_star              │ 2.0945467948913574     │
│ Misc/Nu_star                  │ 0.0                    │
│ Misc/OptimCase                │ 4.0                    │
└───────────────────────────────┴────────────────────────┘
For episode num 2993  Steps count? : 0, Cost: 1698.0
For episode num 2994  Steps count? : 100, Cost: 1698.0
For episode num 2995  Steps count? : 100, Cost: 1698.0
For episode num 2996  Steps count? : 100, Cost: 1698.0
For episode num 2997  Steps count? : 100, Cost: 1698.0
For episode num 2998  Steps count? : 100, Cost: 1698.0
For episode num 2999  Steps count? : 100, Cost: 1698.0
For episode num 3000  Steps count? : 100, Cost: 1698.0
For episode num 3001  Steps count? : 100, Cost: 1698.0
For episode num 3002  Steps count? : 100, Cost: 1698.0
For episode num 3003  Steps count? : 100, Cost: 1698.0
For episode num 3004  Steps count? : 100, Cost: 1698.0
For episode num 3005  Steps count? : 100, Cost: 1698.0
For episode num 3006  Steps count? : 100, Cost: 1698.0
For episode num 3007  Steps count? : 100, Cost: 1698.0
For episode num 3008  Steps count? : 100, Cost: 1698.0
For episode num 3009  Steps count? : 100, Cost: 1698.0
For episode num 3010  Steps count? : 100, Cost: 1698.0
For episode num 3011  Steps count? : 100, Cost: 1698.0
For episode num 3012  Steps count? : 100, Cost: 1698.0
For episode num 3013  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 97... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.03754367306828499 Actual: 0.037474166601896286
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.11505778133869171    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 97.0                    │
│ Train/Entropy                 │ 0.3631350100040436      │
│ Train/KL                      │ 0.00027353677432984114  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.9991974830627441      │
│ Train/PolicyRatio/Min         │ 0.9991974830627441      │
│ Train/PolicyRatio/Max         │ 0.9991974830627441      │
│ Train/PolicyRatio/Std         │ 0.0005674370913766325   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.34791287779808044     │
│ TotalEnvSteps                 │ 196000.0                │
│ Loss/Loss_pi                  │ -0.028105642646551132   │
│ Loss/Loss_pi/Delta            │ 0.003268919885158539    │
│ Value/Adv                     │ 2.9087066977240283e-08  │
│ Loss/Loss_reward_critic       │ 0.0001858571486081928   │
│ Loss/Loss_reward_critic/Delta │ -3.468248178251088e-05  │
│ Value/reward                  │ -0.13354209065437317    │
│ Loss/Loss_cost_critic         │ 1.2727067663642622e-11  │
│ Loss/Loss_cost_critic/Delta   │ -4.0520763827656125e-12 │
│ Value/cost                    │ 6.402180588338524e-05   │
│ Time/Total                    │ 272.9353942871094       │
│ Time/Rollout                  │ 1.5582101345062256      │
│ Time/Update                   │ 0.8911545276641846      │
│ Time/Epoch                    │ 2.4493777751922607      │
│ Time/FPS                      │ 816.5341186523438       │
│ Misc/Alpha                    │ 0.5322578549385071      │
│ Misc/FinalStepNorm            │ 0.08183583617210388     │
│ Misc/gradient_norm            │ 9.278188705444336       │
│ Misc/xHx                      │ 0.07059693336486816     │
│ Misc/H_inv_g                  │ 0.15375223755836487     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 3.378349902050104e-06   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.07059693336486816     │
│ Misc/r                        │ 4.010106469110042e-09   │
│ Misc/s                        │ 1.4673846312302993e-15  │
│ Misc/Lambda_star              │ 1.8787885904312134      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 3014  Steps count? : 0, Cost: 1698.0
For episode num 3015  Steps count? : 100, Cost: 1698.0
For episode num 3016  Steps count? : 100, Cost: 1698.0
For episode num 3017  Steps count? : 100, Cost: 1698.0
For episode num 3018  Steps count? : 100, Cost: 1698.0
For episode num 3019  Steps count? : 100, Cost: 1698.0
For episode num 3020  Steps count? : 100, Cost: 1698.0
For episode num 3021  Steps count? : 100, Cost: 1698.0
For episode num 3022  Steps count? : 100, Cost: 1698.0
For episode num 3023  Steps count? : 100, Cost: 1698.0
For episode num 3024  Steps count? : 100, Cost: 1698.0
For episode num 3025  Steps count? : 100, Cost: 1698.0
For episode num 3026  Steps count? : 100, Cost: 1698.0
For episode num 3027  Steps count? : 100, Cost: 1698.0
For episode num 3028  Steps count? : 100, Cost: 1698.0
For episode num 3029  Steps count? : 100, Cost: 1698.0
For episode num 3030  Steps count? : 100, Cost: 1698.0
For episode num 3031  Steps count? : 100, Cost: 1698.0
For episode num 3032  Steps count? : 100, Cost: 1698.0
For episode num 3033  Steps count? : 100, Cost: 1698.0
For episode num 3034  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 98... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.04064250737428665 Actual: 0.03957901522517204
Accept step at i=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.1101965606212616     │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 98.0                    │
│ Train/Entropy                 │ 0.3604905605316162      │
│ Train/KL                      │ 0.00028300087433308363  │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 1.0012989044189453      │
│ Train/PolicyRatio/Min         │ 1.0012989044189453      │
│ Train/PolicyRatio/Max         │ 1.0012989044189453      │
│ Train/PolicyRatio/Std         │ 0.0009184640948660672   │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.3469943106174469      │
│ TotalEnvSteps                 │ 198000.0                │
│ Loss/Loss_pi                  │ -0.029684320092201233   │
│ Loss/Loss_pi/Delta            │ -0.0015786774456501007  │
│ Value/Adv                     │ 5.97238525301691e-08    │
│ Loss/Loss_reward_critic       │ 0.00015456866822205484  │
│ Loss/Loss_reward_critic/Delta │ -3.128848038613796e-05  │
│ Value/reward                  │ -0.12908723950386047    │
│ Loss/Loss_cost_critic         │ 1.532627323663771e-11   │
│ Loss/Loss_cost_critic/Delta   │ 2.599205572995089e-12   │
│ Value/cost                    │ 5.4546977480640635e-05  │
│ Time/Total                    │ 275.4315490722656       │
│ Time/Rollout                  │ 1.5694918632507324      │
│ Time/Update                   │ 0.9088129997253418      │
│ Time/Epoch                    │ 2.4783213138580322      │
│ Time/FPS                      │ 806.9981689453125       │
│ Misc/Alpha                    │ 0.49195191264152527     │
│ Misc/FinalStepNorm            │ 0.048055022954940796    │
│ Misc/gradient_norm            │ 10.700468063354492      │
│ Misc/xHx                      │ 0.08263891935348511     │
│ Misc/H_inv_g                  │ 0.09768235683441162     │
│ Misc/AcceptanceStep           │ 1.0                     │
│ Misc/cost_gradient_norm       │ 1.444690838070528e-07   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.08263891935348511     │
│ Misc/r                        │ -4.3557811437583416e-13 │
│ Misc/s                        │ 6.5306797933985176e-21  │
│ Misc/Lambda_star              │ 2.0327188968658447      │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
For episode num 3035  Steps count? : 0, Cost: 1698.0
For episode num 3036  Steps count? : 100, Cost: 1698.0
For episode num 3037  Steps count? : 100, Cost: 1698.0
For episode num 3038  Steps count? : 100, Cost: 1698.0
For episode num 3039  Steps count? : 100, Cost: 1698.0
For episode num 3040  Steps count? : 100, Cost: 1698.0
For episode num 3041  Steps count? : 100, Cost: 1698.0
For episode num 3042  Steps count? : 100, Cost: 1698.0
For episode num 3043  Steps count? : 100, Cost: 1698.0
For episode num 3044  Steps count? : 100, Cost: 1698.0
For episode num 3045  Steps count? : 100, Cost: 1698.0
For episode num 3046  Steps count? : 100, Cost: 1698.0
For episode num 3047  Steps count? : 100, Cost: 1698.0
For episode num 3048  Steps count? : 100, Cost: 1698.0
For episode num 3049  Steps count? : 100, Cost: 1698.0
For episode num 3050  Steps count? : 100, Cost: 1698.0
For episode num 3051  Steps count? : 100, Cost: 1698.0
For episode num 3052  Steps count? : 100, Cost: 1698.0
For episode num 3053  Steps count? : 100, Cost: 1698.0
For episode num 3054  Steps count? : 100, Cost: 1698.0
For episode num 3055  Steps count? : 100, Cost: 1698.0
Warning: trajectory cut off when rollout by epoch at 100.0 steps.
Processing rollout for epoch: 99... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
Expected Improvement: 0.03942681849002838 Actual: 0.039584994316101074
INFO: violated KL constraint 0.010192926973104477 at step 1.
Expected Improvement: 0.03942681849002838 Actual: 0.031654637306928635
Accept step at i=2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metrics                       ┃ Value                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metrics/EpRet                 │ -0.10633362084627151    │
│ Metrics/EpCost                │ 0.0                     │
│ Metrics/EpLen                 │ 100.0                   │
│ Train/Epoch                   │ 99.0                    │
│ Train/Entropy                 │ 0.35707953572273254     │
│ Train/KL                      │ 0.000190990642295219    │
│ Train/StopIter                │ 8.0                     │
│ Train/PolicyRatio/Mean        │ 0.999496340751648       │
│ Train/PolicyRatio/Min         │ 0.999496340751648       │
│ Train/PolicyRatio/Max         │ 0.999496340751648       │
│ Train/PolicyRatio/Std         │ 0.000296044338028878    │
│ Train/LR                      │ 0.0                     │
│ Train/PolicyStd               │ 0.34581267833709717     │
│ TotalEnvSteps                 │ 200000.0                │
│ Loss/Loss_pi                  │ -0.02690983936190605    │
│ Loss/Loss_pi/Delta            │ 0.0027744807302951813   │
│ Value/Adv                     │ 6.58035261835721e-08    │
│ Loss/Loss_reward_critic       │ 0.00012991549738217145  │
│ Loss/Loss_reward_critic/Delta │ -2.4653170839883387e-05 │
│ Value/reward                  │ -0.12430702149868011    │
│ Loss/Loss_cost_critic         │ 6.622192811472916e-12   │
│ Loss/Loss_cost_critic/Delta   │ -8.704080425164795e-12  │
│ Value/cost                    │ 4.664385051000863e-05   │
│ Time/Total                    │ 277.9254150390625       │
│ Time/Rollout                  │ 1.573190450668335       │
│ Time/Update                   │ 0.9020123481750488      │
│ Time/Epoch                    │ 2.475221633911133       │
│ Time/FPS                      │ 808.0094604492188       │
│ Misc/Alpha                    │ 0.5074406266212463      │
│ Misc/FinalStepNorm            │ 0.025986697524785995    │
│ Misc/gradient_norm            │ 10.305828094482422      │
│ Misc/xHx                      │ 0.0776710957288742      │
│ Misc/H_inv_g                  │ 0.06401413679122925     │
│ Misc/AcceptanceStep           │ 2.0                     │
│ Misc/cost_gradient_norm       │ 2.156172968170722e-06   │
│ Misc/A                        │ 0.0                     │
│ Misc/B                        │ 0.0                     │
│ Misc/q                        │ 0.0776710957288742      │
│ Misc/r                        │ 1.3560538247858744e-09  │
│ Misc/s                        │ 2.839775364458976e-16   │
│ Misc/Lambda_star              │ 1.97067391872406        │
│ Misc/Nu_star                  │ 0.0                     │
│ Misc/OptimCase                │ 4.0                     │
└───────────────────────────────┴─────────────────────────┘
Time for training: 278
