Running Update Time,Running Reverse KL,Iteration,Real Det Violation,Reward Loss,Running Env Steps,Real Det Return,Real Sto Return,Running Forward KL,Real Sto Violation
0,10.2688,0,0.0,642.8675537109375,0,-1241.98,-133.53,17.9319,1.0
1,10.9246,1,0.6,743.3488159179688,5000,-1235.17,-200.98,18.084,1.0
2,11.7107,2,0.65,758.7296752929688,10000,-1479.88,-326.29,18.5834,1.0
3,12.355,3,0.35,813.9912719726562,15000,-1314.93,-290.85,18.7765,1.0
4,11.3125,4,0.0,693.3898315429688,20000,-1709.91,-320.66,18.5491,1.0
5,11.3992,5,0.0,673.626220703125,25000,-1603.69,-346.85,18.093,1.0
6,11.2301,6,0.0,627.88232421875,30000,-1572.39,-351.17,18.0491,1.0
7,11.3129,7,0.0,616.1088256835938,35000,-1749.14,-371.08,18.6005,1.0
8,10.9166,8,0.0,566.357421875,40000,-1780.49,-405.06,18.3068,1.0
9,10.7374,9,0.0,547.0039672851562,45000,-1780.26,-347.59,17.9008,1.0
10,10.3768,10,0.0,524.0745849609375,50000,-1577.07,-343.31,18.0228,1.0
11,10.54,11,0.0,508.58001708984375,55000,-1605.04,-402.31,18.0911,1.0
12,10.166,12,0.0,478.802490234375,60000,-1618.5,-430.0,18.1175,1.0
13,9.7792,13,0.0,430.02764892578125,65000,-1487.7,-361.98,17.5554,0.95
14,9.9431,14,0.0,431.9472351074219,70000,-1643.51,-422.05,17.5987,0.95
15,9.9288,15,0.0,404.22100830078125,75000,-1640.68,-481.88,17.6624,1.0
16,9.7258,16,0.0,391.2008972167969,80000,-1701.08,-342.38,17.9817,1.0
17,9.6396,17,0.0,372.94140625,85000,-1526.01,-393.0,17.4714,1.0
