Running Reverse KL,Real Det violation,Running Forward KL,Real Sto Return,Running Update Time,Iteration,Reward Loss,Running Env Steps,Real Det Return,Real Sto violation
10.5752,0.0,17.0988,1768.93,0,0,-126.5109634399414,0,1749.91,0.85
11.3264,0.0,15.8858,1814.44,1,1,-56.62194061279297,5000,1749.8,0.4
11.1252,0.0,15.5296,1833.09,2,2,-66.33019256591797,10000,1753.36,0.7
11.3509,0.0,14.8822,1855.35,3,3,-58.43777847290039,15000,1758.61,0.25
11.7126,0.0,13.9237,1879.2,4,4,-58.57889175415039,20000,1757.28,0.1
11.9797,0.0,13.2953,1889.8,5,5,-70.86473846435547,25000,1756.54,0.1
11.9344,0.0,13.8861,1889.47,6,6,-84.8228759765625,30000,1755.17,0.2
11.7296,0.0,13.2842,1899.1,7,7,-98.52854919433594,35000,1755.33,0.15
11.9891,0.0,13.4427,1890.54,8,8,-108.96160888671875,40000,1754.86,0.05
11.8649,0.0,13.7295,1891.13,9,9,-127.46074676513672,45000,1755.3,0.1
11.9204,0.0,13.844,1897.52,10,10,-140.04469299316406,50000,1756.18,0.15
11.6007,0.0,14.0485,1894.01,11,11,-154.3444061279297,55000,1755.52,0.15
11.6866,0.0,13.2046,1915.55,12,12,-166.03289794921875,60000,1755.68,0.2
11.5896,0.0,13.1827,1918.04,13,13,-176.60862731933594,65000,1756.72,0.1
11.4838,0.0,13.665,1920.72,14,14,-189.758544921875,70000,1756.58,0.15
