Reward Loss,Real Sto Return,Running Forward KL,Running Update Time,Running Env Steps,Real Det Return,Real Sto violation,Iteration,Real Det violation,Running Reverse KL
728.2107543945312,-232.67,18.2657,0,0,-1492.87,1.0,0,0.1,11.0556
719.4381713867188,-246.8,18.5863,1,5000,-1176.01,1.0,1,0.0,11.0489
774.94140625,-336.94,18.8803,2,10000,-1483.76,1.0,2,0.0,12.0992
741.4623413085938,-342.2,18.301,3,15000,-1561.57,1.0,3,0.0,12.0544
708.1381225585938,-302.96,18.5306,4,20000,-1498.89,1.0,4,0.0,11.8089
651.229248046875,-386.46,18.6185,5,25000,-1270.93,1.0,5,0.0,11.4914
622.4747924804688,-310.15,18.1824,6,30000,-1451.0,1.0,6,0.0,11.1574
578.906982421875,-357.09,18.31,7,35000,-1386.08,1.0,7,0.0,10.8746
586.2461547851562,-388.0,18.3102,8,40000,-1449.95,1.0,8,0.0,11.1429
548.7097778320312,-358.06,18.3471,9,45000,-1691.36,1.0,9,0.0,11.0247
502.8822937011719,-438.33,17.8756,10,50000,-1652.53,1.0,10,0.0,10.3243
