Running Reverse KL,Reward Loss,Real Det Return,Iteration,Running Update Time,Real Sto Return,Running Forward KL,Real Sto violation,Real Det violation,Running Env Steps
11.299,787.8919067382812,-635.7,0,0,-162.21,17.4927,1.0,0.25,0
12.0677,792.8230590820312,-1143.42,1,1,-242.72,18.5526,1.0,0.0,5000
12.6099,829.4905395507812,-1508.72,2,2,-320.61,18.3325,1.0,0.6,10000
12.5068,808.309326171875,-1226.9,3,3,-327.36,19.0816,1.0,0.4,15000
11.6218,724.5675048828125,-1659.18,4,4,-330.61,19.4504,1.0,0.0,20000
11.9928,679.9727783203125,-1608.42,5,5,-426.41,18.6831,1.0,0.0,25000
11.0116,599.6758422851562,-1509.62,6,6,-386.44,18.5759,1.0,0.0,30000
10.9763,609.7974243164062,-1600.46,7,7,-425.87,18.1527,1.0,0.0,35000
11.4829,594.4301147460938,-1530.8,8,8,-434.3,18.6343,1.0,0.0,40000
11.0245,572.3143310546875,-1487.08,9,9,-393.13,18.5094,1.0,0.0,45000
