Time	Iteration	AverageReward	StdRewards	MaxRewardRollout	MinRewardRollout	timesteps	gradnorms	maxnorms
37.02205491065979	0	-0.813261981599876	19.297677396125206	29.944989854469895	-27.727565122768283	0	0.0	0
661.4460258483887	800	354.3128897783049	1.6216094977886086	358.34629157651216	351.8346846321365	1600000	0.9987515605493134	0.8335052193098482
1280.1879963874817	1600	355.89968346187635	1.5275723337061156	359.9679380604357	352.71729245083407	3200000	0.999000999000999	0.8888102600483683
1899.2322421073914	2400	356.1914494325504	2.1752964638769474	360.94121999293566	350.4959287672318	4800000	0.999000999000999	0.8359639392569422
2516.484910964966	3200	354.958165474071	1.263476635777037	358.1988776307553	351.364815940673	6400000	0.999000999000999	0.7771568937855775
3136.1084117889404	4000	357.03847935822006	1.102948913034202	359.18488916329807	354.0845790958265	8000000	0.999000999000999	0.7991409234107467
3755.1918330192566	4800	355.07343971219046	1.6884019394677494	360.40684858575696	350.9808866015519	9600000	0.999000999000999	0.8136182977732985
3920.2514684200287	5000	357.49576764565063	2.2447573823144373	361.2895821455895	352.36920701223426	10000000	0.999000999000999	0.8136182977732985
