Time	Iteration	AverageReward	StdRewards	MaxRewardRollout	MinRewardRollout	timesteps	gradnorms	maxnorms
38.860190629959106	0	-1.4489036436221607	18.17712229330612	27.28727449197322	-27.71342122834176	0	0.0	0
664.2422757148743	800	348.50844704956836	2.3900202057052136	353.2578750131652	342.8825781234773	1600000	7.926294243168741	7.5877101794288695
1287.501422405243	1600	346.4201102662935	1.9934381596228155	351.2079296643751	342.2804384707124	3200000	7.887194252697017	7.568343972157243
1909.9551219940186	2400	347.3894616922093	1.9065231224426036	351.7637481119491	342.95837933661824	4800000	7.93342331314513	8.603386055263705
2532.3725571632385	3200	345.5424166933646	1.7931396324193556	349.9550653671231	341.9950416160864	6400000	7.941090753583663	8.277894540695492
3157.026946544647	4000	351.7227203985508	2.000407666743728	356.2431469219737	346.59280375158414	8000000	7.821151104960922	8.337850083206233
3782.7462751865387	4800	335.01052384694157	2.489402741090142	341.8709985356545	330.20844056643546	9600000	7.905440584493681	8.601732804626712
3949.131552219391	5000	334.249658495641	1.8469393858428427	337.902458403958	329.574685778236	10000000	7.911730551799422	8.601732804626712
