Time	Iteration	AverageReward	StdRewards	MaxRewardRollout	MinRewardRollout	timesteps	gradnorms	maxnorms
36.62730669975281	0	-0.3023757225339068	20.43353154720061	29.695772328414023	-28.46130512468517	0	0.0	0
661.849267244339	800	351.2864018014068	2.6168930358883458	355.7678794263047	341.8767207171768	1600000	0.9987515605493134	0.25
1279.282569885254	1600	356.62141444659255	2.426245335675341	361.35010998195503	350.2217613585526	3200000	0.999000999000999	0.25
1896.9340515136719	2400	355.50863357204366	2.0390412757011225	360.87365321863035	351.6165263326984	4800000	0.999000999000999	0.25
2516.4357421398163	3200	355.2021678001502	1.8397275758832783	359.15745406219503	350.8696887929691	6400000	0.999000999000999	0.25
3135.9989819526672	4000	355.98531783178873	2.2172368051823916	360.44807412070804	350.863049446838	8000000	0.999000999000999	0.25
3753.805795431137	4800	356.28654295050336	2.0966227117257823	360.89181655377615	351.4574186789396	9600000	0.999000999000999	0.25
3918.748336791992	5000	354.68132417738656	1.7258648458193713	358.3477343348786	350.69017260940745	10000000	0.999000999000999	0.25
