Time	Iteration	AverageReward	StdRewards	MaxRewardRollout	MinRewardRollout	timesteps	gradnorms	maxnorms
39.12086296081543	0	-1.4489036436221607	18.17712229330612	27.28727449197322	-27.71342122834176	0	0.0	0
665.0970814228058	800	355.3588017455404	2.418369513362034	360.3878923835	348.75602148182224	1600000	1.8639048917462033	0.5
1288.3650164604187	1600	355.6525609611333	1.4436951600878327	359.57768296118593	352.1148199264426	3200000	1.858700528946842	0.5
1911.106202363968	2400	353.62253372384004	42.50279353419308	361.21378387953155	-68.69427317171358	4800000	1.8626653369014383	0.5
2542.895303249359	3200	330.70704866273377	89.75614968593497	356.458104395424	-165.1572259220411	6400000	1.8625555682772517	0.5
3165.7719597816467	4000	335.87617137232394	87.7603358215967	357.3829857534729	-110.9123269484844	8000000	1.8594283550173394	0.5
3792.753440141678	4800	354.5406704655237	7.630499740381307	358.985978085082	302.91349698934937	9600000	1.8639075124607407	0.5
3958.4108204841614	5000	352.72349807509084	2.210021831318633	357.97522099281196	345.7582888051402	10000000	1.8645748661891786	0.5
