Time	Iteration	AverageReward	StdRewards	MaxRewardRollout	MinRewardRollout	timesteps	gradnorms	maxnorms
37.233673095703125	0	-0.813261981599876	19.297677396125206	29.944989854469895	-27.727565122768283	0	0.0	0
660.4098806381226	800	355.75637056499124	1.9737280060255569	359.524179566768	350.66286714037415	1600000	0.9987515605493134	0.25
1283.3778448104858	1600	356.3050495858723	2.5300064452731386	360.8381693052361	350.35137572229723	3200000	0.999000999000999	0.25
1904.7543904781342	2400	354.92654566266106	2.3854587991908183	359.96239518537186	348.3284715191694	4800000	0.999000999000999	0.25
2525.6128261089325	3200	355.0070355847042	1.9724670451027821	360.2775553583633	351.4726065586874	6400000	0.999000999000999	0.25
3146.5762887001038	4000	355.9304791111229	2.1840134973598126	360.7242870304035	350.7247798642129	8000000	0.999000999000999	0.25
3767.9043090343475	4800	355.31423449890167	2.268427232433739	360.7362024827453	348.22545644734055	9600000	0.999000999000999	0.25
3933.297700881958	5000	355.0764180592108	1.7421216577160925	358.55750990018714	349.91958750606864	10000000	0.999000999000999	0.25
