Time	Iteration	AverageReward	StdRewards	MaxRewardRollout	MinRewardRollout	timesteps	gradnorms	maxnorms
37.247536420822144	0	-1.4489036436221607	18.17712229330612	27.28727449197322	-27.71342122834176	0	0.0	0
664.7764456272125	800	356.60388235875655	1.8615465614587132	360.0577591501642	351.18230323586613	1600000	0.9987515605493134	0.7788364680642891
1281.8723542690277	1600	356.5530827827828	2.3200331876865996	360.96765045722714	347.99123885226436	3200000	0.999000999000999	0.8259945830604813
1899.4210975170135	2400	355.783145804004	2.917948883942361	361.3678168090082	347.44538780535913	4800000	0.999000999000999	0.8259945830604813
2520.577661037445	3200	355.65601534218035	2.043289258848944	359.375865549584	349.517900034014	6400000	0.999000999000999	0.8460234558216387
3138.5700566768646	4000	355.5025578892756	2.0993297417553243	360.9356409786269	350.63514238098287	8000000	0.999000999000999	0.7912563447396888
3760.62061214447	4800	356.5651624181486	1.97329287295788	360.71695714781526	352.83003948123223	9600000	0.999000999000999	0.8244942356980343
3925.7090690135956	5000	356.3400193901448	2.1593730690116546	362.14655833574943	351.64510870038066	10000000	0.999000999000999	0.8244942356980343
