Time	Iteration	AverageReward	StdRewards	MaxRewardRollout	MinRewardRollout	timesteps	gradnorms	maxnorms
38.062052965164185	0	-0.813261981599876	19.297677396125206	29.944989854469895	-27.727565122768283	0	0.0	0
676.2347223758698	800	70.89207076722062	2.7789199165575074	76.96374590531923	63.13048331113532	1600000	7.795253766982112	8.083614676355632
1305.5019743442535	1600	86.72747802292245	2.952026782103999	90.54271045702626	77.49016036471585	3200000	7.801809772778491	8.052395218102397
1932.951963186264	2400	93.42355370981291	4.236233533251155	99.95463563722114	81.91204631241271	4800000	7.866107053362715	8.4517978980669
2562.6408512592316	3200	78.5785027087868	12.714798120372047	101.06626448384486	42.074731688713655	6400000	7.892824212923357	7.667692945505915
3188.96883392334	4000	48.43356365754207	21.060112014659733	94.15272733812162	11.167252272167389	8000000	7.942430317464196	8.869517895026563
3817.504369735718	4800	46.41900897808302	22.310401344887122	85.6609641078976	-2.9549081666054917	9600000	7.875098714781214	8.943719223887136
3984.528783798218	5000	44.63399998410556	14.608625339418328	77.5241828867147	10.523685910684435	10000000	7.84417520712916	8.943719223887136
