initial performance: 7923
episode: 0 training return: tensor(7.9752e-05, device='cuda:0', grad_fn=<AddBackward0>)
