[0627]
Try target policy (X) 5
Fixed (?) policy gradient direction (loss = -policy_loss) (X) 6
Try value policy (value network separate from policy network) (X) 7
Try value policy (value network separate from policy network) without 6 (X) 8
Try changing hard update location, value policy (value network separate from policy network), without 6 (X) 9
Try changing hard update location, without 6 (X) 10
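
Note on "hard update location" (runs 9 and 10): a minimal sketch of what moving the hard update can change, assuming it means copying the online parameters into the target either before or after the gradient step in the same iteration. The names run, sync_before_update, period and the dict-based parameters are hypothetical stand-ins, not taken from the actual training code.

```python
import copy

def run(sync_before_update, steps=6, period=3):
    """Toy loop: record the target parameter seen at the end of each step."""
    online = {"w": 0.0}              # stand-in for the online network
    target = copy.deepcopy(online)   # stand-in for the target network
    seen = []
    for step in range(1, steps + 1):
        # Location A: hard update before the online network is changed.
        if sync_before_update and step % period == 0:
            target = copy.deepcopy(online)
        online["w"] += 1.0           # stand-in for one gradient step
        # Location B: hard update after the online network is changed.
        if not sync_before_update and step % period == 0:
            target = copy.deepcopy(online)
        seen.append(target["w"])
    return seen

before = run(sync_before_update=True)
after = run(sync_before_update=False)
print(before)
print(after)
```

In this toy setup the two locations differ by exactly one gradient step in what the target holds, which may or may not matter in the real runs.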
[0628]
@ value network not separate from policy network
Fixed (?) policy gradient direction (loss = -policy_loss) without 5 (?) 13
Fixed (?) policy gradient direction (loss = -policy_loss) (X) 14
Fixed (?) policy gradient direction (loss = -policy_loss) without 5 (?) long episode 15
Original version, long 16
Original version, long, const lr 2e-4 17
Fixed (?) policy gradient direction (loss = -policy_loss) without 5 (?) long episode 18
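
Note on the sign question (loss = -policy_loss): a small sanity check, assuming policy_loss stands for advantage * log pi(action) and that the optimizer does gradient descent on loss. Under those assumptions the minus sign should raise the probability of an action with positive advantage, and the plus sign should lower it. The two-action softmax policy and the helper names are hypothetical, chosen only to keep the check dependency-free.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def descent_step(logits, action, advantage, lr, sign):
    """One gradient-descent step on loss = sign * advantage * log pi(action)."""
    probs = softmax(logits)
    # d log pi(action) / d logit_i = 1[i == action] - pi_i for a softmax policy
    grad_logp = [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]
    grad_loss = [sign * advantage * g for g in grad_logp]
    return [x - lr * g for x, g in zip(logits, grad_loss)]

start = [0.0, 0.0]  # uniform policy, pi(0) = 0.5
p_neg = softmax(descent_step(start, action=0, advantage=1.0, lr=0.5, sign=-1.0))[0]
p_pos = softmax(descent_step(start, action=0, advantage=1.0, lr=0.5, sign=+1.0))[0]
print(p_neg, p_pos)
```

If the real policy_loss is already defined with a minus sign inside it, the conclusion flips, so this only checks the convention stated above.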

[0630]
Try target policy again (X) 21 # 5 did not update value policy

