Add Proof for Using the Q-Function in the Policy Gradient Formula to Anki