There are a few ways to write the value of a policy, and I can't prove they have the same gradient

This note has no content.