Add the derivation of the trajectory-return policy gradient to Anki
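
A sketch of the standard score-function (REINFORCE-style) derivation follows. It assumes a finite horizon T, a discounted return with factor γ (set γ = 1 for the undiscounted case), and that the gradient and the integral over trajectories can be interchanged. The symbols ρ_0 (initial-state distribution), P (transition dynamics), and N (number of sampled trajectories) are notation assumed here, not taken from the note.

```latex
% Objective: expected return over trajectories tau = (s_0, a_0, ..., s_{T-1}, a_{T-1}, s_T)
J(\theta) = \mathbb{E}_{\tau \sim p_\theta}\left[ R(\tau) \right],
\qquad R(\tau) = \sum_{t=0}^{T-1} \gamma^{t}\, r(s_t, a_t)

% Trajectory density induced by the policy and the (assumed) environment dynamics
p_\theta(\tau) = \rho_0(s_0) \prod_{t=0}^{T-1} \pi_\theta(a_t \mid s_t)\, P(s_{t+1} \mid s_t, a_t)

% Log-derivative trick: \nabla_\theta p_\theta = p_\theta \nabla_\theta \log p_\theta.
% R(tau) depends on theta only through the distribution of tau, so it is not differentiated.
\nabla_\theta J(\theta)
  = \nabla_\theta \int p_\theta(\tau)\, R(\tau)\, d\tau
  = \int \nabla_\theta p_\theta(\tau)\, R(\tau)\, d\tau
  = \int p_\theta(\tau)\, \nabla_\theta \log p_\theta(\tau)\, R(\tau)\, d\tau
  = \mathbb{E}_{\tau \sim p_\theta}\left[ \nabla_\theta \log p_\theta(\tau)\, R(\tau) \right]

% Only the policy terms depend on theta; the rho_0 and P terms drop out of the gradient
\nabla_\theta \log p_\theta(\tau)
  = \nabla_\theta \Big[ \log \rho_0(s_0) + \sum_{t=0}^{T-1} \big( \log \pi_\theta(a_t \mid s_t) + \log P(s_{t+1} \mid s_t, a_t) \big) \Big]
  = \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)

% Result: trajectory-return policy gradient and its Monte Carlo estimate from N sampled trajectories
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim p_\theta}\Big[ \Big( \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \Big) R(\tau) \Big]
  \approx \frac{1}{N} \sum_{i=1}^{N} \Big( \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t^{(i)} \mid s_t^{(i)}) \Big) R(\tau^{(i)})
```

Because ρ_0 and P drop out, the estimator needs only sampled trajectories and ∇_θ log π_θ; the environment dynamics never have to be known or differentiated.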
