Policy Gradient Methods for Reinforcement Learning with Function Approximation