Generalized advantage estimation