DAgger algorithm

“Dataset Aggregation”

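  • train a behavior-cloned policy to match the human demonstration data
  • collect new observations by running that policy
  • ask the human to label these new observations with the actions they would have taken
  • add the newly labeled data to the dataset, retrain, and repeat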
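
The loop can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not a reference implementation: the "human" is replaced by a hypothetical linear expert (`expert_action`, gain -0.5), the environment is a made-up noisy 1-D system, and behavior cloning is approximated by a least-squares fit. All function names are invented for this example. Note that each round appends the newly labeled observations to the existing dataset before refitting, which is the aggregation the name refers to.

```python
import numpy as np


def expert_action(obs):
    # Hypothetical stand-in for the human labeler: push the state toward zero.
    return -0.5 * obs


def fit_policy(obs, act):
    # Behavior cloning approximated by least squares: act ≈ obs @ weights.
    weights, *_ = np.linalg.lstsq(obs, act, rcond=None)
    return weights


def run_policy(weights, rng, horizon=20):
    # Roll out the learned policy in a toy 1-D environment (an assumption)
    # and record the observations it visits.
    obs = rng.normal(size=(1,))
    visited = []
    for _ in range(horizon):
        visited.append(obs)
        action = obs @ weights
        obs = obs + action + 0.1 * rng.normal(size=(1,))
    return np.stack(visited)  # shape (horizon, 1)


def dagger(n_iters=5, seed=0):
    rng = np.random.default_rng(seed)

    # Step 1: initial demonstrations and a behavior-cloned policy.
    data_obs = rng.normal(size=(10, 1))
    data_act = expert_action(data_obs)
    weights = fit_policy(data_obs, data_act)

    for _ in range(n_iters):
        # Step 2: collect observations by running the current policy.
        new_obs = run_policy(weights, rng)
        # Step 3: ask the expert to label those observations with actions.
        new_act = expert_action(new_obs)
        # Aggregate: add the new labeled data to the dataset, then retrain.
        data_obs = np.concatenate([data_obs, new_obs])
        data_act = np.concatenate([data_act, new_act])
        weights = fit_policy(data_obs, data_act)

    return weights, data_obs, data_act
```

Calling `dagger(n_iters=5)` returns the fitted weights together with the aggregated dataset. In a real setting the expert labels would come from a person and the policy would typically be a neural network, but the structure of the loop stays the same.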