parent: Imitation learning

DAgger algorithm

“Dataset Aggregation”

train behavior-cloned policy to match human data
collect observations by running the policy
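ask human to label these new observations with actions
aggregate: add the newly labeled observation-action pairs to the dataset
repeat from the first step, retraining on the aggregated dataset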
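The loop above can be sketched in Python under some loudly stated assumptions. The "human" is a hypothetical scripted expert (`expert_action`, which steers a 1-D state toward 0). The environment is a made-up noisy 1-D system, and the behavior-cloned policy is a least-squares linear fit. All of these names (`expert_action`, `fit_linear_policy`, `rollout`, `dagger`) are illustrative, not from any library. Each iteration trains on the current dataset, runs the learned policy, has the expert label the states it visited, and aggregates those pairs into the dataset. The sketch always rolls out the learned policy alone, without mixing in expert actions.

```python
import random


def expert_action(obs: float) -> float:
    """Hypothetical stand-in for the human labeler: push the state toward 0."""
    return -0.5 * obs


def fit_linear_policy(dataset):
    """Behavior cloning: least-squares fit of action = w * obs + b."""
    n = len(dataset)
    mean_o = sum(o for o, _ in dataset) / n
    mean_a = sum(a for _, a in dataset) / n
    cov = sum((o - mean_o) * (a - mean_a) for o, a in dataset)
    var = sum((o - mean_o) ** 2 for o, _ in dataset)
    w = cov / var if var > 0 else 0.0
    b = mean_a - w * mean_o
    return lambda obs: w * obs + b


def rollout(policy, rng, horizon=20):
    """Run a policy in a toy noisy 1-D environment; return the observations visited."""
    obs = rng.uniform(-5.0, 5.0)
    visited = []
    for _ in range(horizon):
        visited.append(obs)
        obs = obs + policy(obs) + rng.gauss(0.0, 0.1)  # assumed toy dynamics
    return visited


def dagger(n_iters=5, seed=0):
    rng = random.Random(seed)
    # Initial human data: observations from an expert rollout, labeled by the expert.
    dataset = [(o, expert_action(o)) for o in rollout(expert_action, rng)]
    for _ in range(n_iters):
        policy = fit_linear_policy(dataset)                   # train on current data
        new_obs = rollout(policy, rng)                        # run the learned policy
        labeled = [(o, expert_action(o)) for o in new_obs]    # expert labels new states
        dataset.extend(labeled)                               # aggregate into the dataset
    return fit_linear_policy(dataset), dataset


policy, data = dagger(n_iters=5)
print(len(data), round(policy(2.0), 3))
```

Because the toy expert is itself linear, the learner can match it exactly. The example therefore only illustrates the data flow of train, run, label, aggregate. It does not show DAgger's advantage over plain behavior cloning. In a real task, the labeling step is the expensive one, since a person must supply actions for states the learned policy reached on its own.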