Soft Q imitation learning