Inverse soft Q-learning