Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model