would be nice to have it.
- take as many samples as are available
- compute for as long as I allow it, then pick the best policy found so far
I wonder:
- how does RL performance vary with network size? does an overly large network hurt? does a ResNet help? what about other architectures?