parent: Finish reading Spinning up in Deep RL

Add Proof for Using Q-Function in Policy Gradient Formula into Anki

https://spinningup.openai.com/en/latest/spinningup/extra_pg_proof2.html

Asynchronous Methods for Deep Reinforcement Learning
- 1602.01783.pdf
Finish reading Spinning up in Deep RL
- Add Proof for Using Q-Function in Policy Gradient Formula into Anki
Add the expected grad-log-prob (EGLP) lemma card to Anki
Add derivation of trajectory return policy gradient to Anki
Do the Deep RL course at Berkeley
Do Deep RL Bootcamp
Learning Tetris Using the Noisy Cross-Entropy Method
- neco.2006.18.12.2936.pdf
Opportunities
- Microsoft AI residency
Learn DDPG
- Continuous control with deep reinforcement learning
  - 1509.02971.pdf
Learn PPO
- Proximal Policy Optimization Algorithms
  - 1707.06347.pdf
Learn conjugate gradient algorithm
Learn QR-DQN
- Distributional Reinforcement Learning with Quantile Regression
  - 1710.10044.pdf
Learn Twin Delayed DDPG
Implement TD(lambda)
Learn batch norm, layer norm, weight norm
Learn C51
- A Distributional Perspective on Reinforcement Learning
  - 1707.06887.pdf
Learn SVG
- Learning Continuous Control Policies by Stochastic Value Gradients
  - 1510.09142.pdf
Learn MBMF
- Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
  - 1708.02596.pdf
Read thesis with TRPO
Learn natural policy gradient methods
- A Natural Policy Gradient
  - NIPS-2001-a-natural-policy-gradient-Paper.pdf
Learn Soft Actor-Critic
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
  - 1801.01290.pdf
Read and Ankify Algorithms for reinforcement learning 2009 lecture
Ankify Nuts and bolts of deep RL research
Learn and understand deep belief network per-layer (pre)training
Why does L1 penalty encourage sparsity?
Learn more about energy models
AI safety materials at UMass
Level up in AI resources
There are a few ways to write the value of a policy, and I can't prove they have the same gradient
Get TensorFlow developer certificate
AGI Safety Fundamentals course
How to do RL on a POMDP?
Learn how a k-d tree works
"ML algorithms cheat sheet"