Let’s talk about reward and value
I will skip the notation and the introduction to RL as a Markov Decision Process; for that, you can refer to this blog. Here is a checklist of questions to ask yourself before proceeding:
- what do $J(\theta)$ and $\nabla_{\theta} J(\theta)$ (recall the policy gradient theorem) look like?
- what are $Q$, $A$, and $V$?
- do you have 10 minutes to spare?
