You May Also Enjoy
Let’s talk about reward and value
7 minute read
I will skip the notation and the introduction to RL as a Markov Decision Process; for that, you can refer to this blog. Here is a checklist of questions to ask yourself before proceeding:
- What do $J(\theta)$ and $\nabla_{\theta} J(\theta)$ look like (recall the policy gradient theorem)?
- What are $Q$, $A$, and $V$?
- Do you have 10 minutes to spare?
What is the architecture of an ideal agent?
4 minute read
Here is a beta version of an ideal agent that I have been thinking about. It helps me personally to sort the large amount of new ML research papers into one of the boxes or arrows, and to identify gaps.
