Blog posts

2025

Let’s talk about reward and value

7 minute read

Published:

I will skip the notation and the introduction to RL as a Markov Decision Process; for that, you can refer to this blog. Here is a checklist of questions to ask yourself before proceeding:

  • what do $J(\theta)$ and $\nabla_{\theta} J(\theta)$ look like (recall the policy gradient theorem)?
  • what are $Q$, $A$, and $V$?
  • do you have 10 minutes to spare?

What is the architecture of an ideal agent?

4 minute read

Published:

Here is a beta version of an ideal agent that I have been thinking about. It helps me personally to categorize the large number of new ML research papers, assigning each to one of the boxes or arrows, and to identify gaps.