Blog

2026

Agent Murphy

10 minute read

Published:

Recently my research lab built a custom agent that runs on a spare lower-end Mac mini my mentor happened to have. We call it Murphy, in defiance of Murphy’s law and in tribute to the beloved Interstellar.

2025

Let’s talk about reward and value

7 minute read

Published:

I will skip the notation and the introduction to RL as a Markov Decision Process; for that, you can refer to this blog. Here is a checklist of questions to ask yourself before proceeding:

  • what do $J(\theta)$ and $\nabla_{\theta} J(\theta)$ (recall the policy gradient theorem) look like?
  • what are $Q$, $A$, and $V$?
  • do you have 10 minutes to spare?

What is the architecture of an ideal agent?

4 minute read

Published:

Here is a beta version of an ideal agent that I have been thinking about. It helps me personally to sort a large number of new ML research papers into one of the boxes or arrows, and to identify gaps.