Use the loss function of the Policy Gradient algorithm to understand REINFORCE, Actor-Critic, and Proximal Policy Optimization (PPO).
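As a rough illustration of that idea, the three methods can be read as variations on one surrogate loss, the negative log-probability of the chosen action multiplied by a weight. The sketch below is not taken from the original article. The function names, the plain-list inputs, and the use of `G_t - V(s_t)` as the advantage estimate are assumptions made for illustration, and the functions only compute scalar loss values (a real implementation would use an autodiff framework to take gradients).

```python
import math

# Hypothetical helper names; inputs are plain Python lists of floats,
# one entry per time step of a sampled trajectory.

def reinforce_loss(log_probs, returns):
    """REINFORCE: weight each log pi(a_t|s_t) by the sampled return G_t."""
    weighted = [lp * g for lp, g in zip(log_probs, returns)]
    return -sum(weighted) / len(weighted)

def actor_critic_loss(log_probs, returns, values):
    """Actor-Critic (policy term only): weight by an advantage estimate.

    Assumption: the advantage is estimated as G_t - V(s_t), where V(s_t)
    comes from a learned critic. Other estimators (e.g. TD error) exist.
    """
    advantages = [g - v for g, v in zip(returns, values)]
    weighted = [lp * a for lp, a in zip(log_probs, advantages)]
    return -sum(weighted) / len(weighted)

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO: clipped surrogate using the ratio pi_new(a|s) / pi_old(a|s)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

Read this way, the only thing that changes between the three is the weight on the log-probability term: the raw return, an advantage that subtracts a baseline, and an advantage scaled by a clipped probability ratio.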
Originally appeared here:
Understand REINFORCE, Actor-Critic and PPO in one go