llm-rl

an archive of posts with this tag

Jun 25, 2026	Modern RL Objectives in Code: Bellman Targets, Trust Regions, PPO Clip, KL Estimators, and LLM Token/Sequence Granularity