-
LLM RL: from Bellman Target, PPO Clip to Token/Sequence Ratio and Training-Inference Mismatch
A structured reinforcement learning tutorial from Bellman targets and policy gradients to PPO, GRPO, DAPO, CISPO, GSPO, DPO, and training-inference mismatch.
-
Inside nano-vLLM: Scheduler, Paged KV Cache, Prefill/Decode, and Sampling
A walkthrough of the scheduler, paged KV cache, prefill/decode, prefix cache, and sampling, following the lifecycle of a request.
-
Distributed Training for Large Models: Collectives, ZeRO/FSDP, Tensor, Pipeline, and Expert Parallelism
Starting from the lifecycles of parameters, activations, gradients, and optimizer states, this post walks through All-Reduce/All-Gather/Reduce-Scatter/All-to-All, DDP, ZeRO/FSDP, TP/PP/SP/CP/EP, and the memory-communication trade-offs.
-
Modern Attention for LLMs: MHA/MQA/GQA, RoPE, MLA, FlashAttention, and MoE
Notes on attention mechanisms, from MHA/MQA/GQA, RoPE, and MLA to FlashAttention and MoE.
-
Flow-Based Generative Models: From Normalizing Flows to Flow Matching, Reflow, and MeanFlow
Following the thread of exact likelihood, probability paths, velocity regression, and average velocity, this post surveys NF/CNF, FM/CFM, Gaussian/OT/CondOT paths, Stochastic Interpolants, Rectified Flow/Reflow, MeanFlow, and CFG.