notes

research and engineering notes on medical AI, scientific computing, safe RL, generative modeling, contrastive learning, and large-model systems

Modern RL Objectives in Code: Bellman Targets, Trust Regions, PPO Clip, KL Estimators, and LLM Token/Sequence Granularity

Starting from Bellman targets, policy gradients, the TRPO trust region, PPO clipping, and KL estimators, this post breaks down PPO, GRPO, DAPO, Dr. GRPO, CISPO, GSPO, DPO, and training-inference mismatch along four axes: reward shaping, advantage granularity, the unit of the importance ratio and clip, and loss aggregation.

51 min read · 2026

Orthogonal Polynomials for Uncertainty Quantification: Recurrence Algorithms, PCE, and Biomedical Simulation

A research-level map of orthogonal polynomial recurrence algorithms, polynomial chaos expansion, and noninvasive uncertainty quantification for biomedical simulations.

9 min read · 2025

Contrastive Learning: Objectives, Dictionaries, Momentum Encoders, and Multimodal Alignment

Starting from anchors/positives/negatives, dictionaries, and temperature, this post surveys the candidate sets and representation geometry of Triplet, NCE/InfoNCE, NT-Xent, MoCo/SimCLR, BYOL/SimSiam/DINO, CLIP, and ArcFace/CosFace.

36 min read · 2025

a distill-style blog post

an example of a distill-style blog post and main elements

25 min read · 2021

a post with code

an example of a blog post with some code

4 min read · 2015

Modern RL Objectives in Code: Bellman Targets, Trust Regions, PPO Clip, KL Estimators, and LLM Token/Sequence Granularity

Starting from Bellman targets, policy gradients, the TRPO trust region, PPO clipping, and KL estimators, this post breaks down PPO, GRPO, DAPO, Dr. GRPO, CISPO, GSPO, DPO, and training-inference mismatch along four axes: reward shaping, advantage granularity, the unit of the importance ratio and clip, and loss aggregation.

51 min read · June 25, 2026

2026 · reinforcement-learning policy-gradient llm-rl ppo rlhf · technical-notes

From nano-vLLM to vLLM: Scheduler, Paged KV Cache, Prefill/Decode, Tensor Parallelism, and Sampling

Following the lifecycle of a request, this post takes apart LLMEngine, Scheduler, BlockManager, the paged KV cache, prefill/decode, the prefix cache, preemption, Tensor Parallel, and sampling to show how nano-vLLM abstracts a vLLM-style inference pipeline.

49 min read · April 20, 2026

2026 · vllm inference large-language-models ai-infra · ai-infra

Distributed Training for Large Models: Collectives, FSDP/ZeRO, DeviceMesh, and Multi-Dimensional Parallelism

Starting from the lifecycle of parameters, activations, gradients, and optimizer state within a single training step, this post walks through Broadcast/All-Gather/Reduce-Scatter/All-Reduce/All-to-All, DDP, ZeRO/FSDP1/FSDP2, DeviceMesh, TP/PP/SP/CP/EP, process group topology, and the memory-communication trade-off.

48 min read · March 16, 2026

2026 · distributed-training large-language-models ai-infra deep-learning-systems · ai-infra

Modern Attention for LLMs: KV Cache, RoPE, MLA, FlashAttention, Sparse/Linear Attention, and MoE

Starting from the Q/K/V calling interface, MHA/MQA/GQA, and the KV cache, this post covers RoPE/ALiBi/RMSNorm, MLA matrix absorption, the FlashAttention online softmax, Sparse/Linear Attention, MoE routing, and discrete gradient estimators.

29 min read · February 22, 2026

2026 · attention transformer large-language-models inference · ai-infra

Flow-Based Generative Models: From Normalizing Flows to Flow Matching, Reflow, and MeanFlow

Following the thread from exact likelihood through probability paths and velocity regression to average velocity, this post surveys NF/CNF, FM/CFM, Gaussian/OT/CondOT paths, Stochastic Interpolants, Rectified Flow/Reflow, MeanFlow, and CFG.

40 min read · January 18, 2026

2026 · flow-matching normalizing-flows generative-models diffusion · ai-notes
