Implicit Regularization in RL Post-Training

March 2, 2026

Why RL generalizes where SFT memorizes: implicit regularization through on-policy optimization, forward vs. reverse KL divergence, and RL's Razor.