Written by Ethan Smith
I experimented with this a bit in a repo here:
https://github.com/ethansmith2000/cfg_as_rl
Karras et al. recently released a paper, **“Guiding a Diffusion Model with a Bad Version of Itself,”** dropping yet another wealth of knowledge on diffusion. Go read it if you haven’t yet; there are some great insights in there.
Unaware of the deeper mechanics behind why it helps, I had tried something similar a while back: performing classifier-free guidance (CFG) with predictions from a finetuned model and its corresponding base model, to (sort of) amplify the effects of the finetune.
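Concretely, the idea was something like the sketch below, written here with hypothetical $\epsilon$-prediction model callables and a made-up `guided_eps` helper (the repo linked above has the actual experiments):

```python
import torch

@torch.no_grad()
def guided_eps(base_model, finetuned_model, z, t, cond, scale: float = 3.0):
    # Hypothetical epsilon-prediction models: both take (noisy latents, timestep, condition).
    eps_base = base_model(z, t, cond)      # prediction from the original base model
    eps_ft = finetuned_model(z, t, cond)   # prediction from the finetuned model
    # Same shape as standard CFG (uncond + w * (cond - uncond)),
    # but with the base model playing the "unconditional" role:
    return eps_base + scale * (eps_ft - eps_base)
```

The base model stands in for the unconditional branch, so a larger `scale` pushes samples further in whatever direction the finetune moved the predictions.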
Now, armed with a better understanding of the mechanism behind the benefits, I noticed that the classifier-free guidance formula shares some similarities with the formulas we use for RL objectives.
Let’s take a look at what’s going on here.
First we have the reinforcement learning objective (as written in the DPO paper), and then the classifier-free guidance equation.
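Roughly, these look like the following, where $r_\phi$ is a reward model, $\pi_{\mathrm{ref}}$ is the reference policy, $\beta$ weights the KL penalty, and $w$ is the guidance scale:

$$\max_{\pi_{\theta}} \; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_{\theta}(y \mid x)}\big[r_{\phi}(x, y)\big] \;-\; \beta\,\mathbb{D}_{\mathrm{KL}}\big[\pi_{\theta}(y \mid x)\,\|\,\pi_{\mathrm{ref}}(y \mid x)\big]$$

$$\tilde{\epsilon}_{\theta}(z_{\lambda}, c) \;=\; (1 + w)\,\epsilon_{\theta}(z_{\lambda}, c) \;-\; w\,\epsilon_{\theta}(z_{\lambda})$$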
Here $y$ maps to $z_{\lambda}$, which is the sample (i.e. the image), and $x$ maps to $c$, which is the condition (i.e. the text prompt).
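To make the resemblance a bit more concrete (my own restatement, not a derivation from either paper): up to normalization, the CFG update above samples from a sharpened distribution, while the KL-regularized RL objective has a closed-form optimal policy:

$$\tilde{p}(z_{\lambda} \mid c) \;\propto\; p(z_{\lambda})\left(\frac{p(z_{\lambda} \mid c)}{p(z_{\lambda})}\right)^{1+w}, \qquad \pi^{*}(y \mid x) \;\propto\; \pi_{\mathrm{ref}}(y \mid x)\,\exp\!\left(\tfrac{1}{\beta}\,r(x, y)\right)$$

With the mapping above, the unconditional model plays the role of $\pi_{\mathrm{ref}}$ and the log-ratio $\log \tfrac{p(z_{\lambda} \mid c)}{p(z_{\lambda})}$ plays the role of a reward, with $1+w$ acting like $1/\beta$.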