Written by Ethan Smith
I experimented with this a bit in a repo here:
https://github.com/ethansmith2000/cfg_as_rl
Karras et al. recently released a paper, **“Guiding a Diffusion Model with a Bad Version of Itself,”** dropping yet another wealth of knowledge on diffusion. Go read it if you haven’t yet; there are some great insights in there.
Unaware of the deeper mechanics behind why it helps, I had tried something similar a while back: performing classifier-free guidance (CFG) with predictions from a finetuned model and its corresponding base model, to (sort of) amplify the effects of the finetune.
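Concretely, the idea was something like the sketch below, written here with hypothetical $\epsilon$-prediction model callables and a made-up `guided_eps` helper (the repo linked above has the actual experiments):

```python
import torch

@torch.no_grad()
def guided_eps(base_model, finetuned_model, z, t, cond, scale: float = 3.0):
    # Hypothetical epsilon-prediction models: both take (noisy latents, timestep, condition).
    eps_base = base_model(z, t, cond)      # prediction from the original base model
    eps_ft = finetuned_model(z, t, cond)   # prediction from the finetuned model
    # Same shape as standard CFG (uncond + w * (cond - uncond)),
    # but with the base model playing the "unconditional" role:
    return eps_base + scale * (eps_ft - eps_base)
```

The base model stands in for the unconditional branch, so a larger `scale` pushes samples further in whatever direction the finetune moved the predictions.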
Now, armed with a better understanding of the mechanism behind the benefits, I noticed that the classifier-free guidance formula shares some similarities with the formulas we use for RL objectives.
Let’s take a look at what’s going on here.
First we have the reinforcement learning objective (as written in the DPO paper), and then the classifier-free guidance equation.
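Roughly, these look like the following, where $r_\phi$ is a reward model, $\pi_{\mathrm{ref}}$ is the reference policy, $\beta$ weights the KL penalty, and $w$ is the guidance scale:

$$\max_{\pi_{\theta}} \; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_{\theta}(y \mid x)}\big[r_{\phi}(x, y)\big] \;-\; \beta\,\mathbb{D}_{\mathrm{KL}}\big[\pi_{\theta}(y \mid x)\,\|\,\pi_{\mathrm{ref}}(y \mid x)\big]$$

$$\tilde{\epsilon}_{\theta}(z_{\lambda}, c) \;=\; (1 + w)\,\epsilon_{\theta}(z_{\lambda}, c) \;-\; w\,\epsilon_{\theta}(z_{\lambda})$$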
Here $y$ maps to $z_{\lambda}$, which is the sample (i.e. the image), and $x$ maps to $c$, which is the condition (i.e. the text prompt).
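To make the resemblance a bit more concrete (my own restatement, not a derivation from either paper): up to normalization, the CFG update above samples from a sharpened distribution, while the KL-regularized RL objective has a closed-form optimal policy:

$$\tilde{p}(z_{\lambda} \mid c) \;\propto\; p(z_{\lambda})\left(\frac{p(z_{\lambda} \mid c)}{p(z_{\lambda})}\right)^{1+w}, \qquad \pi^{*}(y \mid x) \;\propto\; \pi_{\mathrm{ref}}(y \mid x)\,\exp\!\left(\tfrac{1}{\beta}\,r(x, y)\right)$$

With the mapping above, the unconditional model plays the role of $\pi_{\mathrm{ref}}$ and the log-ratio $\log \tfrac{p(z_{\lambda} \mid c)}{p(z_{\lambda})}$ plays the role of a reward, with $1+w$ acting like $1/\beta$.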