Written by Ethan Smith

https://github.com/ethansmith2000/StableTwoUnet

Intro


The dynamics of classifier-free guidance are super interesting to me. It is usually described as taking the difference between a conditional prediction (or distribution) and an unconditional one, scaling this difference by a factor, and then adding it back to the unconditional prediction to “take a step” in the direction of stronger conditioning.

Sander Dieleman has a great blog post on how it works:

https://sander.ai/2022/05/26/guidance.html
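To make the update concrete, here is a minimal sketch of a CFG step as it might look inside a standard Stable Diffusion sampling loop. It is written against the diffusers UNet API; `latents`, `t`, and the text embeddings are assumed to come from the usual pipeline setup.

```python
import torch

def cfg_step(unet, latents, t, cond_emb, uncond_emb, guidance_scale=7.5):
    # Predict noise with and without the text conditioning.
    with torch.no_grad():
        cond_pred = unet(latents, t, encoder_hidden_states=cond_emb).sample
        uncond_pred = unet(latents, t, encoder_hidden_states=uncond_emb).sample
    # Scale the (cond - uncond) difference and add it back to the unconditional
    # prediction: a step in the direction of stronger conditioning.
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)
```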

People later discovered this difference could be taken between many other kinds of predictions, for example against a prediction conditioned on a negative prompt (the things you don’t want in the resulting image), giving a direction that steps away from that concept.
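In that framing, negative prompting is just the same update with the unconditional (empty-prompt) embedding swapped for the negative prompt’s embedding (`neg_emb` here is an assumed variable name):

```python
# The guided prediction now steps away from whatever the negative prompt describes.
cond_pred = unet(latents, t, encoder_hidden_states=cond_emb).sample
neg_pred = unet(latents, t, encoder_hidden_states=neg_emb).sample
guided = neg_pred + guidance_scale * (cond_pred - neg_pred)
```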

Here I asked a question: “What would happen if we took the difference between two different models?” Specifically, between a worse or base model and a corresponding fine-tuned version. I suspected this could give a more meaningful direction than two predictions from the same model, and in a sense amplify the effect of the styles the fine-tuned model has learned.
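Below is a minimal sketch of that idea, not the exact code from the repo: the base model’s prediction plays the role of the “unconditional” branch and the fine-tune’s prediction plays the “conditional” role. The checkpoint IDs are illustrative.

```python
import torch
from diffusers import UNet2DConditionModel

# Illustrative checkpoints: a base SD1.5 UNet and a fine-tuned one.
base_unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
).to("cuda")
tuned_unet = UNet2DConditionModel.from_pretrained(
    "Lykon/dreamshaper-7", subfolder="unet", torch_dtype=torch.float16
).to("cuda")

def two_model_guidance(latents, t, cond_emb, guidance_scale=7.5):
    with torch.no_grad():
        base_pred = base_unet(latents, t, encoder_hidden_states=cond_emb).sample
        tuned_pred = tuned_unet(latents, t, encoder_hidden_states=cond_emb).sample
    # Step from the base model's prediction toward the fine-tuned model's,
    # hopefully amplifying what the fine-tune has learned.
    return base_pred + guidance_scale * (tuned_pred - base_pred)
```

Both UNets are SD1.5-family models, so they share the same text encoder and the same `cond_emb` can be fed to each.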

Additionally, I tried some other things like adding noise to, or blurring/perturbing, the negative prediction, in hopes that the guidance would step away from deliberately poorer predictions, though I’m less sure how beneficial this is.
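As a rough sketch of that perturbation idea (the blur and noise settings below are made up for illustration), the negative branch is degraded on purpose so the guidance step moves away from a worse prediction:

```python
import torch
import torchvision.transforms.functional as TF

def perturbed_negative(neg_pred, blur_sigma=1.0, noise_std=0.1):
    # Blur the negative-branch prediction and add a little noise to it,
    # then guide away from this degraded prediction as usual:
    #   guided = perturbed + scale * (cond_pred - perturbed)
    blurred = TF.gaussian_blur(neg_pred, kernel_size=5, sigma=blur_sigma)
    return blurred + noise_std * torch.randn_like(blurred)
```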

Results


[Figure: compare.png]

On the left, CFG is done by taking the difference between predictions for the same prompt from DreamshaperV7 and SD1.5.

On the right, DreamshaperV7 uses a positive and negative prompt, as is typical.