Written by Ethan Smith

Intro and review of failure cases

While SDXL has greatly improved adherence to prompts, there are still several notable places where it falls short. Some of these, I think, are somewhat inherent to using CLIP as the text conditioning.

Here are a couple of failure cases that I’ll roughly formalize:

Misattribution

You’ve probably seen it happen: you ask for “A cat with green eyes resting upon a pair of shoes,” and the green bleeds into other parts of the scene.

[Image: SDXL 0.9 output for “A cat with piercing green eyes resting upon a pair of shoes”]

Neglect (Competition/Dominance)

“Photo of a red sphere on top of a blue cube. Behind them is a green triangle, on the right is a dog, on the left is a cat”