Written by Ethan Smith
GitHub - ethansmith2000/clip-decomposition
https://github.com/ethansmith2000/clip-text-directions
I frequently geek out about the representational capacity of image embeddings: they can represent effectively our entire visual world in fewer than 1,000 dimensions. That is small compared to many modern large neural networks, yet still an unfathomably large space to the human mind.
With that in mind, how can we effectively explore this space and better understand it?
It’s common to use classification and clustering methods, but I’m interested in something that gives us a more intuitive feel for what this space offers us.
Borrowing from classic data science techniques, we can use Principal Component Analysis to explore the directions in which this data varies.
We can then decode these embeddings with unCLIP models to visualize the concepts these directions cover.
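As a minimal sketch of the first step, here is what running PCA over a batch of image embeddings could look like. The embeddings below are random placeholders; in practice they would come from a CLIP image encoder (the 768-dimensional size is an assumption matching common CLIP variants), and the resulting component directions are what we would later decode with an unCLIP model.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder for real CLIP image embeddings. In practice, these would be
# produced by a CLIP image encoder; 768 dims is an assumed embedding size.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 768))

# Fit PCA to find the principal directions along which the embeddings vary.
pca = PCA(n_components=50)
coords = pca.fit_transform(embeddings)

# Each row of pca.components_ is a unit-norm direction in embedding space.
# Moving an embedding along one of these directions, then decoding it,
# is the exploration idea described above.
print(coords.shape)           # per-image coordinates in the PCA basis
print(pca.components_.shape)  # the principal directions themselves
```

Note that `fit_transform` both learns the directions and projects each embedding onto them, so `coords` tells you how strongly each direction is expressed in each image.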
Principal Component Analysis (PCA) is a common data science technique for understanding the main “directions” in which data varies. It can surface interesting features, and it enables compression by isolating the most important components while filtering out noise and less important ones.
Something like Eigenfaces shows this well: after obtaining the set of eigenfaces, a given face can be decomposed into coefficients signaling how much of each eigenface should be present in the output.