Written by Ethan Smith
The repo:
https://github.com/ethansmith2000/AutoLoRADiscovery
Something that has often interested me in ML is figuring out ways we can constrain our search space of parameters.
When we train neural networks, we typically have close to uninformative priors over our model weights, opting for normal/uniform initializations. There are some heuristics about what ranges of values tend to be friendly, but these choices typically aren't data-specific. Once we observe results, the Bayesian framework suggests we should update our priors.
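To make that concrete, here is a minimal sketch (plain PyTorch, not from the repo) of what a standard initialization looks like: the scale is chosen from the layer's shape alone, with no reference to the data we're about to train on.

```python
# A minimal sketch of typical initialization: the scale depends only on the
# layer's shape (fan-in), never on the dataset itself.
import torch
import torch.nn as nn

layer = nn.Linear(768, 768)

# Kaiming/He initialization: std is derived purely from fan-in.
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
nn.init.zeros_(layer.bias)

print(layer.weight.std())  # ~ sqrt(2 / fan_in), independent of any data
```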
Yet even though the solutions we discover when training large models on the same datasets tend to exhibit common traits (e.g. steadily increasing norms, growing magnitude of residual connections) that we could use to constrain where we look or to make better-informed initializations, we don't make much use of them.
Granted, this isn’t necessarily easy to exploit. It might sound nice, but it’s not necessarily practical. The distribution of discovered solutions may be many sparse peaks scattered across the landscape, leaving us with little better option than an initialization that covers everything, lottery ticket hypothesis style.
One place I think this could be practical, though, is with LoRAs (or other few-parameter adapter methods).
A LoRA is in itself an act of constraining our search space, and that constraint is part of why these parameter-efficient training methods work.
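As a rough sketch of that constraint (a generic LoRA formulation, not the repo's implementation), instead of searching over a full d_out x d_in weight matrix we only train a rank-r update, which confines the search to a much smaller subspace:

```python
# Generic LoRA sketch: freeze the pretrained weight and learn a low-rank
# update B @ A, so the trainable parameters scale with r * (d_in + d_out)
# instead of d_in * d_out.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # base output plus the scaled low-rank update
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```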
But as an experiment, we can likely go further.