Running AI models locally for creativity.

I’ve made a point of staying on top of the news and advancements in AI models. Since taking Andrew Ng’s machine learning Coursera course in 2017, I’ve been excited about AI models reaching the point where more people can use them.

Obviously a lot has changed very quickly! I’ve been particularly excited about the possibilities of open-source models, which lag only a few months behind closed-source models in capability. Since they’re open, the implementation possibilities are much broader, and since you can run the models yourself, there is no chance of your prompt-engineering work or your results being vacuumed up for future training or other purposes (e.g. getting leaked at some point).

Personally, I’ve been interested in how I can use AI to bolster my own creativity. I think that, overall, this is going to be a huge application for AI models going forward, as the relationship between people and work evolves with and because of AI.


To try some serious local linear algebra, I upgraded my desktop to an RTX 3070 Ti and 128 GB of RAM. It would be great if VRAM could be upgraded on graphics cards, but I digress.

Since then, I’ve tried several different models locally. I’ve spent quite a bit of time learning prompting strategies for using Stable Diffusion XL to create interesting artistic outputs. As someone without much raw artistic skill but with Photoshop chops, having SDXL running locally has been a boon for my creativity. Now that there is a web UI available for SDXL and dealing with a Python script is no longer a requirement, doubly so.
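For anyone curious, the Python-script route is only a handful of lines with Hugging Face’s diffusers library. This is just a rough sketch - the prompt and settings are illustrative, and CPU offloading is one way to squeeze SDXL into 8 GB of VRAM:

```python
# Rough sketch of text-to-image SDXL with diffusers (illustrative prompt and settings).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # keep idle components in system RAM; helps on an 8 GB card

image = pipe(
    prompt="a misty mountain landscape at sunrise, dramatic light, oil painting",
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
image.save("landscape.png")
```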

Above you can see, for example, how I put together my current profile picture. I composited my image in with noise in the rough layout I wanted, then used image-to-image generation with a text prompt to get the landscape (and stand-in person) I was looking for. From there, it was just compositing and tweaking in Photoshop, with some help from Adobe Firefly (not open, unfortunately, but its Photoshop integration and inpainting are currently more reliable than SDXL’s - well done Adobe!) and finally upscaling.
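The image-to-image step is a small variation on the same pipeline. Here’s roughly what it looks like with diffusers, where the source image is the noise composite described above and the filenames, prompt, and strength value are just placeholders:

```python
# Rough sketch of SDXL image-to-image with diffusers (placeholder names and values).
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()

# The composited source image (my photo plus noise in the rough layout I wanted).
init_image = load_image("noise_composite.png").resize((1024, 1024))

image = pipe(
    prompt="a lone figure overlooking a vast mountain landscape at golden hour",
    image=init_image,
    strength=0.6,        # lower values stay closer to the source image's layout
    guidance_scale=7.0,
).images[0]
image.save("profile_base.png")
```

The strength parameter is the main knob here: it controls how far the result is allowed to drift from the colors and layout of the source image.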


Image-to-image generation with SDXL is very dependent on the spatial positioning of colors in the source image, hence my approach above of creating the noise composite first. Below is an earlier example, a visual transformation of my dog Ella. To get this result without editing the source image, I had to change my prompt so that I was giving the model something to do with the blotches of color it didn’t quite understand - in this case turning the blanket into water, and the foreground portion of her body and the light wall in the background into large white flowers.


I’ve spent quite a bit of time learning the best approaches to prompting ChatGPT, especially GPT-4, for help and inspiration with lots of different things: coding, tree fungus preventatives, complicated sentence structuring, and understanding medication effects and interactions.

Among the open language models, Mixtral 8x7B has been the highest-quality model I’ve tried with the lowest overhead for running locally. Unfortunately, the performance I’ve gotten has been one token every second or two, which is a bit below the threshold of usability. I’ve been able to run it and other models on cloud infrastructure instead, so I’ll likely be taking that approach in my personal projects for the time being. Hugging Face is great for this, and I think their platform could become much larger as more people seek their own uses for AI - just with a few more user experience tweaks :)
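For the cloud route, calling Mixtral through Hugging Face’s hosted inference is pretty painless. Here’s roughly what a request looks like with their huggingface_hub client - the token and prompt are placeholders, and exact model availability depends on your account:

```python
# Rough sketch of querying Mixtral 8x7B through Hugging Face's hosted inference.
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    token="hf_...",  # your Hugging Face access token (placeholder)
)

response = client.chat_completion(
    messages=[{"role": "user", "content": "Suggest three unusual color palettes for a mountain landscape."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```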
