ComfyUI - private and locally hosted AI image generation

If you have followed my blog series so far, you have everything set up to install ComfyUI, which I use for local AI image generation without the need for any external services. There are plenty of other tools out there to do this and some of them are a little easier to use. But ComfyUI is really quite powerful and it can be integrated into Open WebUI to generate images completely locally, without relying on external AI services like DALL-E from OpenAI. Maximum privacy and flexibility 😄

With the previous blog posts all prerequisites are already installed to start with ComfyUI. Nevertheless, here is everything that needs to be installed to be able to work with ComfyUI:

As of writing this blog post, Python 3.12 is the recommended version for ComfyUI. As usual, let’s create a Python virtual environment with Python 3.12 using the uv utility (again, --seed also installs tools like pip):

bash

uv venv --python 3.12 --seed comfyui

Change to the newly created directory:

bash

cd comfyui

direnv is a little utility that can run various commands when you enter a directory. So instead of running source bin/activate every time to activate the ComfyUI venv, I’ve added a file called .envrc to the comfyui directory with the following content:

bash

source ./bin/activate

Next install direnv:

bash

sudo apt install direnv

Additionally, a hook needs to be added to $HOME/.bashrc to make direnv work:

bash

eval "$(direnv hook bash)"

To get it enabled immediately:

bash

source $HOME/.bashrc

The next time you enter the /home/your-user/comfyui directory you get an error, because direnv blocks the .envrc file until you explicitly allow it:

bash

direnv allow .

If you now leave the directory and re-enter it, the commands in .envrc get executed. In this case that means the Python venv gets activated and the python and pip executables of that venv are used.
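
A quick way to verify that direnv did its job is to check which python binary is active — it should point into the venv (the path below assumes the venv lives in $HOME/comfyui):

bash

which python
# expected output (your user name will differ):
# /home/your-user/comfyui/bin/python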

While still in the $HOME/comfyui directory, clone the ComfyUI Git repository:

bash

git clone https://github.com/comfyanonymous/ComfyUI

Then switch to the cloned ComfyUI directory:

bash

cd $HOME/comfyui/ComfyUI

Next, PyTorch is needed. Three Python libraries are installed for this, using the CUDA 12.4 wheels from the PyTorch index:

bash

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu124

This will take some time depending on your Internet connection speed, as quite a few big files are downloaded. Once that is done, the remaining Python libraries ComfyUI needs have to be installed:

bash

pip install -r requirements.txt
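
Before starting ComfyUI you can optionally verify that PyTorch was installed with CUDA support and can actually see the GPU — just a quick sanity check, assuming the venv is still active:

bash

python -c 'import torch; print(torch.__version__); print(torch.cuda.is_available())'
# should print the installed PyTorch version and "True" if the GPU is usable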

Now with everything installed ComfyUI can be started:

bash

python main.py --listen

ComfyUI listens on port 8188 by default (and with --listen it binds to all interfaces, so it is also reachable from other hosts). If you have everything running locally you should now be able to enter http://127.0.0.1:8188 in your browser’s URL bar and the ComfyUI interface should show up.
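
If you run ComfyUI on a headless server, a quick check from a second terminal also works (replace the IP if you query it from another host):

bash

curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8188
# should print 200 once the ComfyUI web server is up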

By default there is already a workflow loaded. But since no models or anything else are installed yet, not much can be done with that default workflow.

One of the most important things needed is a model (a so-called checkpoint) that is able to “transform” the text input into an image. One of the most important sites to download models is Civitai. And one of the models to start with is JuggernautXL. This model normally creates good images right from the start without too much tuning. And it needs about 7 GByte of VRAM. So even with a graphics card with around 8 GByte of video RAM this model might still fit into memory. For my NVIDIA RTX 4080 SUPER with 16 GByte of VRAM it’s definitely no issue. But remember that the model needs to be loaded into VRAM before the very first image can be created. Transferring around 7 GByte of data from the hard disk to the graphics card’s VRAM can take a bit.

You can also find other SDXL (Stable Diffusion XL) models on Civitai. Stable Diffusion is a type of generative model that uses diffusion processes to create images, and it is widely used in the AI community for various applications such as art generation, design, and more. SDXL is a more advanced successor to earlier Stable Diffusion models. To find such models go to Civitai, click Models (1) and then Filters (2) at the top right:

comfyui-civitai-filter

Then click All Time (3) for the Time period and Checkpoint (4) for Model types. Under Base model click on SDXL 1.0 (5). As mentioned above, for now select the JuggernautXL model. We need to download that model:

comfyui-civitai-filter

Click on the download button (1). The file is called juggernautXL_juggXIByRundiffusion.safetensors. This file needs to be stored in the comfyui/ComfyUI/models/checkpoints folder. As mentioned above this file is around 7 GByte, so depending on your Internet connection the download might take a bit.
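
If you are working on a headless server, the file can also be downloaded from the shell. The URL below is just a placeholder — copy the real one from the Civitai download button (Civitai may additionally require an API token appended to the URL):

bash

cd $HOME/comfyui/ComfyUI/models/checkpoints
# placeholder URL - copy the actual link from the Civitai download button
wget -O juggernautXL_juggXIByRundiffusion.safetensors "https://civitai.com/api/download/models/<model-version-id>"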

Once the model is downloaded, ComfyUI needs to be restarted. As I started ComfyUI via a shell command above, I go back to the terminal and press CTRL+C to stop the Python process. Then I start it again:

bash

python main.py --listen

Then let’s go back to the browser at http://127.0.0.1:8188/ and reload the page. With the default workflow still loaded I can now choose the juggernautXL_juggXIByRundiffusion checkpoint in the Load Checkpoint (1) node:

comfyui-first-workflow-and-first-image

There should also be some text prefilled in CLIP Text Encode (Prompt) (2), which I just keep as is. This text node contains the text that describes what picture the model should generate. The node below, which is also called CLIP Text Encode (Prompt), is the so-called “negative prompt”. Here you can describe things and objects you don’t want to have in the resulting image. Hint: for the Juggernaut XL models you should normally start without a negative prompt.

I won’t get too deep into the details of the workflow right now in order to get some quick results; I’ll go into more detail in another blog post. So without further explanation of the whole workflow, just click the Queue button at the bottom. If your graphics card fans are now starting to make noise you’re doing everything right 😄 After a few seconds you should see an image (3) similar to the one in the screenshot above. Congratulations! You just created your first AI image 😉

Now I want to know how much VRAM the model needs. While the image is being generated I can execute the nvidia-smi command in a terminal window:

plain

Mon Jan  6 19:49:42 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...    Off |   00000000:06:00.0 Off |                  N/A |
|  0%   42C    P2             43W /  320W |    6995MiB /  16376MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   2696161      C   python                                       6988MiB |
+-----------------------------------------------------------------------------------------+

As you can see, the model needs around 7 GByte of VRAM (6995 MiB). That’s pretty much the same size it uses on disk. It also means that I’d be able to load another model of around 8-9 GByte. That becomes important later when I want to combine ComfyUI and Open WebUI, because ComfyUI needs one model to generate images while Open WebUI needs another model loaded to generate text. To get maximum speed I need both models loaded into VRAM at the same time if I want to generate images while working with Open WebUI.

Here is an example with the Juggernaut XL XI model loaded for ComfyUI and Llama 3.1 8b for Open WebUI:

plain

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...    Off |   00000000:06:00.0 Off |                  N/A |
| 34%   41C    P8             13W /  320W |   13342MiB /  16376MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   1770578      C   ...unners/cuda_v12/ollama_llama_server       6310MiB |
|    0   N/A  N/A   3510564      C   /home/ansible/comfyui/bin/python             7022MiB |
+-----------------------------------------------------------------------------------------+

As you can see, around 13 GByte out of 16 GByte of VRAM are used when both models are loaded.
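
To keep an eye on VRAM usage while loading models or generating images, I simply leave a continuously refreshing nvidia-smi running in a separate terminal:

bash

# refresh the GPU stats every second
watch -n 1 nvidia-smi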

There are quite a few extensions, additional nodes, and so on that make life easier with ComfyUI. So before going into more detail about the components of the workflow used above, let’s extend ComfyUI a bit.

ComfyUI-Manager is a must-have. The “Restart” button alone is worth it, because restarting is something you need to do pretty often at the beginning when installing new stuff 😉 But besides that it’s an extension designed to enhance the usability of ComfyUI. It offers management functions to install, remove, disable, and enable various custom nodes of ComfyUI. Furthermore, this extension provides a hub feature and convenience functions to access a wide range of information within ComfyUI. It also allows you to download models.

To install ComfyUI-Manager (the combined shell commands are shown after the list):

  1. Switch to $HOME/comfyui/ComfyUI/custom_nodes directory
  2. Execute git clone https://github.com/ltdrdata/ComfyUI-Manager comfyui-manager
  3. Restart ComfyUI (as above with CTRL+C in the terminal and starting it again with python main.py --listen)
  4. Back in the browser hit SHIFT + Reload button to reload the ComfyUI page.
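
Or as a single block of shell commands (same steps as in the list; the restart is still done manually afterwards):

bash

cd $HOME/comfyui/ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager comfyui-manager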

Now you should see the Manager button (1) at the top right. Click on it and you should see something similar to this:

comfyui-manager-overview

Now, whenever a restart is needed (e.g. after a new model was installed), just open ComfyUI-Manager and press the Restart button.

The default workflow I used above is nice but there are better ones out there. One site that offers ComfyUI workflows is OpenArt. Let’s search for Juggernaut workflows and choose the Workflow 1 (Readhead - Juggernaut XI) workflow. While the preview picture of that workflow shows a woman’s head with red hair, this workflow can do way more without needing many additional resources. So let’s download that workflow by clicking the Download button at the top right of the workflow webpage. Store the file workflow-workflow-1-readhead---juggernaut-xi-yHLfPilJlBaGxPlHOluD-seal_harmful_40-openart.ai.json somewhere. Now switch to the ComfyUI tab in your browser to load the workflow:

comfyui-readhead-juggernaut-workflow

Click on Workflow (1) in the top left, then Open (2) and select the workflow file just downloaded. If you now click the Queue button to generate an image you’ll get an error stating that sdxl_vae.safetensors is missing. Such .safetensors files can easily be found by just pasting the file name into Google search. In nearly all cases you’ll end up at Hugging Face, which means stabilityai/sdxl-vae in this case. There you can download that file; it’s about 335 MByte. The sdxl_vae.safetensors file has to be stored in the $HOME/comfyui/ComfyUI/models/vae directory. So put it there and use ComfyUI-Manager to restart ComfyUI. If you now click the Queue button again it will take a few seconds and the image generation should work.
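
For reference, this is how the VAE could be fetched from the shell — assuming the file is still published under this name in the stabilityai/sdxl-vae repository on Hugging Face:

bash

cd $HOME/comfyui/ComfyUI/models/vae
# download path assumed from the Hugging Face repo stabilityai/sdxl-vae
wget https://huggingface.co/stabilityai/sdxl-vae/resolve/main/sdxl_vae.safetensors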

With that basic workflow you can already do quite a lot, and the images look quite impressive even without much effort. To create good prompts for good-looking images with the Juggernaut XI or XII model, please read the Prompt Guide for Juggernaut XI and XII by RunDiffusion.

Also, if you have a look at the Juggernaut XL - Jugg_XI_by_RunDiffusion model page on Civitai and scroll down a bit, you see a lot of images created with that model. Quite often, if you click on an image you get the prompts and parameters used by its creator. These prompts and parameters are good starting points for creating your own images at the beginning. Sometimes it can be really frustrating: you think you have a good positive and negative prompt but the result doesn’t look very good in the end. Sometimes it’s just a single keyword that ruins your image. So by looking at the example pictures and the parameters used, you can get a “feeling” over time for what works and what doesn’t.

For the workflow above I used the prompt of this image. On the right side of the image page you can see which prompts were used and also some of the parameters. So for the current workflow I used basically the same settings:

Positive prompt (CLIP Text Encode (Prompt) - the one at the top of the page):

text

moody aesthetic, beautiful cozy, cramped bedroom with floor
to ceiling glass windows overlooking a cyberpunk city at night,
view from top of skyscraper, white bedsheets, bookshelves,
thunderstorm outside with torrential rain, detailed,
high resolution, photorrealistic, dark, gloomy

Negative prompt (CLIP Text Encode (Prompt)):

text

bad eyes, cgi, airbrushed, plastic, deformed, watermark

For the KSampler node the following parameters were used:

  • Steps: 30
  • Sampler: dpmpp_2m_sde_gpu

In the Empty Latent Image node you can specify the width and height of the generated images. According to the Juggernaut prompt guide mentioned above the best image sizes are 1024x1024, 832x1216 and 1216x832. So if you get bad results, trying a different image size might be an option. You can also specify batch_size. This tells the generator how many images should be created. With 4 you get four different images that are all aligned to the text prompt. 4 is a good number to start with, but of course a higher count also takes longer to generate. Especially when creating images with humans, only one or two of the generated images are somewhat usable - at least that was the case for me, but with a very good prompt even all four images might look good 😉. Also, if you run out of VRAM, reducing the number of pictures created might help.

With the parameters and prompts above I get images like this:

comfyui-sample-image-01

Not bad for an AI noob 😄

In one of my previous blog posts I asked the Llama 3.2-vision:11b model to describe what’s on a book cover I provided. The answer was this:

The picture features a number of blue spheres floating above water or another reflective surface, with some spheres casting shadows onto the surface below.

The largest sphere is centered and sits atop what appears to be a dark reflection that extends outward from its base, resembling a shadow. It reflects light in various directions, creating different shades of blue across its surface.

Surrounding this are four smaller spheres, arranged in a square formation with one in each corner. Each one casts a shadow onto the reflective surface below and also reflects light in various directions.

The image is framed by an off-white border that adds some contrast to the rest of the picture. The background color is a medium blue.

At that time I used that description as a prompt for DALL-E from OpenAI to generate an image. The result was quite good, but every image generation with DALL-E costs money of course (though it’s actually not that much if you don’t create thousands of images…). So what image would my basic ComfyUI workflow with the Juggernaut XL model create? Well, the result was somewhat OK-ish:

comfyui-spheres

That’s it for today. Next time I’ll dig a bit deeper into ComfyUI workflows and try the FLUX.1-dev model that everybody has been talking about since it was introduced a few months ago. While the original one barely fits into the VRAM of an RTX 4090 with 24 GByte, there are some quantized variants that can be used with GPUs with less VRAM.