Examples below!
Utilizes llama-cpp-python to integrate LLaVa models. You can load and use any LLM together with LLaVa models in GGUF format with these nodes.
You need to download the CLIP projector mmproj-model-f16.gguf
from these repositories. Python >= 3.9
is required. Put all of the files inside models/LLavacheckpoints
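For reference, this is roughly how llama-cpp-python pairs a GGUF language model with the CLIP projector. This is only a minimal sketch of the underlying library call, not the node's own code; the GGUF file name and the image path are placeholders you would replace with your own files in models/LLavacheckpoints.

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# The projector downloaded above; path is an assumption about your setup.
chat_handler = Llava15ChatHandler(
    clip_model_path="models/LLavacheckpoints/mmproj-model-f16.gguf"
)

llm = Llama(
    model_path="models/LLavacheckpoints/llava-v1.5-7b.Q4_K_M.gguf",  # hypothetical file name
    chat_handler=chat_handler,
    n_ctx=2048,
    logits_all=True,  # needed so the handler can evaluate image tokens
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant that describes images."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///path/to/image.jpg"}},
                {"type": "text", "text": "Describe this image in detail."},
            ],
        },
    ]
)
print(response["choices"][0]["message"]["content"])
```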
This node is designed to work with the Moondream model, a powerful small vision language model built by @vikhyatk using SigLIP, Phi-1.5, and the LLaVa training dataset. The model boasts 1.6 billion parameters and is made available for research purposes only; commercial use is not allowed.
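For illustration, here is a minimal sketch of querying Moondream directly through Hugging Face transformers, outside of ComfyUI. The model id `vikhyatk/moondream2` and the `encode_image`/`answer_question` helpers follow the upstream Moondream README and are assumptions here, not part of this node's code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model_id = "vikhyatk/moondream2"  # assumption: upstream Moondream checkpoint on the Hub
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("example.jpg")
enc_image = model.encode_image(image)              # helper provided by the Moondream model code
answer = model.answer_question(enc_image, "Describe this image.", tokenizer)
print(answer)
```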
This node is designed to automatically transform textual descriptions into image generation prompts. It simplifies the process of creating vivid and detailed prompts for image generation. Optionally, you can chat with LLMs using the SimpleChat node.
You can use:
- ChatGPT-4
- ChatGPT-3.5
- DeepSeek (https://platform.deepseek.com/ gives 10M free tokens)
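As an illustration of the kind of API call the prompt-generation node makes, here is a minimal sketch using DeepSeek's OpenAI-compatible endpoint. The API key placeholder, the system prompt, and the `deepseek-chat` model name are assumptions for the example, not the node's exact implementation.

```python
from openai import OpenAI

# Hypothetical key; DeepSeek exposes an OpenAI-compatible API at this base URL.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

description = "a foggy pine forest at dawn, a lone hiker on a ridge"
completion = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {
            "role": "system",
            "content": "Turn the user's description into a vivid, detailed image generation prompt.",
        },
        {"role": "user", "content": description},
    ],
)
print(completion.choices[0].message.content)
```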
@fpgamine's JoyTag is a state-of-the-art AI vision model for tagging images, with a focus on sex positivity and inclusivity.
It uses the Danbooru tagging schema, but works across a wide range of images, from hand-drawn to photographic.
```
cd custom_nodes
git clone https://github.com/gokayfem/ComfyUI_VLM_nodes.git
```