A repository for hacking Generative Fill with Open Source Tools
Creating an open source alternative to Generative Fill and other editing tools.
- Provide an Edit Prompt
- Provide an Image
- Edit the Image based on the initial Prompt
- Accept an `edit_prompt` and an `image` as input
- Use a Vision Model to caption the image
- Pass the `edit_prompt` through a language model to extract the source entity
- Create a `replacement_caption` where the source entity of the original image is swapped with the target entity in the `edit_prompt`
- Use the source entity to create a segmentation mask using OWL-ViT and SAM
- Use the mask and the `replacement_caption` for image inpainting
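The steps above can be sketched as a small orchestration function. This is a minimal illustration, not the repository's actual code: the names `generative_fill`, `make_replacement_caption`, and `EditResult` are hypothetical, and the four model stages are passed in as plain callables so that any captioner, language model, OWL-ViT + SAM segmenter, or inpainting pipeline can be plugged in. It also assumes the language model step returns both the source and the target entity.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class EditResult:
    """Intermediate and final outputs of one edit (hypothetical container)."""
    caption: str
    source_entity: str
    target_entity: str
    replacement_caption: str
    edited_image: Any


def make_replacement_caption(caption: str, source: str, target: str) -> str:
    """Swap every occurrence of the source entity for the target entity.

    Matching is case-insensitive; a plain substring swap is an assumption
    made for this sketch, not necessarily how the repository does it.
    """
    pattern = re.compile(re.escape(source), flags=re.IGNORECASE)
    # A function replacement avoids treating backslashes in `target` as escapes.
    return pattern.sub(lambda _match: target, caption)


def generative_fill(
    image: Any,
    edit_prompt: str,
    captioner: Callable[[Any], str],
    entity_extractor: Callable[[str], Tuple[str, str]],
    segmenter: Callable[[Any, str], Any],
    inpainter: Callable[[Any, Any, str], Any],
) -> EditResult:
    """Run the pipeline steps in order, using caller-supplied model stages."""
    # 1. Caption the input image with a vision model.
    caption = captioner(image)
    # 2. Extract (source_entity, target_entity) from the edit prompt.
    source, target = entity_extractor(edit_prompt)
    # 3. Build the caption that describes the desired output image.
    replacement_caption = make_replacement_caption(caption, source, target)
    # 4. Segment the source entity (for example OWL-ViT boxes refined by SAM).
    mask = segmenter(image, source)
    # 5. Inpaint the masked region, guided by the replacement caption.
    edited_image = inpainter(image, mask, replacement_caption)
    return EditResult(caption, source, target, replacement_caption, edited_image)
```

Passing the stages as callables keeps the sketch independent of any particular model library: each stage can be swapped or stubbed out for testing without touching the pipeline logic.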
The pipeline is shown in the figure below:
This repository is still in its early stages and will require additional work.
- Better Captioning from Vision Model
- Prompt upsampling using the Language Model
- More complex editing tasks than replacement
- Optimization of the models and an end-to-end pipeline
- sayakpaul for the amazing advice and ideas
- pedrogengo for the replacement caption idea, as illustrated here
- rishiraj for patiently teaching us about Qwen and small LLMs
If Open Source Generative Fill helps your research, we would appreciate a citation. Here is the BibTeX entry:
@misc{raha2024opengenerativefill,
title={Open Source Generative Fill},
author={Raha, Ritwik and Roy Gosthipaty, Aritra},
year={2024},
howpublished={\url{https://github.com/ritwikraha/GenerativeFill-with-Keras-and-Diffusers}},
}