In this notebook, we provide the OpenVINO™ optimization for GroundedSAM (GroundingDINO + SAM) on Intel® platforms.
GroundedSAM aims to detect and segment anything given free-form text input. GroundingDINO is an open-set object detector that uses language-guided query selection to pick relevant features from the image and text inputs and returns predicted boxes for the detected objects. The Segment Anything Model (SAM) produces high-quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. We use the box predictions from GroundingDINO as prompts for SAM to segment the detected objects in the original image.
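The sketch below illustrates this overall pipeline using the original PyTorch packages (`groundingdino` and `segment-anything`): GroundingDINO turns a text prompt into boxes, and SAM turns those boxes into masks. The file paths, text prompt, and thresholds are placeholder assumptions; the OpenVINO-optimized pipeline built later in this notebook follows the same steps.

```python
import torch
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

# 1. Detect boxes for the text prompt with GroundingDINO (config/checkpoint paths are placeholders).
dino = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth", device="cpu")
image_source, image = load_image("example.jpg")
boxes, logits, phrases = predict(
    model=dino,
    image=image,
    caption="cat . dog .",  # object classes separated by " . " in the prompt
    box_threshold=0.35,
    text_threshold=0.25,
    device="cpu",
)

# GroundingDINO returns normalized (cx, cy, w, h) boxes; SAM expects pixel (x1, y1, x2, y2).
h, w, _ = image_source.shape
cx, cy, bw, bh = boxes.unbind(-1)
boxes_xyxy = torch.stack([cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2], dim=-1)
boxes_xyxy = boxes_xyxy * torch.tensor([w, h, w, h], dtype=boxes_xyxy.dtype)

# 2. Prompt SAM with each box to get a segmentation mask (checkpoint path is a placeholder).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)
predictor.set_image(image_source)
masks = [predictor.predict(box=box.numpy(), multimask_output=False)[0] for box in boxes_xyxy]
```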
More details about the models can be found in the paper and the official repository.
In this tutorial, we will explore how to convert and run GroundedSAM using OpenVINO.
- Download checkpoints and load PyTorch model
- Convert GroundingDINO to OpenVINO IR format
- Run OpenVINO optimized GroundingDINO
- Convert SAM to OpenVINO IR (a minimal conversion sketch follows this list)
- Combine GroundingDINO + SAM (GroundedSAM)
- Interactive GroundedSAM
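As a preview of the conversion steps listed above, here is a minimal sketch of how a PyTorch module can be converted to OpenVINO IR and compiled, shown on the SAM image encoder. The checkpoint path and input resolution are assumptions based on the public `segment-anything` package; the notebook performs the actual conversions for both models in the corresponding sections.

```python
import torch
import openvino as ov
from segment_anything import sam_model_registry

# Load the SAM checkpoint (path is an assumption) and convert its image encoder to OpenVINO IR.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
example = torch.randn(1, 3, 1024, 1024)  # SAM's image encoder expects a 1024x1024 input
ov_encoder = ov.convert_model(sam.image_encoder, example_input=example)
ov.save_model(ov_encoder, "sam_image_encoder.xml")

# Compile the IR for a target device and run one inference to obtain image embeddings.
compiled = ov.Core().compile_model(ov_encoder, "CPU")
embeddings = compiled(example.numpy())[0]
```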
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the Installation Guide.