Object detection is a computer vision task that identifies objects and their locations in an image.
Here's how to set up for the object detection job:
- Start by adding the image files.
- Then, click the `rectangle` button on the left menu or press the `R` key to quickly create a rectangle shape.
- Finally, type in the matching name in the label dialog.
- Import your image (`Ctrl+I` or `Ctrl+U`) or video (`Ctrl+O`) file into X-AnyLabeling.
- Select and load the `YOLO11` model, or choose from other available object detection models.
- Initiate the process by clicking `Run (i)`. Once you've verified that everything is set up correctly, use the keyboard shortcut `Ctrl+M` to process all images in one go.
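If you want to sanity-check a detector outside the GUI before batch-annotating, the snippet below is a minimal sketch using the `ultralytics` package directly (this is not X-AnyLabeling code; the weight file name and image path are assumptions):

```python
# Minimal sketch: run a YOLO11 detector on one image with the ultralytics package.
# Weight file name and image path are placeholders, not part of X-AnyLabeling.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")                  # downloads the nano detection weights if missing
results = model("example.jpg", conf=0.25)   # single-image inference

for box in results[0].boxes:                # iterate over detected objects
    cls_name = model.names[int(box.cls)]
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    print(f"{cls_name}: ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}) conf={float(box.conf):.2f}")
```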
To demonstrate advanced usage, let's take the Universal Proposal Network (UPN) model as an example. UPN adopts a dual-granularity prompt tuning strategy to generate comprehensive proposals for objects at both instance and part levels (a programmatic sketch follows the installation steps below):

- `fine_grained_prompt`: For detecting detailed object parts and subtle differences between similar objects. This mode excels at identifying specific features like facial characteristics or distinguishing between similar species.
- `coarse_grained_prompt`: For detecting broad object categories and major scene elements. This mode focuses on identifying general objects like people, vehicles, or buildings without detailed sub-categorization.
Before you begin, make sure you have the following prerequisites installed:
Step 0: Download and install Miniconda from the official website.
Step 1: Create a new Conda environment with Python version 3.9 or higher, and activate it:
```bash
conda create -n x-anylabeling-upn python=3.9 -y
conda activate x-anylabeling-upn
```
You'll need to install PyTorch first. Follow the instructions here to install the related dependencies.
Afterward, you can install ChatRex on a GPU-enabled machine using:
```bash
git clone https://github.com/IDEA-Research/ChatRex.git
cd ChatRex
pip install -v -e .
# install deformable attention for universal proposal network
cd chatrex/upn/ops
pip install -v -e .
# back to the project root directory
cd -
```
Finally, install the necessary dependencies for X-AnyLabeling (v2.5.0+):
```bash
cd ..
git clone https://github.com/CVHub520/X-AnyLabeling
cd X-AnyLabeling
```
Now, you can return to the installation guide (简体中文 | English) to install the remaining dependencies.
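With the environment set up, the two prompt granularities described earlier can also be exercised programmatically. The sketch below loosely follows the example usage shown in the ChatRex repository; the checkpoint path and image path are assumptions, and exact interfaces may differ between versions:

```python
# Sketch based on the ChatRex repository's UPN example usage.
# Checkpoint/image paths are assumptions; interfaces may differ by version.
from chatrex.upn import UPNWrapper

ckpt_path = "checkpoints/upn_checkpoints/upn_large.pth"  # assumed download location
model = UPNWrapper(ckpt_path)

image_path = "assets/example.jpg"  # hypothetical test image

# Instance-level proposals (people, vehicles, buildings, ...)
coarse = model.inference(image_path, prompt_type="coarse_grained_prompt")

# Part-level proposals (object parts, fine-grained distinctions)
fine = model.inference(image_path, prompt_type="fine_grained_prompt")

# Keep only confident, non-overlapping proposals
fine_filtered = model.filter(fine, min_score=0.3, nms_value=0.8)
```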
Here's how to set up for the UPN job:
- Import your image (`Ctrl+I`) or video (`Ctrl+O`) file into X-AnyLabeling.
- Select and load the `Universal Proposal Network (IDEA)` model from the model list.
- Click `Run (i)` to start processing. After verifying the results are satisfactory, use `Ctrl+M` to batch process all images.
Additionally, you can adjust the following parameters to filter detection results directly from the GUI:
- Detection Mode: Switch between `Coarse Grained` and `Fine Grained` modes using the dropdown menu next to the model selection.
- Confidence Threshold: Adjust the confidence score (0-1) using the "Confidence" spinner control.
- IoU Threshold: Control the Non-Maximum Suppression (NMS) threshold (0-1) using the "IoU" spinner control.
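To make the effect of these two controls concrete, the snippet below is a small, self-contained illustration (not X-AnyLabeling code) of confidence filtering followed by NMS with `torchvision.ops.nms`; the boxes and scores are made-up example values:

```python
# Illustration: how a confidence threshold and an IoU (NMS) threshold
# filter raw detections. Values below are invented for demonstration.
import torch
from torchvision.ops import nms

boxes = torch.tensor([             # candidate boxes as [x1, y1, x2, y2]
    [10.0, 10.0, 110.0, 110.0],
    [12.0, 14.0, 108.0, 112.0],    # heavily overlaps the first box
    [200.0, 200.0, 260.0, 280.0],
])
scores = torch.tensor([0.92, 0.85, 0.40])

conf_thresh = 0.5                  # "Confidence" spinner
iou_thresh = 0.45                  # "IoU" spinner

keep = scores >= conf_thresh                   # 1) drop low-confidence boxes
boxes, scores = boxes[keep], scores[keep]

idx = nms(boxes, scores, iou_thresh)           # 2) suppress overlapping duplicates
print(boxes[idx], scores[idx])                 # the surviving detections
```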
The OpenVision model demonstrates another advanced workflow: interactive object detection and annotation driven by point, rectangle, and text prompts.
Before starting, please install the required CountGD dependencies.
For a demonstration of the workflow, watch the demo video below:
X-AnyLabeling supports three different prompting modes for object detection and annotation:
- Point Prompting Mode (see the sketch after this list):
  - Uses the Segment Anything Model (SAM) to generate high-precision segmentation masks
  - Activated by clicking points on the target object
  - Best for detailed object segmentation and boundary detection
  - Ideal for irregular shapes and precise annotations
- Rectangle Prompting Mode:
  - Leverages the CountGD model to detect visually similar objects
  - Activated by drawing a bounding box around an example object
  - Automatically finds and annotates similar objects in the image
  - Optimal for batch detection of multiple similar objects
- Text Prompting Mode:
  - Powered by Grounding DINO for text-guided object detection
  - Activated by entering natural language descriptions
  - Locates objects based on textual descriptions
  - Perfect for finding objects by their semantic description
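To make the point mode concrete, here is a minimal sketch of how a single positive click drives SAM via Meta's `segment-anything` package (X-AnyLabeling handles this internally; the checkpoint file, image path, and click coordinates are assumptions):

```python
# Minimal sketch of point-based prompting with the Segment Anything Model.
# Checkpoint file, image path, and click coordinates are placeholders.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One positive click on the target object (label 1 = foreground)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # keep the highest-scoring candidate mask
```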
Note
Please note that the current model is experimental and may not perform as expected. It has the following known limitations:

- The model weights are trained on FSC-147, so it may not perform well on out-of-distribution objects.
- Model inference is very resource-intensive, as the model is designed as a two-stage pipeline.
- The current model cannot effectively distinguish between similar objects, which may lead to false positives.