Object Detection Example

Introduction

Object detection is a computer vision task that identifies objects and their locations in an image.

Basic Usage

Here's how to set up an object detection task:

  • Start by adding the image files.
  • Then, click the rectangle button in the left menu or press the R key to quickly create a rectangle shape.
  • Finally, enter the corresponding label name in the label dialog.
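
Each annotated image is saved as a JSON label file. The snippet below is a minimal sketch of the LabelMe-style format X-AnyLabeling writes; the exact field set and values are illustrative and may differ between versions:

import json

# A hedged approximation of an X-AnyLabeling label file for one image
# containing a single rectangle annotation (all values are illustrative).
label = {
    "version": "2.5.0",
    "shapes": [
        {
            "label": "dog",  # the name entered in the label dialog
            "points": [[100.0, 150.0], [300.0, 400.0]],  # top-left, bottom-right
            "shape_type": "rectangle",
        }
    ],
    "imagePath": "demo.jpg",
    "imageHeight": 480,
    "imageWidth": 640,
}
print(json.dumps(label, indent=2))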

Advanced Usage

Object Detection

  1. Import your image (Ctrl+I or Ctrl+U) or video (Ctrl+O) file into X-AnyLabeling.
  2. Select and load the YOLO11 model, or choose from other available object detection models.
  3. Initiate the process by clicking Run (i). Once you've verified that everything is set up correctly, use the keyboard shortcut Ctrl+M to process all images in one go.
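
If you want to reproduce this detection step outside the GUI, here is a minimal sketch using the ultralytics package, which distributes YOLO11 weights. The checkpoint name and image path are assumptions, and this is not X-AnyLabeling's internal code:

from ultralytics import YOLO  # pip install ultralytics

# "yolo11n.pt" is the nano YOLO11 checkpoint; any YOLO11 variant works here.
model = YOLO("yolo11n.pt")

# Run inference on a single image; conf plays the same role as the
# confidence threshold in the GUI.
results = model.predict("demo.jpg", conf=0.25)
for box in results[0].boxes:
    print(box.xyxy, box.conf, box.cls)  # coordinates, score, class id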

Region Proposal

Let's take the Universal Proposal Network (UPN) model as an example of advanced usage. UPN adopts a dual-granularity prompt tuning strategy to generate comprehensive proposals for objects at both the instance and part levels (a sanity-check sketch follows the installation steps below):

  • fine_grained_prompt: For detecting detailed object parts and subtle differences between similar objects. This mode excels at identifying specific features like facial characteristics or distinguishing between similar species.
  • coarse_grained_prompt: For detecting broad object categories and major scene elements. This mode focuses on identifying general objects like people, vehicles, or buildings without detailed sub-categorization.

Before you begin, make sure you have the following prerequisites installed:

Step 0: Download and install Miniconda from the official website.

Step 1: Create a new Conda environment with Python version 3.9 or higher, and activate it:

conda create -n x-anylabeling-upn python=3.9 -y
conda activate x-anylabeling-upn

You'll need to install PyTorch first. Follow the official installation instructions at https://pytorch.org/get-started/locally/ to install it together with the related dependencies.

Afterward, you can install ChatRex on a GPU-enabled machine using:

git clone https://github.com/IDEA-Research/ChatRex.git
cd ChatRex
pip install -v -e .
# Install the deformable attention ops for the Universal Proposal Network
cd chatrex/upn/ops
pip install -v -e .
# Back to the project root directory
cd -
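
At this point you can optionally sanity-check the UPN installation with a short script. This is a minimal sketch assuming the UPNWrapper interface from the ChatRex repository; the checkpoint path, test image, and threshold values are illustrative:

from chatrex.upn import UPNWrapper

# Hypothetical paths; point these at your downloaded weights and a test image.
ckpt_path = "checkpoints/upn_checkpoints/upn_large.pth"
test_image_path = "demo.jpg"

model = UPNWrapper(ckpt_path)

# Instance-level proposals (people, vehicles, buildings, ...).
coarse = model.inference(test_image_path, prompt_type="coarse_grained_prompt")

# Part-level proposals (object parts and fine-grained distinctions).
fine = model.inference(test_image_path, prompt_type="fine_grained_prompt")

# Filter by confidence score and NMS, mirroring the GUI controls described below.
filtered = model.filter(fine, min_score=0.3, nms_value=0.8)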

Finally, fetch the X-AnyLabeling (v2.5.0+) source code so you can install its dependencies:

cd ..
git clone https://github.com/CVHub520/X-AnyLabeling
cd X-AnyLabeling

Now, you can return to the installation guide (Simplified Chinese | English) to install the remaining dependencies.

Here's how to set up the UPN task:

  1. Import your image (Ctrl+I) or video (Ctrl+O) file into X-AnyLabeling.
  2. Select and load the Universal Proposal Network (IDEA) model from the model list.
  3. Click Run (i) to start processing. After verifying the results are satisfactory, use Ctrl+M to batch-process all images.

Additionally, you can adjust the following parameters to filter detection results directly from the GUI:

  • Detection Mode: Switch between Coarse Grained and Fine Grained modes using the dropdown menu next to the model selection
  • Confidence Threshold: Adjust the confidence score (0-1) using the "Confidence" spinner control
  • IoU Threshold: Control the Non-Maximum Suppression (NMS) threshold (0-1) using the "IoU" spinner control
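
To illustrate what the two thresholds do, here is a small self-contained sketch (not X-AnyLabeling's actual code) that applies a confidence cutoff followed by NMS using torchvision:

import torch
from torchvision.ops import nms

# Toy proposals as [x1, y1, x2, y2] boxes with confidence scores.
boxes = torch.tensor([[10.0, 10.0, 100.0, 100.0],
                      [12.0, 12.0, 98.0, 102.0],  # near-duplicate of the first box
                      [200.0, 50.0, 300.0, 150.0]])
scores = torch.tensor([0.90, 0.75, 0.60])

conf_thresh, iou_thresh = 0.5, 0.45

# 1) Confidence threshold: drop low-scoring proposals.
keep = scores >= conf_thresh
boxes, scores = boxes[keep], scores[keep]

# 2) NMS: among boxes overlapping above iou_thresh, keep only the top score.
kept = nms(boxes, scores, iou_thresh)
print(boxes[kept])  # the near-duplicate box has been suppressed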

Text-Visual Prompting Grounding

The OpenVision model demonstrates advanced usage with dual-granularity prompt tuning for comprehensive object detection at both instance and part levels.

Before starting, please install the required CountGD dependencies.

For a demonstration of the workflow, watch the demo video below:

[Demo video: Open Vision]

X-AnyLabeling supports three different prompting modes for object detection and annotation:

  • Point Prompting Mode:

    • Uses the Segment Anything Model (SAM) to generate high-precision segmentation masks
    • Activated by clicking points on the target object
    • Best for detailed object segmentation and boundary detection
    • Ideal for irregular shapes and precise annotations
  • Rectangle Prompting Mode:

    • Leverages the CountGD model to detect visually similar objects
    • Activated by drawing a bounding box around an example object
    • Automatically finds and annotates similar objects in the image
    • Optimal for batch detection of multiple similar objects
  • Text Prompting Mode:

    • Powered by Grounding DINO for text-guided object detection
    • Activated by entering natural language descriptions
    • Locates objects based on textual descriptions
    • Perfect for finding objects by their semantic description
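
As an illustration of the text prompting mode, a standalone sketch using the Hugging Face transformers port of Grounding DINO might look like the following. The model id, thresholds, and image path are assumptions, and this is not X-AnyLabeling's internal pipeline:

import torch
from PIL import Image
from transformers import AutoProcessor, GroundingDinoForObjectDetection

model_id = "IDEA-Research/grounding-dino-tiny"  # assumed Hugging Face model id
processor = AutoProcessor.from_pretrained(model_id)
model = GroundingDinoForObjectDetection.from_pretrained(model_id)

image = Image.open("demo.jpg").convert("RGB")
# Grounding DINO expects lower-case text queries, each ending with a period.
text = "a person. a car."

inputs = processor(images=image, text=text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw outputs into boxes and the phrases they were matched to.
results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.35,
    text_threshold=0.25,
    target_sizes=[image.size[::-1]],  # (height, width)
)[0]
print(results["boxes"], results["labels"])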

Note

The current model is experimental and may not perform as expected. Known limitations include:

  • The model weights are trained on FSC-147, so the model may not generalize well to out-of-distribution objects.
  • Inference is resource-intensive because the model is designed as a two-stage pipeline.
  • The model cannot reliably distinguish between similar objects, which may lead to false positives.