Interactive Video Object Segmentation Example

Introduction

Interactive Video Object Segmentation (iVOS) has become an essential task for efficiently obtaining object segmentations in videos, often guided by user inputs like scribbles, clicks, or bounding boxes. In this tutorial, you'll learn how to leverage the video tracking feature of SAM2 on X-AnyLabeling to accomplish iVOS tasks.

Let's get started!

Installation

Before you begin, make sure you have the following prerequisites installed:

Step 0: Download and install Miniconda from the official website.

Step 1: Create a new Conda environment with Python version 3.10 or higher, and activate it:

conda create -n x-anylabeling-sam2 python=3.10 -y
conda activate x-anylabeling-sam2

Install SAM2 first. It requires torch>=2.3.1 and torchvision>=0.18.1; follow the official PyTorch installation instructions to install both PyTorch and TorchVision.
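Before cloning SAM2, it can help to confirm that the active environment already meets these minimums. The snippet below is a minimal sketch, not part of X-AnyLabeling or SAM2: the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are made up for illustration, and the version parser is deliberately simple (it drops local suffixes such as `+cu121` and ignores pre-release tags).

```python
from importlib import metadata
import re

# Minimum versions quoted in this tutorial.
REQUIRED = {"torch": "2.3.1", "torchvision": "0.18.1"}


def parse_version(text):
    """Turn '2.3.1+cu121' into (2, 3, 1); local suffixes and pre-release tags are ignored."""
    public = text.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev20240601'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    have, need = parse_version(installed), parse_version(required)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))  # pad so '2.4' compares like '2.4.0'
    need += (0,) * (width - len(need))
    return have >= need


def check_requirements(required=REQUIRED):
    """Map each package name to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```

Run it inside the activated `x-anylabeling-sam2` environment; any line that does not read `ok` means PyTorch or TorchVision should be installed or upgraded before continuing.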

Afterward, you can install SAM2 on a GPU-enabled machine using:

git clone https://github.com/CVHub520/segment-anything-2
cd segment-anything-2
pip install -e .

Finally, install the necessary dependencies for X-AnyLabeling (v2.4.2+):

cd ..
git clone https://github.com/CVHub520/X-AnyLabeling
cd X-AnyLabeling

# For Windows or Linux
pip install -r requirements.txt

# For macOS
pip install -r requirements-macos.txt
conda install -c conda-forge pyqt=5.15.9

Getting Started

Prerequisites

Step 0: Launch the app:

python3 anylabeling/app.py

Step 1: Load the SAM 2 Video model

[Image: Load-Model]

Note: If the model fails to load because of network issues, download the checkpoint manually as described below.

First, you'll need to download a model checkpoint. For this tutorial, we'll use the sam2_hiera_large.pt checkpoint as an example.

After downloading, place the checkpoint file in the corresponding model folder within your user directory (create the folder if it doesn't exist):

# Windows
C:\Users\${User}\xanylabeling_data\models\sam2_hiera_large_video-r20240901

# Linux or macOS
~/xanylabeling_data/models/sam2_hiera_large_video-r20240901
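The same placement can be scripted so that it works on Windows, Linux, and macOS alike. This is a minimal sketch under the assumption that the user directory is whatever `Path.home()` returns; the helper names `model_dir` and `place_checkpoint` are hypothetical, and the download location passed in is up to you.

```python
from pathlib import Path
import shutil

# Folder name taken from the paths listed above.
MODEL_FOLDER = "sam2_hiera_large_video-r20240901"


def model_dir(home=None):
    """Return <home>/xanylabeling_data/models/<MODEL_FOLDER>."""
    base = Path(home) if home is not None else Path.home()
    return base / "xanylabeling_data" / "models" / MODEL_FOLDER


def place_checkpoint(checkpoint, home=None):
    """Copy a downloaded checkpoint into the model folder, creating the folder if needed."""
    target_dir = model_dir(home)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(checkpoint).name
    shutil.copy2(checkpoint, target)
    return target
```

For example, `place_checkpoint("sam2_hiera_large.pt")` copies a checkpoint from the current directory into the folder shown above and returns the final path.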

To use other SAM2 model sizes or change the model loading path, refer to the custom settings documentation (available in Simplified Chinese and English).

Step 2: Add a video file (Ctrl + O) or a folder of split video frames (Ctrl + U).

Note

Currently, the only supported image formats are [*.jpg, *.jpeg, *.JPG, *.JPEG]. Video files are automatically converted to JPG frames when they are loaded.
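If you load a folder of pre-split frames, a quick check can reveal files that would be ignored. The sketch below is illustrative only: `split_frames` is a hypothetical helper, it takes the extension list above literally (so a mixed-case suffix such as `.Jpg` counts as unsupported), and it assumes frame names sort in playback order (for example, zero-padded numbers).

```python
from pathlib import Path

# Extensions listed in the note above, matched exactly.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".JPG", ".JPEG"}


def split_frames(folder):
    """Return (supported, skipped) lists of files in a frame folder, sorted by name."""
    supported, skipped = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # sub-folders are ignored entirely
        if path.suffix in SUPPORTED_SUFFIXES:
            supported.append(path)
        else:
            skipped.append(path)
    return supported, skipped
```

Any file that ends up in `skipped` (PNG frames, for instance) would need to be converted to JPG before the folder is opened with Ctrl + U.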

Usage

Step 0: Add Prompts

[Video: add_prompts.mp4]

Tip

  • Point (q): Add a positive point.
  • Point (e): Add a negative point.
  • +Rect: Draw a rectangle around the object.
  • Clear (b): Erase all added marks.
  • Finish Object (f): Confirm the object.

On the initial frame, add prompts (Marks) such as positive points, negative points, and rectangles to guide tracking of the target object. Then:

  1. If the segmentation result meets your expectations, click the Finish Object (f) button at the top of the screen or press the f key to confirm the object. If it does not, click the Clear (b) button or press the b key to clear the current marks and try again.
  2. We strongly recommend labeling each added target sequentially as object0, object1, ..., objectN.
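To make the prompt types concrete, here is a minimal sketch of how the marks for one target could be represented as data. It is not X-AnyLabeling's internal structure, which this tutorial does not describe: `ObjectPrompt` and `next_label` are hypothetical names. The only conventions borrowed are those of the Segment Anything family, where a point label of 1 means foreground (positive), 0 means background (negative), and a box is given as (x_min, y_min, x_max, y_max).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

POSITIVE, NEGATIVE = 1, 0  # SAM-style point labels: foreground / background


@dataclass
class ObjectPrompt:
    """Marks collected for one target on the starting frame (hypothetical container)."""
    label: str
    points: List[Tuple[float, float]] = field(default_factory=list)
    point_labels: List[int] = field(default_factory=list)
    box: Optional[Tuple[float, float, float, float]] = None

    def add_point(self, x, y, positive=True):
        """Record a positive (q) or negative (e) click."""
        self.points.append((x, y))
        self.point_labels.append(POSITIVE if positive else NEGATIVE)

    def set_box(self, x0, y0, x1, y1):
        """Record a rectangle, normalized so the corner order does not matter."""
        self.box = (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))

    def clear(self):
        """Drop all marks, mirroring the Clear (b) action."""
        self.points.clear()
        self.point_labels.clear()
        self.box = None


def next_label(existing):
    """Return the first unused name in the object0, object1, ... sequence."""
    index = 0
    while f"object{index}" in existing:
        index += 1
    return f"object{index}"
```

For example, a target confirmed after one positive click, one negative click, and a rectangle would carry two points with labels `[1, 0]` plus one box, and the next target would be named by `next_label` from the labels already in use.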

Warning

If you need to delete a confirmed object, follow these steps:
a. Open the edit mode (Ctrl + J) and remove all added objects from the current frame;
b. Click the Reset Tracker button at the top of the screen to reset the tracker;
c. Reapply the prompts (Marks) as described above.

[Image: rectangle_tracklet]

Alternatively, if you only need bounding-box (object detection) tracking, set the output mode to Rectangle.

Step 1: Propagate the prompts to get the tracklet across the video

[Image: run_video]

Once you've finished setting the prompts, start video tracking by clicking the video start button in the left-hand menu or pressing Ctrl + M. The prompts are then propagated through the entire video to produce the tracklet.

Step 2: Add New Prompts to Further Refine the Tracklet

After tracking the entire video, you may notice one of the following issues in an intermediate frame:

  • The target is lost
  • Boundary details are imperfect
  • A new object needs to be tracked

In that case, treat the current frame as the new starting frame and follow these steps:

a. Open the edit mode (Ctrl + J) and remove all added objects from the current frame.
b. Click the Reset Tracker button at the top of the screen to reset the tracker.
c. Reapply the prompts (Marks) as described earlier.

Then repeat Step 0 and Step 1 from this frame onward.

[Image: rename]

Upon completion of all tasks, you can access the Tool -> Label Manager option from the top menu to assign specific class names.
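The Label Manager is the supported way to rename classes. For illustration only, the same mapping from placeholder names to class names can be expressed in a few lines. This sketch assumes the annotations are saved as LabelMe-style JSON files with a top-level `shapes` list whose items carry a `label` key; check one of your own label files before relying on it. The function names and the example class names are hypothetical.

```python
import json
from pathlib import Path


def rename_labels(annotation, mapping):
    """Rename shape labels in place; return how many shapes changed."""
    changed = 0
    for shape in annotation.get("shapes", []):
        new_name = mapping.get(shape.get("label"))
        if new_name is not None and new_name != shape["label"]:
            shape["label"] = new_name
            changed += 1
    return changed


def rename_labels_in_folder(folder, mapping):
    """Apply rename_labels to every *.json file in a folder; return the total changes."""
    total = 0
    for path in sorted(Path(folder).glob("*.json")):
        annotation = json.loads(path.read_text(encoding="utf-8"))
        count = rename_labels(annotation, mapping)
        if count:  # rewrite only files that actually changed
            path.write_text(
                json.dumps(annotation, ensure_ascii=False, indent=2),
                encoding="utf-8",
            )
        total += count
    return total
```

A call such as `rename_labels_in_folder("frames", {"object0": "person", "object1": "dog"})` would rewrite only the files that contain those placeholder labels. Back up the folder first, since the files are modified in place.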

Note

After loading a new video file, remember to click the Reset Tracker button at the top of the screen to reset the tracker.


Congratulations! 🎉 You’ve now mastered the basics of X-AnyLabeling. Feel free to experiment with it on your own videos and various use cases!