Skip to content

jxhoh/sam.cpp

 
 

Repository files navigation

SAM.cpp

Inference of Meta's Segment Anything Model in pure C/C++

demo-0.mp4

Quick start

git clone --recursive https://github.com/YavorGIvanov/sam.cpp
cd sam.cpp

Note: you need to download the model checkpoint below (sam_vit_b_01ec64.pth) first from here and place it in the checkpoints folder

# Convert PTH model to ggml. Requires python3, torch and numpy
python convert-pth-to-ggml.py checkpoints/sam_vit_b_01ec64.pth . 1

# You need CMake and SDL2
SDL2 - Used for GUI windows & input [libsdl](https://www.libsdl.org)

[Ubuntu]
$ sudo apt install libsdl2-dev

[Mac OS with brew]
$ brew install sdl2

[MSYS2]
$ pacman -S git cmake make mingw-w64-x86_64-dlfcn mingw-w64-x86_64-gcc mingw-w64-x86_64-SDL2

# Build sam.cpp.
mkdir build && cd build
cmake .. && make -j4

# run inference
./bin/sam -t 16 -i ../img.jpg -m ../checkpoints/ggml-model-f16.bin

Note: The optimal threads parameter ("-t") value should be manually selected based on the specific machine running the inference.

Downloading and converting the model checkpoints

You can download a model checkpoint and convert it to ggml format using the script convert-pth-to-ggml.py:

# Convert PTH model to ggml
python convert-pth-to-ggml.py sam_vit_b_01ec64.pth . 1

Example output on M2 Ultra

 $ ▶ make -j sam && time ./bin/sam -t 8 -i img.jpg
[ 28%] Built target common
[ 71%] Built target ggml
[100%] Built target sam
main: seed = 1693224265
main: loaded image 'img.jpg' (680 x 453)
sam_image_preprocess: scale = 0.664062
main: preprocessed image (1024 x 1024)
sam_model_load: loading model from 'models/sam-vit-b/ggml-model-f16.bin' - please wait ...
sam_model_load: n_enc_state      = 768
sam_model_load: n_enc_layer      = 12
sam_model_load: n_enc_head       = 12
sam_model_load: n_enc_out_chans  = 256
sam_model_load: n_pt_embd        = 4
sam_model_load: ftype            = 1
sam_model_load: qntvr            = 0
operator(): ggml ctx size = 202.32 MB
sam_model_load: ...................................... done
sam_model_load: model size =   185.05 MB / num tensors = 304
embd_img
dims: 64 64 256 1 f32
First & Last 10 elements:
-0.05117 -0.06408 -0.07154 -0.06991 -0.07212 -0.07690 -0.07508 -0.07281 -0.07383 -0.06779
0.01589 0.01775 0.02250 0.01675 0.01766 0.01661 0.01811 0.02051 0.02103 0.03382
sum:  12736.272313

Skipping mask 0 with iou 0.705935 below threshold 0.880000
Skipping mask 1 with iou 0.762136 below threshold 0.880000
Mask 2: iou = 0.947081, stability_score = 0.955437, bbox (371, 436), (144, 168)


main:     load time =    51.28 ms
main:    total time =  2047.49 ms

real	0m2.068s
user	0m16.343s
sys	0m0.214s

Input point is (414.375, 162.796875) (currently hardcoded)

Input image:

llamas

Output mask (mask_out_2.png in build folder):

mask_glasses

References

Next steps

  • Reduce memory usage by utilizing the new ggml-alloc
  • Remove redundant graph nodes
  • Make inference faster
  • Fix the difference in output masks compared to the PyTorch implementation
  • Filter masks based on stability score
  • Add support for point user input
  • Support F16 for heavy F32 ops
  • Test quantization
  • Support bigger model checkpoints
  • GPU support
  • Add support for mask and box input

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 95.2%
  • Python 3.9%
  • CMake 0.9%