Most of the code in this repo is taken from here.
If you are working in an environment where PyTorch is already installed, please first go to the `settings.ini` file and remove `torch` from the requirements.
git clone [email protected]:HasanGoni/labeling_test.git
cd labeling_test
pip install -e .
In case an SSH key is not added to your GitHub account, clone this repo using
git clone https://github.com/HasanGoni/labeling_test.git
then
cd labeling_test
pip3 install -e .
## Special consideration regarding `Huggingface`
When a `transformers` model is downloaded, it is normally stored in the local `~/.cache` folder. If space is scarce in `~/.cache`, you should change the download folder. This can be done in different ways; one way is the following.
import os
os.environ['HF_HOME'] = '/home/ai_test/data/huggingface'
Here I am changing the download folder to `/home/ai_test/data/huggingface`. Make sure the directory exists.
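Another option (a minimal sketch, not needed if you set `HF_HOME`): both `hf_hub_download` and `from_pretrained` accept a `cache_dir` argument, so the cache location can be redirected per call instead of globally.

```python
# redirect the Hugging Face cache per call via cache_dir instead of HF_HOME
from huggingface_hub import hf_hub_download
from transformers import SamModel

cache_dir = '/home/ai_test/data/huggingface'  # same example path as above; must exist

file_name = hf_hub_download(
    repo_id="nielsr/persam-dog",
    filename="dog.jpg",
    repo_type="dataset",
    cache_dir=cache_dir)

model = SamModel.from_pretrained("facebook/sam-vit-huge", cache_dir=cache_dir)
```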
from huggingface_hub import hf_hub_download
from PIL import Image
import cv2
import matplotlib.pyplot as plt
from fastcore.all import *
import torch
import torch.nn.functional as F
from torchvision.transforms.functional import resize, to_pil_image
import numpy as np
from pathlib import Path
from typing import Tuple, List, Union
from transformers import AutoProcessor, SamModel
One consideration regarding reading the image and the mask: right now you should use `PIL` for reading the image and `opencv` for reading the mask. I will try to correct this, but for now one should do it this way.
# Downloading from Niels_rogge hugging face repo
# image file
file_name = hf_hub_download(repo_id="nielsr/persam-dog", filename="dog.jpg", repo_type="dataset")
# image mask file
m_file_name = hf_hub_download(repo_id="nielsr/persam-dog", filename="dog_mask.png", repo_type="dataset")
tst_im_path = hf_hub_download(
repo_id="nielsr/persam-dog",
filename="new_dog.jpg",
repo_type="dataset")
ref_image = Image.open(file_name).convert('RGB')
ref_mask = cv2.imread(m_file_name)
ref_mask = cv2.cvtColor(ref_mask, cv2.COLOR_BGR2RGB)
tst_img = Image.open(tst_im_path).convert('RGB')
ref_image
tst_img
processor = AutoProcessor.from_pretrained("facebook/sam-vit-huge")
# model = PerSamModel.from_pretrained("facebook/sam-vit-huge")
model = SamModel.from_pretrained("facebook/sam-vit-huge")
# device = 'cuda'  # in case a GPU is available, otherwise use 'cpu'
device = 'cpu'
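If you would rather pick the device automatically, here is a small sketch using `torch.cuda.is_available()`; whether you also need to move the model yourself depends on how the helper functions handle devices.

```python
# pick the GPU automatically when one is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)  # may be redundant if the helper functions move the model themselves
```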
outputs, tst_feat, topk_xy, topk_label, input_sam, best_idx = get_first_prediction(
ref_img=ref_image,
ref_msk=ref_mask,
tst_img=tst_img,
model=model,
processor=processor,
device=device,
print_=False)
show_(outputs['pred_masks'].to('cpu').numpy()[0][0][0])
outputs_fr, _, _, _, masks_fr = get_first_refined_mask(
ref_image=ref_image,
ref_mask=ref_mask,
tst_img=tst_img,
processor=processor,
model=model,
print_=False,
device=device,
)
show_all_masks(masks_fr, outputs=outputs_fr)
- So we have got 3 masks from the first prediction.
- For the next step you should tell which one should be used as the prompt input for the next prediction.
- Normally the best `IoU` should give you the best mask; in case it doesn't, you should manually decide which mask you want to use.
  - This can be done in the next function, `get_last_refined_masks`, where the parameter `best_idx` controls which index is used as the prompt.
  - `best_idx = 0` means the first mask.
  - `best_idx = None` means use the mask with the best `IoU` from above.
- In the above case you can see that the best `IoU = 0.99` gives me the best mask. Therefore I will use `best_idx=None` for the next input prompt.
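If you want to inspect the predicted IoUs yourself before choosing, here is a hedged sketch; it assumes `outputs_fr` carries the raw SAM outputs, including `iou_scores` with a `(batch, point, mask)` layout, which may differ in this repo's helpers.

```python
# look at the predicted IoU of each of the 3 candidate masks (assumed layout: batch, point, mask)
iou_scores = outputs_fr['iou_scores'][0, 0]
print(iou_scores)                           # three values, one per candidate mask
manual_best_idx = int(iou_scores.argmax())  # pass this as best_idx if you disagree with the default
```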
output_f, masks_f = get_last_refined_masks(
ref_image=ref_image,
ref_mask=ref_mask,
processor=processor,
model=model,
tst_img=tst_img,
device=device,
print_=False,
outputs=outputs_fr,
masks=masks_fr,
best_idx=None
)
show_all_masks(masks_f, outputs=output_f)
So we have our final masks, and we can save the last image. Normally, if you use the previously best mask idx (here `99.99%`), you should get the mask with the best `IoU`. In case it is not, then again in the `save_mask` function set `index` to `0`, `1`, or `2`.
path = Path(Path.home(), 'Schreibtisch/projects/data/persam')
save_mask(
masks_f,
path=Path(path,'mask_test.png'),
outputs=output_f,
index=None
)
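To double-check what was actually written to disk, you can read the saved file back (a small sketch; it assumes the mask was saved as a single-channel PNG at the path used above).

```python
# read the saved mask back from disk and display it
saved_mask = cv2.imread(str(Path(path, 'mask_test.png')), cv2.IMREAD_GRAYSCALE)
show_(saved_mask)
```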
Notebook copied from here
labels=["a dog."]
polygon_refinement=True
threshold=0.3
detector_id="IDEA-Research/grounding-dino-tiny"
segmenter_id="facebook/sam-vit-base"
file_name = hf_hub_download(
repo_id="nielsr/persam-dog",
filename="dog.jpg",
repo_type="dataset")
ref_image = Image.open(file_name).convert('RGB')
# testing together
cv_img, segs = grounding_dino_segmentation(
image=ref_image,
labels=["a dog."],
threshold=threshold,
polygon_refinement=polygon_refinement,
detector_id=detector_id,
segmenter_id=segmenter_id)
annotated_img = get_annotated_img(cv_img, segs)
show_(annotated_img)
# we will see some other data
image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
img = load_image(image_url)
img
We need to tell the model what we want to detect in the image.
- We have cats and a remote control in the image; let's see if we can detect them.
labels=["a cat.", "a remote control."]
# refine the mask to get more accurate polygons
polygon_refinement=True
# model threshold , we can adjust this value to get more or less predictions
threshold=0.3
# which model we want to use for object detection
detector_id="IDEA-Research/grounding-dino-tiny"
# which model we want to use for segmentation
segmenter_id="facebook/sam-vit-base"
# just load the image using PIL
img = load_image(image_url)
# testing together
cv_img, segs = grounding_dino_segmentation(
image=img,
labels=["a cat.", "a remote control."],
threshold=threshold,
polygon_refinement=polygon_refinement,
detector_id=detector_id,
segmenter_id=segmenter_id)
We will receive an image and the results of the segmentation and detection.
len(segs)
segs[0].box, segs[0].mask
(BoundingBox(xmin=344, ymin=23, xmax=637, ymax=373),
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=uint8))
Now we will visualize the results of the segmentation and bounding box detection.
annotated_img = get_annotated_img(cv_img, segs)
show_(annotated_img)
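Each entry in `segs` can also be used on its own. A minimal sketch, assuming (as the printed output above suggests) that `box` exposes `xmin`/`ymin`/`xmax`/`ymax` attributes and `mask` is a `uint8` array with the same height and width as the image:

```python
# crop the first detection out of the PIL image using its bounding box
box = segs[0].box
crop = img.crop((box.xmin, box.ymin, box.xmax, box.ymax))
show_(np.array(crop))

# keep only the masked pixels of that detection (background set to 0)
masked = np.array(img) * (segs[0].mask[..., None] > 0)
show_(masked)
```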
- SegGPT requires:
  - one image prompt
  - the corresponding mask prompt

It will then create a segmentation mask of the similar object on a test image.
from transformers import SegGptImageProcessor, SegGptForImageSegmentation
checkpoint = "BAAI/seggpt-vit-large"
image_processor = SegGptImageProcessor.from_pretrained(checkpoint)
model = SegGptForImageSegmentation.from_pretrained(
    checkpoint,
    # force_download=True,
    # resume_download=False,
)
ref_image
pil_ref_msk=Image.fromarray(ref_mask)
tst_img
test_image_mask = get_mask(
image_prompt=ref_image,
mask_prompt=pil_ref_msk,
model=model,
test_img=tst_img,
show=True
)
show_(test_image_mask)
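To see the prediction in context, you can overlay it on the test image. A minimal sketch, assuming `test_image_mask` is (or can be converted to) an array with the same height and width as `tst_img`:

```python
# blend the predicted mask over the test image for a quick visual check
mask_arr = np.array(test_image_mask)
if mask_arr.ndim == 3:          # collapse an RGB mask to a single channel
    mask_arr = mask_arr[..., 0]

overlay = np.array(tst_img).copy()
overlay[mask_arr > 0] = (0.5 * overlay[mask_arr > 0] + 0.5 * np.array([255, 0, 0])).astype(np.uint8)
show_(overlay)
```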
- Nice, SegGPT works nicely on our test image.