ChaoningZhang / MobileSAM
- Friday, June 30, 2023, 00:00:01
This is the official code for the Faster Segment Anything (MobileSAM) project, which makes SAM lightweight.
2023/06/29: AnyLabeling supports MobileSAM for auto-labeling. Thanks for their effort.
2023/06/29: SonarSAM supports MobileSAM for full fine-tuning of the image encoder. Thanks for their effort.
2023/06/29: Stable Diffusion WebUI supports MobileSAM. Thanks for their effort.
2023/06/28: Grounding-SAM supports MobileSAM with Grounded-MobileSAM. Thanks for their effort.
2023/06/27: MobileSAM has been featured by AK; see AK's MobileSAM tweet. Thanks for the support.
The comparison of ViT-based image encoders is summarized as follows:
Image Encoder | Original SAM | MobileSAM |
---|---|---|
Parameters | 611M | 5M |
Speed | 452ms | 8ms |
Original SAM and MobileSAM have exactly the same prompt-guided mask decoder:
Mask Decoder | Original SAM | MobileSAM |
---|---|---|
Parameters | 3.876M | 3.876M |
Speed | 4ms | 4ms |
The comparison of the whole pipeline is summarized as follows:
Whole Pipeline (Enc+Dec) | Original SAM | MobileSAM |
---|---|---|
Parameters | 615M | 9.66M |
Speed | 456ms | 12ms |
⭐ Original SAM and MobileSAM with a (single) point as the prompt.
Whole Pipeline (Enc+Dec) | FastSAM | MobileSAM |
---|---|---|
Parameters | 68M | 9.66M |
Speed | 64ms | 12ms |
The mIoU comparison (agreement with the masks predicted by the original SAM) is summarized as follows:
mIoU | FastSAM | MobileSAM |
---|---|---|
100 | 0.27 | 0.73 |
200 | 0.33 | 0.71 |
300 | 0.37 | 0.74 |
400 | 0.41 | 0.73 |
500 | 0.41 | 0.73 |
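For reference, latency numbers like those above can be reproduced with a simple CUDA-event timer. The sketch below is only an assumption about the measurement setup (a single 1024x1024 input, 10 warm-up runs, 100 timed runs), not the authors' benchmarking script, and `time_encoder` is a hypothetical helper; it can be applied to `mobile_sam` once the model is loaded as shown further below.

```python
import torch

def time_encoder(model, runs=100, warmup=10, device="cuda"):
    # Hypothetical helper: SAM-style image encoders take a 1x3x1024x1024 tensor.
    x = torch.randn(1, 3, 1024, 1024, device=device)
    model = model.to(device).eval()
    with torch.no_grad():
        for _ in range(warmup):  # warm-up iterations to stabilize GPU clocks
            model.image_encoder(x)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(runs):
            model.image_encoder(x)
        end.record()
        torch.cuda.synchronize()  # wait for all kernels before reading the timer
    return start.elapsed_time(end) / runs  # average milliseconds per image
```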
The code requires python>=3.8, as well as pytorch>=1.7 and torchvision>=0.8. Please follow the instructions here to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.
Install Mobile Segment Anything:
pip install git+https://github.com/ChaoningZhang/MobileSAM.git
or clone the repository locally and install with
git clone git@github.com:ChaoningZhang/MobileSAM.git
cd MobileSAM; pip install -e .
MobileSAM can be loaded as follows:
import torch
from mobile_encoder.setup_mobile_sam import setup_model

checkpoint = torch.load('../weights/mobile_sam.pt')
mobile_sam = setup_model()
mobile_sam.load_state_dict(checkpoint, strict=True)
Then the model can be easily used in just a few lines to get masks from a given prompt:
from segment_anything import SamPredictor
device = "cuda"
mobile_sam.to(device=device)
mobile_sam.eval()
predictor = SamPredictor(mobile_sam)
predictor.set_image(<your_image>)
masks, _, _ = predictor.predict(<input_prompts>)
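As a concrete instance of the placeholders above, the image is an HxWx3 RGB uint8 array and the prompt a single foreground point; the file name and coordinates below are illustrative placeholders, but the calls follow the standard segment-anything predictor API:

```python
import cv2
import numpy as np

# Load an image as an HxWx3 RGB uint8 array; the path is a placeholder.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground point prompt (label 1) at illustrative pixel coordinates.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks with quality scores
)
best_mask = masks[np.argmax(scores)]  # pick the highest-scoring mask
```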
or generate masks for an entire image:
from segment_anything import SamAutomaticMaskGenerator
mask_generator = SamAutomaticMaskGenerator(mobile_sam)
masks = mask_generator.generate(<your_image>)
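Each returned mask is a dict in the standard segment-anything format, with keys including segmentation, area, bbox, and predicted_iou. For example, the masks might be filtered and inspected like this (the thresholds are illustrative only):

```python
# Keep reasonably large, confident masks; thresholds are illustrative only.
kept = [m for m in masks if m["area"] > 1000 and m["predicted_iou"] > 0.9]

for m in sorted(kept, key=lambda m: m["area"], reverse=True):
    seg = m["segmentation"]  # boolean HxW numpy array
    x, y, w, h = m["bbox"]   # bounding box in XYWH pixel coordinates
    print(f"area={m['area']}, bbox=({x},{y},{w},{h}), iou={m['predicted_iou']:.2f}")
```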
If you use MobileSAM in your research, please cite it with the following BibTeX entries.
@article{mobile_sam,
title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications},
author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon},
journal={arXiv preprint arXiv:2306.14289},
year={2023}
}
@article{kirillov2023segany,
title={Segment Anything},
author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
journal={arXiv preprint arXiv:2304.02643},
year={2023}
}
@InProceedings{tiny_vit,
title={TinyViT: Fast Pretraining Distillation for Small Vision Transformers},
author={Wu, Kan and Zhang, Jinnian and Peng, Houwen and Liu, Mengchen and Xiao, Bin and Fu, Jianlong and Yuan, Lu},
booktitle={European Conference on Computer Vision (ECCV)},
year={2022}
}