UX-Decoder / Semantic-SAM
- Wednesday, July 19, 2023, 00:00:13
Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
In this work, we introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity. We trained it on the whole SA-1B dataset, and it can reproduce SAM's capabilities and go beyond them.
Our model supports a wide range of segmentation tasks and their related applications, including:
```sh
pip3 install torch==1.13.1 torchvision==0.14.1 --extra-index-url https://download.pytorch.org/whl/cu113
python -m pip install 'git+https://github.com/MaureenZOU/detectron2-xyz.git'
pip install git+https://github.com/cocodataset/panopticapi.git
git clone https://github.com/UX-Decoder/Semantic-SAM
cd Semantic-SAM
python -m pip install -r requirements.txt
```
```sh
export DATASET=/path/to/dataset  # path to your COCO data
```
Please refer to the instructions for preparing the SA-1B data. Let us know if you need more guidance.
The currently released checkpoints are only trained with SA-1B data.
| Name | Training Dataset | Backbone | 1-IoU@Multi-Granularity | 1-IoU@COCO (Max \| Oracle) | Download |
|---|---|---|---|---|---|
| Semantic-SAM (config) | SA-1B | SwinL | 89.0 | 55.1 \| 74.1 | model |
| Semantic-SAM (config) | SA-1B | SwinT | 88.1 | 54.5 \| 73.8 | model |
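For reference, 1-IoU is the intersection-over-union between a single predicted mask and the ground-truth mask. A minimal NumPy sketch of the metric (the function name is ours, for illustration):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0

# Toy example: two 2x2 squares overlapping in a single pixel.
a = np.zeros((4, 4), dtype=bool); a[0:2, 0:2] = True  # area 4
b = np.zeros((4, 4), dtype=bool); b[1:3, 1:3] = True  # area 4, overlap 1
print(mask_iou(a, b))  # 1/7
```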
For interactive segmentation:
```sh
python demo.py
```
For mask auto-generation:
```sh
python demo_auto_generation.py
```
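Auto-generation in SAM-style pipelines typically prompts the model with a regular grid of points over the image. A sketch of building such a grid in normalized coordinates (the function name and grid size are illustrative, not the demo's actual defaults):

```python
import numpy as np

def point_grid(n_per_side: int) -> np.ndarray:
    """Evenly spaced (x, y) point prompts in normalized [0, 1] coordinates."""
    offset = 1.0 / (2 * n_per_side)          # half a cell from each border
    coords = np.linspace(offset, 1.0 - offset, n_per_side)
    xs, ys = np.meshgrid(coords, coords)
    return np.stack([xs.ravel(), ys.ravel()], axis=-1)  # shape (n*n, 2)

grid = point_grid(4)
print(grid.shape)  # (16, 2)
```

Each of these points is then fed to the model as a click prompt, and the resulting masks are scored and deduplicated.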
We do zero-shot evaluation on COCO val2017. `$n` is the number of GPUs you use.
For the SwinL backbone:
```sh
python train_net.py --eval_only --resume --num-gpus $n --config-file configs/semantic_sam_only_sa-1b_swinL.yaml COCO.TEST.BATCH_SIZE_TOTAL=$n MODEL.WEIGHTS=/path/to/weights
```
For the SwinT backbone:
```sh
python train_net.py --eval_only --resume --num-gpus $n --config-file configs/semantic_sam_only_sa-1b_swinT.yaml COCO.TEST.BATCH_SIZE_TOTAL=$n MODEL.WEIGHTS=/path/to/weights
```
We currently release only the code for training on SA-1B. Complete training with semantics will be released later.
`$n` is the number of GPUs you use. Before running the training code, you need to specify your SA-1B training data:
```sh
export SAM_DATASET=/path/to/dataset
export SAM_DATASET_START=$start
export SAM_DATASET_END=$end
```
We convert the SA-1B data into 100 tsv files. `$start` (int, 0-99) is the start index of your SA-1B data and `$end` (int, 0-99) is the end index.
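As an example of how the two indices select a contiguous slice of the 100 shards, here is a small sketch; the `sa_XXX.tsv` naming is our assumption for illustration, not the repo's actual filenames:

```python
def shard_files(start: int, end: int) -> list:
    """Return the tsv shard names covered by [start, end], inclusive.

    Hypothetical naming scheme: sa_000.tsv ... sa_099.tsv.
    """
    assert 0 <= start <= end <= 99, "indices must lie in [0, 99]"
    return ["sa_{:03d}.tsv".format(i) for i in range(start, end + 1)]

print(shard_files(0, 2))   # ['sa_000.tsv', 'sa_001.tsv', 'sa_002.tsv']
```

Setting `SAM_DATASET_START=0` and `SAM_DATASET_END=99` therefore trains on all 100 shards.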
If you are not using the tsv data format, you can refer to this JSON registration for SAM as a reference.
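If you keep SA-1B as one JSON file per image (the format the dataset is released in, with `image` and `annotations` fields), a dataset loader could be sketched as follows; the helper name and directory layout are our assumptions:

```python
import json
import os

def load_sa1b_records(json_dir: str):
    """Sketch: turn per-image SA-1B json files into detectron2-style dicts.

    Relies on the released SA-1B annotation fields: data["image"]
    (file_name, height, width) and data["annotations"] (RLE masks, bboxes).
    """
    records = []
    for name in sorted(os.listdir(json_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(json_dir, name)) as f:
            data = json.load(f)
        records.append({
            "file_name": os.path.join(json_dir, data["image"]["file_name"]),
            "height": data["image"]["height"],
            "width": data["image"]["width"],
            "annotations": data["annotations"],
        })
    return records
```

With detectron2 installed, such a loader can then be registered via `DatasetCatalog.register("sa1b_json", lambda: load_sa1b_records(path))`.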
For the SwinL backbone:
```sh
python train_net.py --resume --num-gpus $n --config-file configs/semantic_sam_only_sa-1b_swinL.yaml COCO.TEST.BATCH_SIZE_TOTAL=$n SAM.TEST.BATCH_SIZE_TOTAL=$n SAM.TRAIN.BATCH_SIZE_TOTAL=$n MODEL.WEIGHTS=/path/to/weights
```
For the SwinT backbone:
```sh
python train_net.py --resume --num-gpus $n --config-file configs/semantic_sam_only_sa-1b_swinT.yaml COCO.TEST.BATCH_SIZE_TOTAL=$n SAM.TEST.BATCH_SIZE_TOTAL=$n SAM.TRAIN.BATCH_SIZE_TOTAL=$n MODEL.WEIGHTS=/path/to/weights
```
We also support training to reproduce SAM:
```sh
python train_net.py --resume --num-gpus $n --config-file configs/semantic_sam_reproduce_sam_swinL.yaml COCO.TEST.BATCH_SIZE_TOTAL=$n SAM.TEST.BATCH_SIZE_TOTAL=$n SAM.TRAIN.BATCH_SIZE_TOTAL=$n MODEL.WEIGHTS=/path/to/weights
```
This script uses a SwinL backbone. The only differences from the scripts above are the use of many-to-one matching and 3 prompts, as in SAM.
(a) and (b) are the output masks of our model and SAM, respectively. The red points on the left-most image of each row are the user clicks. (c) shows the GT masks that contain the user clicks. The outputs of our model have been post-processed to remove duplicates.
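The duplicate-removal step mentioned above can be approximated by a greedy mask NMS: keep masks in descending score order and drop any mask whose IoU with an already-kept mask exceeds a threshold. The threshold value below is our choice for illustration, not the paper's:

```python
import numpy as np

def dedup_masks(masks, scores, iou_thresh=0.9):
    """Greedily drop near-duplicate binary masks, highest score first.

    Returns the indices of the kept masks.
    """
    order = np.argsort(scores)[::-1]  # descending score
    keep = []
    for i in order:
        m = masks[i].astype(bool)
        is_dup = False
        for j in keep:
            k = masks[j].astype(bool)
            inter = np.logical_and(m, k).sum()
            union = np.logical_or(m, k).sum()
            if union and inter / union > iou_thresh:
                is_dup = True
                break
        if not is_dup:
            keep.append(int(i))
    return keep

# Usage: two identical masks and one disjoint mask.
m1 = np.zeros((4, 4), dtype=bool); m1[:2, :2] = True
m2 = m1.copy()
m3 = np.zeros((4, 4), dtype=bool); m3[2:, 2:] = True
print(dedup_masks([m1, m2, m3], [0.9, 0.8, 0.7]))  # [0, 2]
```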
We visualize the prediction of each point-prompt content embedding in a fixed order for our model. We find that the output masks are consistently ordered from small to large, which indicates that each prompt embedding represents a semantic level. The red point in the first column is the click.
We also show that jointly training on SA-1B interactive segmentation and generic segmentation improves generic segmentation performance.
We also outperform SAM on both mask quality and granularity completeness; please refer to our paper for more experimental details.
- Release demo
- Release code and checkpoints trained on SA-1B
- Release demo with semantics
- Release code and checkpoints trained on SA-1B and semantically-labeled datasets