dbolya / yolact
██╗   ██╗ ██████╗ ██╗      █████╗  ██████╗████████╗
╚██╗ ██╔╝██╔═══██╗██║     ██╔══██╗██╔════╝╚══██╔══╝
 ╚████╔╝ ██║   ██║██║     ███████║██║        ██║
  ╚██╔╝  ██║   ██║██║     ██╔══██║██║        ██║
   ██║   ╚██████╔╝███████╗██║  ██║╚██████╗   ██║
   ╚═╝    ╚═════╝ ╚══════╝╚═╝  ╚═╝ ╚═════╝   ╚═╝
A simple, fully convolutional model for real-time instance segmentation. This is the code for our papers, YOLACT and YOLACT++ (see the citations at the bottom).
YOLACT++'s resnet50 model runs at 33.5 fps on a Titan Xp and achieves 34.1 mAP on COCO's test-dev (check out our journal paper here).
In order to use YOLACT++, make sure you compile the DCNv2 code. (See Installation)
Some examples from our YOLACT base model (33.5 fps on a Titan Xp and 29.8 mAP on COCO's test-dev); the example images are omitted here.
# Cython needs to be installed before pycocotools
pip install cython
pip install opencv-python pillow pycocotools matplotlib

Clone this repository and enter it:

git clone https://github.com/dbolya/yolact.git
cd yolact

If you'd like to train YOLACT, download the COCO dataset and put it in ./data/coco:

sh data/scripts/COCO.sh

If you'd like to evaluate YOLACT on test-dev, download test-dev with this script:

sh data/scripts/COCO_test.sh
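Optionally, you can sanity-check the COCO download with a short Python snippet. This is an illustrative sketch, not part of the repo, and it assumes the annotations end up under ./data/coco/annotations/ (adjust the path if your layout differs):

```python
# Hypothetical sanity check: confirm pycocotools can read the downloaded annotations.
from pycocotools.coco import COCO

coco = COCO('./data/coco/annotations/instances_val2017.json')  # assumed path
print(len(coco.imgs), 'images and', len(coco.cats), 'categories loaded')
```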
If you want to use YOLACT++, compile the DCNv2 code:

cd external/DCNv2
python setup.py build develop

Here are our YOLACT models (released on April 5th, 2019) along with their FPS on a Titan Xp and mAP on test-dev:
| Image Size | Backbone | FPS | mAP | Weights | |
|---|---|---|---|---|---|
| 550 | Resnet50-FPN | 42.5 | 28.2 | yolact_resnet50_54_800000.pth | Mirror |
| 550 | Darknet53-FPN | 40.0 | 28.7 | yolact_darknet53_54_800000.pth | Mirror |
| 550 | Resnet101-FPN | 33.5 | 29.8 | yolact_base_54_800000.pth | Mirror |
| 700 | Resnet101-FPN | 23.6 | 31.2 | yolact_im700_54_800000.pth | Mirror |
YOLACT++ models (released on December 16th, 2019):
| Image Size | Backbone | FPS | mAP | Weights | |
|---|---|---|---|---|---|
| 550 | Resnet50-FPN | 33.5 | 34.1 | yolact_plus_resnet50_54_800000.pth | Mirror |
| 550 | Resnet101-FPN | 27.3 | 34.6 | yolact_plus_base_54_800000.pth | Mirror |
To evaluate the model, put the corresponding weights file in the ./weights directory and run one of the following commands. The name of each config is everything before the numbers in the file name (e.g., yolact_base for yolact_base_54_800000.pth).
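As a quick illustration of that naming rule (a sketch for clarity, not code from the repo), the config name can be recovered by stripping the trailing epoch/iteration numbers:

```python
# Illustrative only: derive the config name from a checkpoint file name.
import os
import re

def config_from_weights(path):
    stem = os.path.splitext(os.path.basename(path))[0]
    return re.sub(r'(_\d+)+$', '', stem)   # drop the trailing <epoch>_<iteration>

print(config_from_weights('weights/yolact_base_54_800000.pth'))  # -> yolact_base
```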
# Quantitatively evaluate a trained model on the entire validation set. Make sure you have COCO downloaded as above.
# This should get 29.92 validation mask mAP last time I checked.
python eval.py --trained_model=weights/yolact_base_54_800000.pth
# Output a COCOEval json to submit to the website or to use the run_coco_eval.py script.
# This command will create './results/bbox_detections.json' and './results/mask_detections.json' for detection and instance segmentation respectively.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json
# You can run COCOEval on the files created in the previous command. The performance should match my implementation in eval.py.
python run_coco_eval.py
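For reference, here is a minimal sketch of what that COCOEval step looks like when calling pycocotools directly; the paths are assumptions based on the defaults above, and run_coco_eval.py already does this for you:

```python
# Minimal COCOEval sketch (illustrative; paths assume the default layout).
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

gt = COCO('./data/coco/annotations/instances_val2017.json')  # ground truth
dt = gt.loadRes('./results/mask_detections.json')            # detections exported above

coco_eval = COCOeval(gt, dt, iouType='segm')  # use 'bbox' for bbox_detections.json
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the standard COCO AP table
```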
# To output a coco json file for test-dev, make sure you have test-dev downloaded from above and go
python eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json --dataset=coco2017_testdev_dataset

# Display qualitative results on COCO. From here on I'll use a confidence threshold of 0.15.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --display

# Run just the raw model on the first 1k images of the validation set
python eval.py --trained_model=weights/yolact_base_54_800000.pth --benchmark --max_images=1000

# Display qualitative results on the specified image.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=my_image.png
# Process an image and save it to another file.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=input_image.png:output_image.png
# Process a whole folder of images.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --images=path/to/input/folder:path/to/output/folder

# Display a video in real-time. "--video_multiframe" will process that many frames at once for improved performance.
# If you want, use "--display_fps" to draw the FPS directly on the frame.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=my_video.mp4
# Display a webcam feed in real-time. If you have multiple webcams pass the index of the webcam you want instead of 0.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=0
# Process a video and save it to another file. This uses the same pipeline as the ones above now, so it's fast!
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=input_video.mp4:output_video.mp4

As you can tell, eval.py can do a ton of stuff. Run the --help command to see everything it can do.

python eval.py --help

By default, we train on COCO. Make sure to download the entire dataset using the commands above.
To train, grab an imagenet-pretrained model for your backbone (Resnet101, Resnet50, or Darknet53) and put it in ./weights.

Run one of the training commands below. Note that you can press ctrl+c while training to save an *_interrupt.pth file at the current iteration. All weights are saved in the ./weights directory by default with the file name <config>_<epoch>_<iter>.pth.

# Trains using the base config with a batch size of 8 (the default).
python train.py --config=yolact_base_config
# Trains yolact_base_config with a batch_size of 5. For the 550px models, 1 batch takes up around 1.5 gigs of VRAM, so specify accordingly.
python train.py --config=yolact_base_config --batch_size=5
# Resume training yolact_base with a specific weight file and start from the iteration specified in the weight file's name.
python train.py --config=yolact_base_config --resume=weights/yolact_base_10_32100.pth --start_iter=-1
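For clarity, here is a small sketch (not the repo's code) of how the starting iteration can be read back out of the <config>_<epoch>_<iter>.pth naming when --start_iter=-1 is used:

```python
# Illustrative sketch: pull the iteration out of a checkpoint named <config>_<epoch>_<iter>.pth.
import os

name = 'weights/yolact_base_10_32100.pth'  # example checkpoint from the command above
iteration = int(os.path.splitext(os.path.basename(name))[0].rsplit('_', 1)[-1])
print(iteration)  # 32100
```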
# Use the help option to see a description of all available command line arguments
python train.py --help

YOLACT now supports multiple GPUs seamlessly during training:
- Before running any of the scripts, run: export CUDA_VISIBLE_DEVICES=[gpus], where [gpus] is a comma-separated list of the indices of the GPUs you want to use (you can check the indices with nvidia-smi).
- Then simply set the batch size to 8*num_gpus with the training commands above. The training script will automatically scale the hyperparameters to the right values.
- If you want to control how many images go to each GPU, use --batch_alloc=[alloc], where [alloc] is a comma-separated list containing the number of images on each GPU. This must sum to batch_size.

YOLACT now logs training and validation information by default. You can disable this with --no_log. A guide on how to visualize these logs is coming soon, but for now you can look at LogVisualizer in utils/logger.py for help.
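To make that scaling concrete, here is a rough sketch of the linear scaling described above. It is an illustration under assumed base values, not the repo's exact code: hyperparameters tuned for the default batch size of 8 are scaled by batch_size / 8.

```python
# Illustrative sketch of linear hyperparameter scaling for larger batches.
default_batch_size = 8
batch_size = 8 * 4                             # e.g. 4 GPUs, as described above
factor = batch_size / default_batch_size

base_lr, base_max_iter = 1e-3, 800000          # assumed base values for illustration
scaled_lr = base_lr * factor                   # bigger batch -> proportionally bigger lr
scaled_max_iter = int(base_max_iter / factor)  # and proportionally fewer iterations
print(scaled_lr, scaled_max_iter)              # 0.004 200000
```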
We also include a config for training on Pascal SBD annotations (for rapid experimentation or comparing with other methods). To train on Pascal SBD, proceed with the following steps:
1. Download the SBD dataset (the file is called benchmark.tgz).
2. Extract it. Inside there should be a folder called dataset/img. Create the directory ./data/sbd (where . is YOLACT's root) and copy dataset/img to ./data/sbd/img.
3. Download the COCO-style annotations for SBD and extract them into ./data/sbd/.
4. Now you can train using --config=yolact_resnet50_pascal_config. Check that config to see how to extend it to other models.

I will automate this all with a script soon, don't worry. Also, if you want the script I used to convert the annotations, it's in ./scripts/convert_sbd.py, but you'll have to check how it works to be able to use it because I don't actually remember at this point.
If you want to verify our results, you can download our yolact_resnet50_pascal_config weights from here. This model should get 72.3 mask AP_50 and 56.2 mask AP_70. Note that the "all" AP isn't the same as the "vol" AP reported in other papers for Pascal (they use an average of the thresholds from 0.1 to 0.9 in increments of 0.1 instead of what COCO uses).
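To make that difference concrete, here is a small sketch (not repo code) of the two averaging schemes, where ap_at(t) stands for a hypothetical per-IoU-threshold AP lookup:

```python
# Illustrative sketch of the two AP averaging schemes described above.
import numpy as np

def pascal_vol_ap(ap_at):
    # "vol" AP: average over IoU thresholds 0.1, 0.2, ..., 0.9
    return np.mean([ap_at(t) for t in np.arange(0.1, 0.95, 0.1)])

def coco_style_ap(ap_at):
    # COCO-style "all" AP: average over IoU thresholds 0.50, 0.55, ..., 0.95
    return np.mean([ap_at(t) for t in np.arange(0.5, 1.0, 0.05)])
```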
You can also train on your own dataset by following these steps:
First, create a COCO-style object detection JSON annotation file for your dataset (a minimal sketch is shown below). Note that we don't use some fields, so the following may be omitted: info, licenses, the license, flickr_url, coco_url, and date_captured fields under each image, and categories (we use our own format for categories, see below).

Next, create a definition for your dataset under dataset_base in data/config.py (see the comments in dataset_base for an explanation of each field):

my_custom_dataset = dataset_base.copy({
    'name': 'My Dataset',

    'train_images': 'path_to_training_images',
    'train_info': 'path_to_training_annotation',

    'valid_images': 'path_to_validation_images',
    'valid_info': 'path_to_validation_annotation',

    'has_gt': True,
    'class_names': ('my_class_id_1', 'my_class_id_2', 'my_class_id_3', ...)
})

A couple of things to note: class IDs in the annotation file should start at 1 and increase sequentially in the order of class_names; if this isn't the case for your annotation file (like in COCO), see the field label_map in dataset_base. If you do not want to create a validation split, use the same image path and annotation file for validation; by default (see python train.py --help), train.py will output validation mAP for the first 5000 images in the dataset every 2 epochs.

Finally, in yolact_base_config in the same file, change the value for 'dataset' to 'my_custom_dataset' or whatever you named the config object above. Then you can use any of the training commands in the previous section.

See this nice post by @Amit12690 for tips on how to annotate a custom dataset and prepare it for use with YOLACT.
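For reference, here is a minimal, hypothetical sketch of such an annotation file (field values are made up; consult the COCO format specification for the full definition):

```python
# Hypothetical minimal COCO-style annotation file (values are illustrative only).
import json

annotations = {
    'images': [
        {'id': 1, 'file_name': '0001.png', 'width': 640, 'height': 480},
    ],
    'annotations': [
        {'id': 1, 'image_id': 1, 'category_id': 1,   # category IDs start at 1
         'bbox': [100, 120, 80, 60],                  # [x, y, width, height]
         'segmentation': [[100, 120, 180, 120, 180, 180, 100, 180]],
         'area': 4800, 'iscrowd': 0},
    ],
    # 'categories' may be omitted: YOLACT reads class names from the
    # class_names tuple in your dataset config instead.
}

with open('path_to_training_annotation', 'w') as f:  # matches the config sketch above
    json.dump(annotations, f)
```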
If you use YOLACT or this code base in your work, please cite
@inproceedings{yolact-iccv2019,
author = {Daniel Bolya and Chong Zhou and Fanyi Xiao and Yong Jae Lee},
title = {YOLACT: {Real-time} Instance Segmentation},
booktitle = {ICCV},
year = {2019},
}
For YOLACT++, please cite
@misc{yolact-plus-arxiv2019,
title = {YOLACT++: Better Real-time Instance Segmentation},
author = {Daniel Bolya and Chong Zhou and Fanyi Xiao and Yong Jae Lee},
year = {2019},
eprint = {1912.06218},
archivePrefix = {arXiv},
primaryClass = {cs.CV}
}
For questions about our paper or code, please contact Daniel Bolya.