microsoft / X-Decoder
- Sunday, December 25, 2022, at 00:34:37
Official Implementation of X-Decoder for generalized decoding for pixel, image and language
[Project Page] [Paper] [Hugging Face Demo] [Video]
by Xueyan Zou*, Zi-Yi Dou*, Jianwei Yang*, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, Jianfeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee^, Jianfeng Gao^.
X-Decoder is a generalized decoding model that can generate pixel-level segmentation and token-level texts seamlessly!
It achieves:
It supports:
pip3 install torch==1.13.1 torchvision==0.14.1 --extra-index-url https://download.pytorch.org/whl/cu113
python -m pip install 'git+https://github.com/MaureenZOU/detectron2-xyz.git'
pip install git+https://github.com/cocodataset/panopticapi.git
python -m pip install -r requirements.txt
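After installation, it can help to confirm that the pinned versions were actually picked up. The following is a minimal sketch, not part of the repository: `installed_version` and `check_pins` are hypothetical helper names, and the only pins checked are the two from the `pip3 install` line above. It relies solely on the standard library, so it still runs if the install failed.

```python
from importlib import metadata

# Pins copied from the pip3 install command above; nothing else is assumed.
EXPECTED_PINS = {"torch": "1.13.1", "torchvision": "0.14.1"}


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(expected=None):
    """Map each package name to (found_version, matches_pin)."""
    expected = EXPECTED_PINS if expected is None else expected
    report = {}
    for name, pinned in expected.items():
        found = installed_version(name)
        # A wheel may carry a local build suffix after a "+";
        # compare only the public version part before it.
        matches = found is not None and found.split("+")[0] == pinned
        report[name] = (found, matches)
    return report


if __name__ == "__main__":
    for name, (found, matches) in check_pins().items():
        status = "ok" if matches else "MISMATCH"
        print(f"{name}: found {found}, {status}")
```

A `MISMATCH` line only means the installed version differs from the pin; whether that matters for a given checkpoint is not something this check can tell.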
To prepare the dataset: DATASET.md
mpirun -n 8 python eval.py evaluate --conf_files configs/xdecoder/svlp_focalt_lang.yaml --overrides WEIGHT /pth/to/ckpt
Note: Due to zero-padding, placing multiple images on a single GPU may decrease performance.
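To get a feel for the note above, the sketch below estimates how much of a batch consists of padding. It assumes images are zero-padded to the largest height and width in the batch, which is a common batching scheme but is not confirmed here as what this codebase does; `padding_fraction` is a hypothetical helper, not part of the repository.

```python
def padding_fraction(sizes):
    """Fraction of pixels that are zero-padding when images of the given
    (height, width) sizes are padded to a common batch shape.

    Assumption: the batch shape is (max height, max width) over the batch.
    """
    if not sizes:
        raise ValueError("sizes must contain at least one (height, width) pair")
    max_h = max(h for h, _ in sizes)
    max_w = max(w for _, w in sizes)
    padded_pixels = len(sizes) * max_h * max_w
    real_pixels = sum(h * w for h, w in sizes)
    return 1.0 - real_pixels / padded_pixels


# One image per GPU: nothing is padded.
print(padding_fraction([(480, 640)]))              # 0.0
# A landscape and a portrait image together: a quarter of the batch is zeros.
print(padding_fraction([(480, 640), (640, 480)]))  # 0.25
```

Under this assumption, a batch of one never contains padding, which is consistent with the recommendation to avoid packing several images onto one GPU during evaluation.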
# For Segmentation Tasks
python demo/demo_captioning.py evaluate --conf_files configs/xdecoder/svlp_focalt_lang.yaml --overrides WEIGHT /pth/to/xdecoder_focalt_best_openseg.pt
# For VL Tasks
python demo/demo_captioning.py evaluate --conf_files configs/xdecoder/svlp_focalt_lang.yaml --overrides WEIGHT /pth/to/xdecoder_focalt_last_novg.pt
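The evaluation and demo commands above share the same config and override syntax, so they can be assembled in one place. This is a minimal sketch, assuming the flags shown above (`evaluate`, `--conf_files`, `--overrides WEIGHT`) are the only ones needed; `build_command` is a hypothetical helper, and the checkpoint paths are the placeholders from the text.

```python
import shlex

CONFIG = "configs/xdecoder/svlp_focalt_lang.yaml"  # config used in the commands above


def build_command(script, weight, num_procs=1, config=CONFIG):
    """Return the argv list for running `script` with the given checkpoint.

    With num_procs > 1 the command is prefixed with `mpirun -n <num_procs>`,
    mirroring the evaluation command in this README.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    argv = ["python", script, "evaluate",
            "--conf_files", config,
            "--overrides", "WEIGHT", weight]
    if num_procs > 1:
        argv = ["mpirun", "-n", str(num_procs)] + argv
    return argv


# Print the commands instead of running them; pass the list to
# subprocess.run(argv, check=True) to execute one.
print(shlex.join(build_command("eval.py", "/pth/to/ckpt", num_procs=8)))
print(shlex.join(build_command("demo/demo_captioning.py",
                               "/pth/to/xdecoder_focalt_last_novg.pt")))
```

Keeping the arguments as a list avoids shell-quoting problems if a checkpoint path contains spaces.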
model | ckpt | ADE PQ | ADE AP | ADE mIoU | ADE-full mIoU | SUN mIoU | SCAN PQ | SCAN mIoU | SCAN40 mIoU | Cityscape PQ | Cityscape mAP | Cityscape mIoU | BDD PQ | BDD mIoU |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
X-Decoder | BestSeg Tiny | 19.1 | 10.1 | 25.1 | 6.2 | 35.7 | 30.3 | 38.4 | 22.4 | 37.7 | 18.5 | 50.2 | 16.9 | 47.6 |
Model | Task | Log | PQ | mAP | mIoU |
---|---|---|---|---|---|
X-Decoder (davit-d5,Deformable) | PanoSeg | log | 52.4 | 38.7 | 59.1 |
We appreciate the constructive discussion with Haotian Zhang and the inspiration from GLIP! We also thank Mask2Former for its solid codebase and Hugging Face for sponsoring GPUs for our demo!
@article{zou2022xdecoder,
author = {Zou, Xueyan and Dou, Zi-Yi and Yang, Jianwei and Gan, Zhe and Li, Linjie and Li, Chunyuan and Dai, Xiyang and Behl, Harkirat and Wang, Jianfeng and Yuan, Lu and Peng, Nanyun and Wang, Lijuan and Lee, Yong Jae and Gao, Jianfeng},
title = {Generalized Decoding for Pixel, Image and Language},
publisher = {arXiv},
year = {2022},
}