Tencent / tencent-ml-images
- Friday, October 19, 2018, 00:16:04
Python
Largest multi-label image database; ResNet-101 model; 80.73% top-1 acc on ImageNet
This repository introduces the open-source project Tencent ML-Images, which publishes the ML-Images database, ResNet-101 checkpoints pre-trained on it, and the accompanying training, finetuning, and feature-extraction code.
Download Images using URLs
The image URLs and the corresponding annotations can be downloaded above.
The format of train_urls.txt is as follows:
```
...
https://c4.staticflickr.com/8/7239/6997334729_e5fb3938b1_o.jpg 3:1 5193:0.9 5851:0.9 9413:1 9416:1
https://c2.staticflickr.com/4/3035/3033882900_a9a4263c55_o.jpg 1053:0.8 1193:0.8 1379:0.8
...
```
As shown above, one image corresponds to one row. The first term is the image URL. The following terms, separated by spaces, are the annotations. For example, "5193:0.9" indicates class 5193 with confidence 0.9. Note that the class index starts from 0, and you can find the class name in the file data/dictionary_and_semantic_hierarchy.txt.
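The row format above can be parsed in a few lines of Python. This is an illustrative sketch, not code shipped with the repository; `parse_line` and its field names are our own naming.

```python
# Parse one row of train_urls.txt / val_urls.txt: an image URL followed by
# space-separated "class_index:confidence" annotation tags.
def parse_line(line):
    """Return (url, {class_index: confidence}) for one annotation row."""
    fields = line.strip().split()
    url, tags = fields[0], fields[1:]
    annotations = {}
    for tag in tags:
        class_idx, confidence = tag.split(":")
        annotations[int(class_idx)] = float(confidence)
    return url, annotations

url, ann = parse_line(
    "https://c4.staticflickr.com/8/7239/6997334729_e5fb3938b1_o.jpg "
    "3:1 5193:0.9 5851:0.9 9413:1 9416:1"
)
# e.g. ann[5193] == 0.9 and ann[3] == 1.0
```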
The image URLs of ML-Images are collected from ImageNet and Open Images. In total, ML-Images includes 17,609,752 training image URLs and 88,739 validation image URLs, covering 11,166 categories.
We build the semantic hierarchy of the 11,166 categories according to WordNet.
The direct parent categories of each class can be found in the file data/dictionary_and_semantic_hierarchy.txt. The whole semantic hierarchy includes 4 independent trees, whose root nodes are thing, matter, object, physical object, and atmospheric phenomenon, respectively.
The length of the longest semantic path from root to leaf nodes is 16, and the average length is 7.47.
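Once the direct-parent relations from data/dictionary_and_semantic_hierarchy.txt are parsed into a map from class index to parent indices, path lengths like those above can be computed by walking up to the roots. A minimal sketch, assuming such a `parents` map has already been built (the tiny map below is a toy stand-in, not the real hierarchy):

```python
# Length (in nodes) of the longest semantic path from a node up to a root.
# A root node is one with no parents.
def path_length_to_root(node, parents):
    if not parents.get(node):
        return 1
    return 1 + max(path_length_to_root(p, parents) for p in parents[node])

# Toy hierarchy rooted at class 0: 0 <- 1 <- {2, 3}
parents = {0: [], 1: [0], 2: [1], 3: [1]}
assert path_length_to_root(2, parents) == 3
```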
Since the image URLs of ML-Images are collected from ImageNet and Open Images, the annotations of ML-Images are constructed based on the original annotations from ImageNet and Open Images. Note that the original annotations from Open Images are licensed by Google Inc. under CC BY-4.0. Specifically, we conduct the following steps to construct the new annotations of ML-Images.
The annotations of all URLs in ML-Images are stored in train_urls.txt and val_urls.txt.
The main statistics of ML-Images are summarized in the following table.
| # Train images | # Validation images | # Classes | # Trainable classes | # Avg tags per image | # Avg images per class |
|---|---|---|---|---|---|
| 17,609,752 | 88,739 | 11,166 | 10,505 | 8 | 1447.2 |
Note: a trainable class is a class with over 100 training images.
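Per-class image counts, and hence the set of trainable classes, follow directly from the annotation rows. An illustrative sketch (our own code, assuming rows shaped like "URL idx:conf idx:conf ..."):

```python
from collections import Counter

# Count how many training images each class appears in, then keep the
# classes with more than 100 training images as "trainable".
def count_images_per_class(rows):
    counts = Counter()
    for row in rows:
        class_indices = {int(t.split(":")[0]) for t in row.split()[1:]}
        counts.update(class_indices)
    return counts

rows = ["u1 3:1 5:0.9", "u2 3:1", "u3 5:0.8"]
counts = count_images_per_class(rows)
trainable = [c for c, n in counts.items() if n > 100]
# here counts[3] == 2 and counts[5] == 2, so `trainable` is empty
```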
The number of images per class and the histogram of the number of annotations in training set are shown in the following figures.
The full train_urls.txt is very large. Here we provide a tiny file, train_urls_tiny.txt, to demonstrate the downloading procedure.
```
cd data
./download_im_from_url.py --url_list=train_urls_tiny.txt --im_list=train_im_tiny.txt --save_dir='images/'
```
A sub-folder data/images will be generated to save the downloaded JPEG images, as well as a file train_im_tiny.txt to save the image list and the corresponding annotations.
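The repository's data/download_im_from_url.py is the real downloading tool; the core idea can be sketched in a minimal single-threaded form. All names below are illustrative, and real downloads should use the shipped script (which handles threading and retries):

```python
import os
import urllib.request

# Minimal sketch of the downloading step: fetch each URL, save the image
# under save_dir, and write "image_name annotations..." rows to im_list_path.
def download_images(url_list_path, im_list_path, save_dir):
    os.makedirs(save_dir, exist_ok=True)
    with open(url_list_path) as src, open(im_list_path, "w") as dst:
        for i, line in enumerate(src):
            fields = line.split()
            url, annotations = fields[0], fields[1:]
            im_name = "im_%d.jpg" % i
            try:
                urllib.request.urlretrieve(url, os.path.join(save_dir, im_name))
            except OSError:
                continue  # skip expired or unreachable URLs
            dst.write(" ".join([im_name] + annotations) + "\n")
```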
Note: as many URLs from ImageNet have expired, we also provide the corresponding ImageNet image index for each URL that comes from ImageNet, in two new files: train_urls_and_index_from_imagenet.txt and val_urls_and_index_from_imagenet.txt.
The format is as follows:
```
...
n03874293_7679 http://image24.webshots.com/24/5/62/52/2807562520031003846EfpYGc_fs.jpg 2964:1 2944:1 2913:1 2896:1 2577:1 1833:1 1054:1 1041:1 865:1 2:1
n03580845_3376 http://i02.c.aliimg.com/img/offer/22/85/63/27/9/228563279 3618:1 3604:1 1835:1 1054:1 1041:1 865:1 2:1
...
```
In each row, the first term is the image index in ImageNet, followed by the corresponding URL and annotations. Using these two files, you can retrieve the original image directly from ImageNet when the URL is invalid.
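Reading these index files is a one-liner per row. A hypothetical helper (our own naming, not part of the repository) that maps each ImageNet index to its URL and annotation tags:

```python
# Parse one row of train_urls_and_index_from_imagenet.txt:
# "<imagenet_index> <url> <idx:conf> <idx:conf> ..."
def parse_index_line(line):
    fields = line.strip().split()
    imagenet_index, url, tags = fields[0], fields[1], fields[2:]
    return imagenet_index, url, tags

idx, url, tags = parse_index_line(
    "n03874293_7679 http://example.com/a.jpg 2964:1 1054:1"
)
# idx can be used to look the image up in a local ImageNet copy
# if downloading `url` fails.
```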
Here we generate the tfrecords using the multithreading module. One should first split the file train_im_tiny.txt into multiple smaller files and save them into the sub-folder data/image_lists/.
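The splitting step can be done with a few lines of Python. An illustrative sketch (our own code, with assumed shard file names), assigning lines round-robin so the shards are roughly equal, one shard per tfrecord-writer thread:

```python
import os

# Split an image-list file into num_shards smaller files under out_dir,
# e.g. data/image_lists/shard_000.txt, shard_001.txt, ...
def split_image_list(in_path, out_dir, num_shards):
    os.makedirs(out_dir, exist_ok=True)
    with open(in_path) as f:
        lines = f.readlines()
    for shard in range(num_shards):
        shard_path = os.path.join(out_dir, "shard_%03d.txt" % shard)
        with open(shard_path, "w") as f:
            f.writelines(lines[shard::num_shards])  # round-robin assignment
```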
```
cd data
./tfrecord.sh
```
```
./example/train.sh
```
Note that here we only provide the training code for a single-node, single-GPU setup, while our actual training on ML-Images is based on an internal distributed training framework (not released yet). One could adapt the training code to a distributed setup following Distributed TensorFlow.
One should first download the ImageNet database and prepare the tfrecord files using tfrecord.sh. Then, you can finetune the ResNet-101 model on ImageNet with the checkpoint pre-trained on ML-Images:
```
./example/finetune.sh
```
If you want to extract features using the above two checkpoints, please download them and move them into the folder checkpoints/.
```
./example/extract_feature.sh
```
The results of different ResNet-101 checkpoints on the validation set of ImageNet (ILSVRC2012) are summarized in the following table.
| Checkpoints | Train and finetune setting | Top-1 acc on Val 224 | Top-5 acc on Val 224 | Top-1 acc on Val 299 | Top-5 acc on Val 299 |
|---|---|---|---|---|---|
| MSRA ResNet-101 | train on ImageNet | 76.4 | 92.9 | -- | -- |
| Google ResNet-101 ckpt1 | train on ImageNet, 299 x 299 | -- | -- | 77.5 | 93.9 |
| Our ResNet-101 ckpt1 | train on ImageNet | 77.8 | 93.9 | 79.0 | 94.5 |
| Google ResNet-101 ckpt2 | pretrain on JFT-300M, finetune on ImageNet, 299 x 299 | -- | -- | 79.2 | 94.7 |
| Our ResNet-101 ckpt2 | pretrain on ML-Images, finetune on ImageNet | 78.8 | 94.5 | 79.5 | 94.9 |
| Our ResNet-101 ckpt3 | pretrain on ML-Images, finetune on ImageNet 224 to 299 | 78.3 | 94.2 | 80.73 | 95.5 |
| Our ResNet-101 ckpt4 | pretrain on ML-Images, finetune on ImageNet 299 x 299 | 75.8 | 92.7 | 79.6 | 94.6 |
The annotations of images are licensed by Tencent under the CC BY 4.0 license. The contents of this repository, including the code, documents, and checkpoints, are released under a BSD 3-Clause license. Please refer to LICENSE for more details.
If there is any concern about the copyright of any image used in this project, please email us.
The arXiv paper describing the details of this project will be available soon!