msracver / Deep-Feature-Flow
- Saturday, May 13, 2017, at 03:12:54
Python
Deep Feature Flow for Video Recognition
The major contributors of this repository include Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, and Yichen Wei.
Deep Feature Flow is initially described in a CVPR 2017 paper. It provides a simple, fast, accurate, and end-to-end framework for video recognition (e.g., object detection and semantic segmentation in videos). It is worth noting that:
This is an official implementation of Deep Feature Flow for Video Recognition (DFF) based on MXNet. It is worth noting that:
© Microsoft, 2017. Licensed under an Apache-2.0 license.
If you find Deep Feature Flow useful in your research, please consider citing:
@inproceedings{zhu17dff,
Author = {Xizhou Zhu and Yuwen Xiong and Jifeng Dai and Lu Yuan and Yichen Wei},
Title = {Deep Feature Flow for Video Recognition},
Booktitle = {CVPR},
Year = {2017}
}
@inproceedings{dai16rfcn,
Author = {Jifeng Dai and Yi Li and Kaiming He and Jian Sun},
Title = {{R-FCN}: Object Detection via Region-based Fully Convolutional Networks},
Booktitle = {NIPS},
Year = {2016}
}
| | training data | testing data | mAP@0.5 | time/image (Tesla K40) | time/image (Maxwell Titan X) |
|---|---|---|---|---|---|
| Frame baseline (R-FCN, ResNet-v1-101) | ImageNet DET train + VID train | ImageNet VID validation | 74.1 | 0.271s | 0.133s |
| Deep Feature Flow (R-FCN, ResNet-v1-101, FlowNet) | ImageNet DET train + VID train | ImageNet VID validation | 73.0 | 0.073s | 0.034s |
Running time is measured on a single GPU (mini-batch size of 1 for inference; key-frame duration of 10 frames for Deep Feature Flow).
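As a worked example of what the table implies, the sketch below computes the per-image speedup of Deep Feature Flow over the frame baseline from the reported timings, and lists which frames would be key frames under a key-frame duration of 10. The helper names (`speedup`, `key_frame_indices`) are hypothetical, and the schedule assumes evenly spaced key frames starting at frame 0; the repository's own scheduling may differ.

```python
# Per-image timings in seconds, copied from the results table.
TIMINGS = {
    "Tesla K40": {"baseline": 0.271, "dff": 0.073},
    "Maxwell Titan X": {"baseline": 0.133, "dff": 0.034},
}

KEY_FRAME_DURATION = 10  # key-frame duration used for the reported timings


def speedup(baseline_s: float, dff_s: float) -> float:
    """Ratio of frame-baseline time to Deep Feature Flow time per image."""
    return baseline_s / dff_s


def key_frame_indices(num_frames: int, duration: int = KEY_FRAME_DURATION) -> list:
    """Frames that would run the full recognition network, assuming
    (hypothetically) one key frame every `duration` frames from frame 0."""
    return list(range(0, num_frames, duration))


if __name__ == "__main__":
    for gpu, t in TIMINGS.items():
        print(f"{gpu}: {speedup(t['baseline'], t['dff']):.1f}x faster per image")
    print("key frames in a 25-frame clip:", key_frame_indices(25))
```

With the reported timings this works out to roughly 3.7x (Tesla K40) and 3.9x (Maxwell Titan X) faster per image, while mAP@0.5 drops by 1.1 points (74.1 to 73.0).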
The light-weight FlowNet appears to run slightly slower on MXNet than on Caffe.
MXNet from the official repository. We tested our code on MXNet@(commit 62ecb60). Because MXNet is developing rapidly, we recommend checking out this version if you encounter any issues. We may update this repository periodically if MXNet adds important features in future releases.
Python packages that might be missing: cython, opencv-python >= 3.2.0, easydict. If pip is set up on your system, these packages can be fetched and installed by running
pip install Cython
pip install opencv-python==3.2.0.6
pip install easydict==1.6
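Before building, a small check script can show which of these packages are already importable. This is a sketch, not part of the repository; it assumes the import names `Cython`, `cv2`, and `easydict` for the three pip packages, and it compares only the leading numeric parts of the OpenCV version string against 3.2.0.

```python
import importlib
import importlib.util
from typing import Optional

# Import names assumed for the pip packages listed above, with minimum versions.
REQUIRED = {"Cython": None, "cv2": "3.2.0", "easydict": None}


def version_tuple(version: str) -> tuple:
    """Leading numeric parts of a version string: '3.4.2-dev' -> (3, 4, 2)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check(module: str, minimum: Optional[str] = None) -> str:
    """Report 'missing', 'too old (<version>)' or 'ok (<version>)' for a module."""
    if importlib.util.find_spec(module) is None:
        return "missing"
    version = getattr(importlib.import_module(module), "__version__", "unknown")
    if minimum and version != "unknown" and version_tuple(version) < version_tuple(minimum):
        return f"too old ({version})"
    return f"ok ({version})"


if __name__ == "__main__":
    for name, minimum in REQUIRED.items():
        print(f"{name}: {check(name, minimum)}")
```

Any package reported as missing or too old can then be installed with the pip commands above.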
For Windows users, Visual Studio 2015 is needed to compile the Cython module.
Any NVIDIA GPU with at least 6 GB of memory should be OK.
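To confirm that a GPU meets the 6 GB guideline, one option is to query `nvidia-smi` and parse its CSV output. The sketch below assumes `nvidia-smi` is on the PATH and uses its `--query-gpu=memory.total --format=csv,noheader,nounits` options, which report total memory per GPU in MiB; interpreting "6 GB" as 6 × 1024 MiB is an assumption.

```python
import shutil
import subprocess

MIN_MIB = 6 * 1024  # "at least 6 GB", interpreted here as 6 GiB


def parse_memory_mib(csv_text: str) -> list:
    """Parse one total-memory value (MiB) per non-empty line of nvidia-smi output."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_gpu_memory() -> list:
    """Return total memory per GPU in MiB, or [] if nvidia-smi is not available."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_mib(out)


if __name__ == "__main__":
    for idx, mib in enumerate(query_gpu_memory()):
        status = "ok" if mib >= MIN_MIB else "below 6 GB"
        print(f"GPU {idx}: {mib} MiB ({status})")
```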
git clone https://github.com/msracver/Deep-Feature-Flow.git
For Windows users, run
cmd .\init.bat
For Linux users, run
sh ./init.sh
The scripts will build the Cython module automatically and create some folders.
Copy the operators in ./rfcn/operator_cxx to $(YOUR_MXNET_FOLDER)/src/operator/contrib and recompile MXNet.
Copy the MXNet folder to ./external/mxnet/$(YOUR_MXNET_PACKAGE), and modify MXNET_VERSION in ./experiments/rfcn/cfgs/*.yaml to $(YOUR_MXNET_PACKAGE). Thus you can switch among different versions of MXNet quickly.
To run the demo with our trained model (on ImageNet DET + VID train), please download the model manually from OneDrive, and put it under the folder ./model/.
Make sure it looks like this:
./model/rfcn_vid-0000.params
./model/rfcn_dff_flownet_vid-0000.params
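The `-0000.params` suffix matches MXNet's checkpoint naming, `<prefix>-<epoch as 4 digits>.params`. The sketch below rebuilds such a path and splits a loaded parameter dictionary into argument and auxiliary parameters using the `arg:` / `aux:` key prefixes that `mx.model.save_checkpoint` writes. The helper names are hypothetical and the repository's own loader may differ; MXNet is imported only when a file is actually loaded.

```python
import os


def params_path(prefix: str, epoch: int) -> str:
    """MXNet checkpoint file name: ('./model/rfcn_vid', 0) -> './model/rfcn_vid-0000.params'."""
    return "%s-%04d.params" % (prefix, epoch)


def split_params(save_dict: dict) -> tuple:
    """Split 'arg:<name>' / 'aux:<name>' keys into two dicts keyed by <name>."""
    arg_params, aux_params = {}, {}
    for key, value in save_dict.items():
        kind, _, name = key.partition(":")
        if kind == "arg":
            arg_params[name] = value
        elif kind == "aux":
            aux_params[name] = value
    return arg_params, aux_params


def load_params(prefix: str, epoch: int) -> tuple:
    """Load a .params file with mx.nd.load and split it (requires MXNet)."""
    import mxnet as mx  # imported lazily so the helpers above work without MXNet
    return split_params(mx.nd.load(params_path(prefix, epoch)))


if __name__ == "__main__":
    for prefix in ("./model/rfcn_vid", "./model/rfcn_dff_flownet_vid"):
        path = params_path(prefix, 0)
        if not os.path.isfile(path):
            print("not found:", path)
            continue
        arg_params, aux_params = load_params(prefix, 0)
        print(path, len(arg_params), "arg /", len(aux_params), "aux parameters")
```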
Run (inference batch size = 1)
python ./rfcn/demo.py
python ./dff_rfcn/demo.py
or run (inference batch size = 10)
python ./rfcn/demo_batch.py
python ./dff_rfcn/demo_batch.py
Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets, and make sure they look like this:
./data/ILSVRC2015/
./data/ILSVRC2015/Annotations/DET
./data/ILSVRC2015/Annotations/VID
./data/ILSVRC2015/Data/DET
./data/ILSVRC2015/Data/VID
./data/ILSVRC2015/ImageSets
Please download the ImageNet pre-trained ResNet-v1-101 model and the Flying-Chairs pre-trained FlowNet model manually from OneDrive, and put them under the folder ./model. Make sure it looks like this:
./model/pretrained_model/resnet_v1_101-0000.params
./model/pretrained_model/flownet-0000.params
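The two layouts above (dataset and pre-trained models) can be checked in one pass before training. This is a sketch, not a script shipped with the repository; the paths are copied from the lists above, and `missing_paths` is a hypothetical helper name.

```python
from pathlib import Path

# Layout copied from the instructions above.
EXPECTED_DIRS = [
    "data/ILSVRC2015/Annotations/DET",
    "data/ILSVRC2015/Annotations/VID",
    "data/ILSVRC2015/Data/DET",
    "data/ILSVRC2015/Data/VID",
    "data/ILSVRC2015/ImageSets",
]
EXPECTED_FILES = [
    "model/pretrained_model/resnet_v1_101-0000.params",
    "model/pretrained_model/flownet-0000.params",
]


def missing_paths(root: str = ".") -> list:
    """Return every expected directory or file that does not exist under root."""
    base = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (base / f).is_file()]
    return missing


if __name__ == "__main__":
    problems = missing_paths(".")
    print("layout OK" if not problems else "missing:\n  " + "\n  ".join(problems))
```

Run from the repository root, it prints either "layout OK" or the list of paths still to be downloaded or moved.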
All of our experiment settings (GPU #, dataset, etc.) are kept in YAML config files in the folder ./experiments/{rfcn/dff_rfcn}/cfgs.
Two config files have been provided so far: the frame baseline with R-FCN and Deep Feature Flow with R-FCN, both for ImageNet VID. We use 4 GPUs to train models on ImageNet VID.
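Because these settings live in YAML files, the sketch below shows one way to read two of them: `MXNET_VERSION`, which selects a folder under ./external/mxnet/, and the GPU list. It assumes the configs contain top-level lines such as `MXNET_VERSION: "mxnet"` and `gpus: '0,1,2,3'`; the key name `gpus` and its comma-separated format are assumptions, and the line-based reader is only a stand-in for a real YAML loader that handles simple top-level `key: value` lines.

```python
import os
import sys


def read_top_level(yaml_text: str) -> dict:
    """Collect simple top-level `key: value` pairs from YAML text.

    Indented (nested) lines, comments, and section headers such as `TRAIN:`
    are skipped. This is not a YAML parser; use a real loader in practice.
    """
    values = {}
    for line in yaml_text.splitlines():
        if not line or line[0].isspace() or line.startswith(("#", "---")):
            continue
        key, sep, value = line.partition(":")
        value = value.split(" #", 1)[0].strip().strip("'\"")  # drop comment and quotes
        if sep and value:
            values[key.strip()] = value
    return values


def mxnet_dir(cfg: dict, root: str = ".") -> str:
    """Folder under ./external/mxnet/ named by MXNET_VERSION."""
    return os.path.join(root, "external", "mxnet", cfg["MXNET_VERSION"])


def gpu_ids(cfg: dict) -> list:
    """Turn a comma-separated GPU string such as '0,1,2,3' into integer ids."""
    return [int(g) for g in cfg["gpus"].split(",") if g.strip()]


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python read_cfg.py <config.yaml>")
    else:
        with open(sys.argv[1]) as f:
            cfg = read_top_level(f.read())
        print("MXNet folder:", mxnet_dir(cfg))
        print("GPU ids:", gpu_ids(cfg))
```

In MXNet, such ids would typically become device contexts via `mx.gpu(i)`, and putting `mxnet_dir(cfg)` at the front of `sys.path` before `import mxnet` is one way the MXNET_VERSION switch could take effect.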
To perform experiments, run the Python script with the corresponding config file as input. For example, to train and test Deep Feature Flow with R-FCN, use the following command:
python experiments/dff_rfcn/dff_rfcn_end2end_train_test.py --cfg experiments/dff_rfcn/cfgs/resnet_v1_101_flownet_imagenet_vid_rfcn_end2end_ohem.yaml
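The command above can be generated from the config path. This sketch assumes each method's script is experiments/<method>/<method>_end2end_train_test.py, a naming pattern inferred from this single documented example rather than confirmed for every method; `train_test_command` is a hypothetical helper. It only prints the command unless `--run` is passed.

```python
import subprocess
import sys
from pathlib import PurePosixPath


def train_test_command(cfg_path: str) -> list:
    """Build the train-and-test command for a config in experiments/<method>/cfgs/.

    Assumes (hypothetically) the script is
    experiments/<method>/<method>_end2end_train_test.py.
    """
    cfg = PurePosixPath(cfg_path)
    method_dir = cfg.parent.parent  # experiments/<method>
    script = method_dir / f"{method_dir.name}_end2end_train_test.py"
    return ["python", str(script), "--cfg", str(cfg)]


if __name__ == "__main__":
    cmd = train_test_command(
        "experiments/dff_rfcn/cfgs/resnet_v1_101_flownet_imagenet_vid_rfcn_end2end_ohem.yaml"
    )
    print(" ".join(cmd))
    if "--run" in sys.argv:
        subprocess.run(cmd, check=True)
```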
A cache folder will be created automatically under output/dff_rfcn/imagenet_vid/ to save the model and the log.
Please find more details in the config files and in our code.
Code has been tested under: