titu1994 / Neural-Style-Transfer
- Saturday, 3 September 2016 at 03:13:18
Python
Keras Implementation of Neural Style Transfer from the paper "A Neural Algorithm of Artistic Style" ( http://arxiv.org/abs/1508.06576 ) in Keras 1.0.8
INetwork implements and focuses on certain improvements suggested in Improving the Neural Algorithm of Artistic Style.
Uses the VGG-16 model as described in the Keras example below: https://github.com/fchollet/keras/blob/master/examples/neural_style_transfer.py
Uses weights from Keras Deep Learning Models : https://github.com/fchollet/deep-learning-models
Result after 50 iterations (Average Pooling)
For comparison, results after 50 iterations (Max Pooling)
For comparison, results after 50 iterations using the INetwork. Notice that, compared to the Max Pooling results, the INetwork output is far more detailed in the mountain peaks and the colours are more natural
DeepArt.io result (1000 iterations and using improvements such as Markov Random Field Regularization)
For reference, the same image with Color Preservation
As an example, here are two images of the Sagano Bamboo Forest with the "pattened-leaf" style, with and without color preservation
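Color preservation works as a post-processing step: keep the content image's chrominance and take only the luminance from the stylized output. A minimal NumPy sketch of that idea (the function names are illustrative; the script may implement the transfer differently):

```python
import numpy as np

def preserve_color(content, stylized):
    # Post-processing: keep the content image's chrominance (Cb, Cr)
    # and take only the luminance (Y) from the stylized result.
    # Uses the standard JPEG RGB <-> YCbCr matrices.
    def rgb_to_ycbcr(img):
        m = np.array([[0.299, 0.587, 0.114],
                      [-0.168736, -0.331264, 0.5],
                      [0.5, -0.418688, -0.081312]])
        return img @ m.T

    def ycbcr_to_rgb(img):
        m = np.array([[1.0, 0.0, 1.402],
                      [1.0, -0.344136, -0.714136],
                      [1.0, 1.772, 0.0]])
        return img @ m.T

    content_ycc = rgb_to_ycbcr(content)
    stylized_ycc = rgb_to_ycbcr(stylized)
    # Stylized luminance + content chrominance.
    mixed = np.concatenate([stylized_ycc[..., :1], content_ycc[..., 1:]], axis=-1)
    return ycbcr_to_rgb(mixed)
```

Because only chrominance is swapped back in, the style's structure and contrast are untouched, which is why the option does not harm style quality.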
See the guide for details regarding how to use the script to achieve the best results
Weights are now automatically downloaded and cached in the ~/.keras folder (Users/<username>/.keras on Windows) under the 'models' subdirectory. The weights are a smaller version which includes only the convolutional layers without the zero-padding layers, thereby increasing the speed of execution.
Note: Requires the latest version of Keras (1.0.7+) due to use of new methods to get files and cache them into .keras directory.
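For reference, the cache location resolves roughly as below. This is only a sketch of the path logic; the actual download and caching is handled by Keras' get_file utility, and the helper name here is illustrative:

```python
import os

def cached_weights_path(fname, cache_dir=None):
    # Keras caches downloaded files under ~/.keras/models by default;
    # this helper (not part of the script) just builds that path.
    if cache_dir is None:
        cache_dir = os.path.join(os.path.expanduser('~'), '.keras')
    return os.path.join(cache_dir, 'models', fname)
```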
Uses the 'conv5_2' output to measure content loss. The original paper utilizes the 'conv4_2' output.
The initial image is the base (content) image instead of a random noise image. This method tends to create better output images, however the parameters have to be well tuned. Therefore there is an argument 'init_image' which can take the values 'content' or 'noise'.
Can use AveragePooling2D in place of MaxPooling2D layers. The original paper uses AveragePooling for better results, but the pooling type can be changed via the argument --pool_type="max". By default MaxPooling is used, since it offers sharper images, but AveragePooling applies the style better in some cases (especially when the style image is "The Starry Night" by Van Gogh).
Style weight scaling
These improvements are almost the same as the Chain Blurred version, however a few differences exist:
A change to the feature_layers list will be sufficient to apply these changes to the VGG-19 network.
Script Helper: a C# program written to more easily generate the arguments for the Python scripts Network.py or INetwork.py
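The improvements above can be sketched in NumPy. These are illustrative helpers only, not the scripts' actual Keras code; the 2x2 pooling window and the geometric style-weight scheme are assumptions:

```python
import numpy as np

def content_loss(base_features, combination_features):
    # Squared-error content loss at the chosen layer
    # ('conv5_2' by default here, 'conv4_2' in the original paper).
    return np.sum(np.square(combination_features - base_features))

def init_canvas(content_img, init_image="content"):
    # --init_image: start optimization from the content image or from noise.
    if init_image == "content":
        return content_img.copy()
    return np.random.uniform(-128.0, 127.0, size=content_img.shape)

def pool_2x2(feature_map, pool_type="max"):
    # --pool_type: 'max' gives sharper images, 'ave' applies style more
    # smoothly. Single-channel 2x2 pooling for brevity.
    h, w = feature_map.shape
    blocks = feature_map[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if pool_type == "max" else blocks.mean(axis=(1, 3))

def layer_style_weights(style_weight, n_layers):
    # Style weight scaling across layers; the normalized 2**i geometric
    # scheme here is an assumption, not the scripts' exact formula.
    raw = np.array([2.0 ** i for i in range(n_layers)])
    return style_weight * raw / raw.sum()
```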
Both Network.py and INetwork.py have similar usage styles, and share all parameters.
Network.py
python Network.py "/path/to/content image" "path/to/style image" "result prefix or /path/to/result prefix"
INetwork.py
python INetwork.py "/path/to/content image" "path/to/style image" "result prefix or /path/to/result prefix"
There are various parameters discussed below which can be modified to alter the output image. Note that many parameters require the command to be enclosed in double quotes ( " " ).
Example:
python INetwork.py "/path/to/content image" "path/to/style image" "result prefix or /path/to/result prefix" --preserve_color "True" --pool_type "ave" --rescale_method "bicubic" --content_layer "conv4_2"
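Since many values need quoting, a small helper that assembles the command line can save typing. This function is not part of the repository; the flag names must match the parameters documented here:

```python
import shlex

def build_command(script, content, style, prefix, **options):
    # Build a shell-safe CLI invocation for Network.py or INetwork.py.
    args = ['python', script, content, style, prefix]
    for name, value in sorted(options.items()):
        args += ['--' + name, str(value)]
    return ' '.join(shlex.quote(a) for a in args)
```

For example, build_command('INetwork.py', 'content.jpg', 'style.jpg', 'out', pool_type='ave') yields a fully quoted command string ready to paste into a shell.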
--image_size : Allows setting the Gram matrix size. Default is 400 x 400, since it produces good results fast.
--num_iter : Number of iterations. Default is 10. Test the output with 10 iterations, and increase to improve results.
--init_image : Can be "content" or "noise". Default is "content", since it reduces reproduction noise.
--pool_type : Pooling type. MaxPooling ("max") is default. For smoother images, use AveragePooling ("ave").
--preserve_color : Preserves the original color space of the content image, while applying only style. Post processing technique on final image, therefore does not harm quality of style.
--min_improvement : Sets the minimum improvement required to continue training. Default is 0.0, indicating no minimum threshold. Advised values are 0.05 or 0.01.
--content_weight : Weightage given to content in relation to style. Default is 0.025.
--style_weight : Weightage given to style in relation to content. Default is 1.
--style_scale : Scales the style_weight. Default is 1.
--total_variation_weight : Regularization factor. Smaller values tend to produce crisp images, but 0 is not useful. Default = 8.5E-5
--rescale_image : Rescale image to original dimensions after each iteration. (Bilinear upscaling)
--rescale_method : Rescaling algorithm. Default is bilinear. Options are nearest, bilinear, bicubic and cubic.
--maintain_aspect_ratio : Rescale the image just to the original aspect ratio. Size will be (gram_matrix_size, gram_matrix_size * aspect_ratio). Default is True.
--content_layer : Selects the content layer. Paper suggests conv4_2, but better results can be obtained from conv5_2. Default is conv5_2.
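The Gram-matrix-based style loss (scaled by --style_weight and --style_scale) and the total variation regularizer (weighted by --total_variation_weight) follow the standard formulations from the Gatys et al. paper. A NumPy sketch, where array shapes are assumptions:

```python
import numpy as np

def gram_matrix(features):
    # features: (channels, height * width) flattened layer activations;
    # --image_size controls the spatial size these maps are computed at.
    return features @ features.T

def style_loss(style_features, combination_features):
    # Scaled squared distance between Gram matrices of the style image
    # and the generated image at one layer.
    channels, size = style_features.shape
    S = gram_matrix(style_features)
    C = gram_matrix(combination_features)
    return np.sum(np.square(S - C)) / (4.0 * channels ** 2 * size ** 2)

def total_variation(img):
    # Smoothness regularizer: small but non-zero weights keep the
    # image crisp yet coherent, which is why 0 is not useful.
    dh = np.square(img[1:, :-1] - img[:-1, :-1])
    dw = np.square(img[:-1, 1:] - img[:-1, :-1])
    return np.sum(np.power(dh + dw, 1.25))
```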
On a 980M GPU, the time required for each epoch depends mainly on the image size (gram matrix size):
For a 400x400 gram matrix, each epoch takes approximately 8-10 seconds.
For a 512x512 gram matrix, each epoch takes approximately 15-18 seconds.
For a 600x600 gram matrix, each epoch takes approximately 24-28 seconds.