basveeling / wavenet
- Friday, September 16, 2016 at 03:14:01
Python
Keras WaveNet implementation
Based on https://deepmind.com/blog/wavenet-generative-model-raw-audio/ and https://arxiv.org/pdf/1609.03499.pdf.
Disclaimer: this is a re-implementation of the model described in the WaveNet paper by Google DeepMind. This repository is not associated with Google DeepMind.
Generate samples:
$ python wavenet.py predict with models/run_2016-09-14_11:32:09/config.json predict_seconds=1
Install the dependencies:
$ pip install -r requirements.txt
Note: this installs a modified version of Keras and the dev version of Theano.
Once the first model checkpoint is created, you can start sampling. A pretrained model is included, so sample away! (It was trained on the Chopin dataset from http://iwk.mdw.ac.at/goebl/mp3.html.)
Run:
$ python wavenet.py predict with models/run_2016-09-14_11:32:09/config.json predict_seconds=1
The latest model checkpoint will be retrieved and used to sample. The sample will be streamed to [run_folder]/samples, so you can start listening as soon as the first sample is generated.
Sampling options:
predict_seconds: float. Number of seconds to sample.
sample_argmax: True or False. Always take the argmax.
sample_temperature: None or float. Controls the sampling temperature. 0.01 seems to be a good value.
seed: int. Controls the seed for the sampling procedure.
e.g.:
$ python wavenet.py predict with models/[run_folder]/config.json predict_seconds=1 sample_temperature=0.1
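To make the effect of sample_argmax and sample_temperature concrete, here is a minimal sketch of how one output bin could be drawn from the model's predicted distribution. It is written in plain Python for clarity; the helper name pick_output_bin and the exact way the temperature is applied (dividing log-probabilities by it) are assumptions for illustration, not code taken from this repository.

```python
import math
import random


def pick_output_bin(probs, sample_argmax=False, sample_temperature=None, rng=None):
    """Choose one bin index from a list of probabilities (hypothetical helper)."""
    if sample_argmax:
        # Deterministic: always return the most likely bin.
        return max(range(len(probs)), key=lambda i: probs[i])

    weights = list(probs)
    if sample_temperature is not None:
        # Assumed scheme: divide log-probabilities by the temperature.
        # Values below 1 sharpen the distribution, values above 1 flatten it.
        logits = [math.log(p + 1e-12) / sample_temperature for p in probs]
        peak = max(logits)  # subtract the maximum for numerical stability
        weights = [math.exp(value - peak) for value in logits]

    rng = rng or random.Random()
    # random.choices normalises the weights internally.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


probs = [0.1, 0.7, 0.2]
print(pick_output_bin(probs, sample_argmax=True))
print(pick_output_bin(probs, sample_temperature=0.01, rng=random.Random(0)))
```

With a very low temperature the draw behaves almost like the argmax, which is one plausible reason small values tend to sound cleaner.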
Train with the default configuration:
$ python wavenet.py
Or for a smaller network (fewer channels per layer):
$ python wavenet.py with small
Train with different configurations:
$ python wavenet.py with 'option=value' 'option2=value'
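As a rough illustration of what the 'option=value' arguments do, the sketch below merges such overrides onto a dictionary of defaults. The function apply_overrides is hypothetical and is not the mechanism this repository uses; it only shows the idea that each value is read as a Python literal where possible and kept as a string otherwise.

```python
import ast


def apply_overrides(defaults, overrides):
    """Return a copy of `defaults` updated from 'key=value' strings (hypothetical helper)."""
    config = dict(defaults)
    for item in overrides:
        key, _, raw = item.partition("=")
        try:
            # Interpret numbers, booleans, None and quoted strings as Python literals.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything else (e.g. an unquoted directory name) stays a plain string.
            value = raw
        config[key] = value
    return config


defaults = {"batch_size": 64, "data_dir": "data", "use_ulaw": True}
print(apply_overrides(defaults, ["batch_size=32", "data_dir=my_data", "use_ulaw=False"]))
```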
Available options:
batch_size = 64
data_dir = 'data'
debug = False
desired_sample_rate = 4410
dilation_depth = 9
early_stopping_patience = 20
fragment_length = 1024
fragment_stride = 2045
keras_verbose = 1
learn_all_outputs = True
nb_epoch = 1000
nb_filters = 256
nb_output_bins = 256
nb_stacks = 1
run_dir = None
seed = 3004083
train_only_in_receptive_field = True
use_bias = False
use_skip_connections = True
use_ulaw = True
optimizer:
  decay = 0.0
  epsilon = None
  lr = 0.001
  momentum = 0.9
  nesterov = True
  optimizer = 'sgd'
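The use_ulaw and nb_output_bins options relate to the mu-law companding described in the WaveNet paper, where each audio sample is compressed and quantised into 256 values. A minimal sketch of that transform follows, assuming input samples scaled to [-1, 1]; the function names ulaw_encode and ulaw_decode are illustrative and may not match this repository's preprocessing code.

```python
import math


def ulaw_encode(x, nb_output_bins=256):
    """Map a sample in [-1, 1] to an integer bin in [0, nb_output_bins - 1]."""
    mu = nb_output_bins - 1
    x = max(-1.0, min(1.0, x))  # clip out-of-range samples
    # Compress the amplitude logarithmically, keeping the sign.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift from [-1, 1] to [0, mu] and round to the nearest bin.
    return int(round((compressed + 1.0) / 2.0 * mu))


def ulaw_decode(bin_index, nb_output_bins=256):
    """Approximate inverse of ulaw_encode: bin index back to a sample in [-1, 1]."""
    mu = nb_output_bins - 1
    compressed = 2.0 * bin_index / mu - 1.0
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)


for sample in (-1.0, -0.1, 0.5, 1.0):
    bin_index = ulaw_encode(sample)
    print(sample, bin_index, ulaw_decode(bin_index))
```

The logarithmic spacing gives quiet samples finer resolution than loud ones, which is why 256 bins can be enough for intelligible audio.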
Train on your own data:
$ python wavenet.py with 'data_dir=your_data_dir_name'
Test the preprocessing results:
$ python wavenet.py test_preprocess with 'data_dir=your_data_dir_name'
The WaveNet model is quite expensive to train and sample from. We can, however, trade accuracy and fidelity for lower computation cost by reducing the sampling rate, the number of stacks and the number of channels per layer.
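The trade-off can be made more tangible with some back-of-the-envelope arithmetic. The sketch below rests on two labelled assumptions: each stack uses kernel-size-2 convolutions with dilations 1, 2, 4, ..., 2**dilation_depth, and the cost per generated sample grows with the number of layers times the square of the channel count. Both functions are hypothetical helpers, and the cost figure is a crude proxy rather than a measurement.

```python
def receptive_field(dilation_depth, nb_stacks, kernel_size=2):
    """Number of past samples one output can see, under the assumed dilation scheme."""
    # Assumption: dilations 1, 2, 4, ..., 2**dilation_depth in every stack.
    dilations = [2 ** i for i in range(dilation_depth + 1)]
    return nb_stacks * (kernel_size - 1) * sum(dilations) + 1


def relative_cost(sample_rate, nb_filters, nb_stacks, dilation_depth):
    """Crude proxy for the work needed to generate one second of audio."""
    n_layers = nb_stacks * (dilation_depth + 1)
    # One network evaluation per audio sample; per-layer work ~ nb_filters ** 2.
    return sample_rate * n_layers * nb_filters ** 2


field = receptive_field(dilation_depth=9, nb_stacks=1)
print(field)                    # 1024 samples
print(field / 4410.0)           # receptive field in seconds at 4410 Hz

# Illustrative comparison with the stack count held equal.
full = relative_cost(sample_rate=16000, nb_filters=256, nb_stacks=2, dilation_depth=9)
small = relative_cost(sample_rate=4000, nb_filters=16, nb_stacks=2, dilation_depth=9)
print(full / small)             # 1024.0
```

Under these assumptions, cutting the sampling rate by 4x and the channel count by 16x reduces the estimated work per second of audio by roughly three orders of magnitude, at the price of a lower audio bandwidth and a less expressive model.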
For a downsized model (4000 Hz vs. 16000 Hz sampling rate, 16 filters vs. 256, 2 stacks vs. ??):