ConnorJL / GPT2
This is not the official GPT2 implementation!
An implementation of training for GPT2 that supports both GPUs and TPUs. The dataset scripts are a bit hacky and will probably need to be adapted to your needs.
For GPUs:
pip3 install tensorflow-gpu regex
For TPUs:
pip3 install tensorflow regex google-api-python-client oauth2client
For downloading the models:
pip3 install requests tqdm
For generating the dataset (in addition to TensorFlow):
pip3 install ftfy tqdm newspaper3k
If you want to use my models, I currently have "117M" and "PrettyBig" to offer. 117M was trained on a single v2 TPU for a week (probably less training than the original OpenAI model received), while PrettyBig is slightly bigger than 345M and was trained on a v2-256 pod for a week. I am also planning to release my version of 1.5B, which was trained on a v3-512 pod for around a week. Please see my blog posts here and here for more info.
To download a model, run:
python3 download_model.py PrettyBig
This will create two directories, one named after the model and another named "encoder". Change the "model_dir" and "encoder_path" parameters in the .json corresponding to your model to point to these paths, respectively.
If you only want the encoder, use:
python3 download_model.py encoder
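For example, after downloading "PrettyBig" and the encoder, the relevant entries in its .json could be pointed at the newly created directories like this (a minimal sketch; all other fields are omitted, and the relative paths assume you run from the directory the model was downloaded into):

{
  "model_dir": "PrettyBig",
  "encoder_path": "encoder"
}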
To predict, you can either pass the prompt directly on the command line or have it read from a file (useful for prompts that include newlines). Text is output to the console and to the file specified in the "predict_path" parameter. You need a model checkpoint and a copy of the BPE encoder at an accessible location for this to work. (Change the "model_dir" and "encoder_path" parameters in the .json accordingly.)
From command line:
python3 main.py --model Your-Model.json [--top_k Top-K-Truncation] --predict_text "Hello there! My name is"
From file:
python3 main.py --model Your-Model.json [--top_k Top-K-Truncation] --predict_file input.txt
The optional top_k parameter causes the model to consider only the top k most likely tokens at each step. Setting this to around 40 tends to produce better results, but with less variety.
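For example, assuming your config file is named PrettyBig.json (an illustrative name; use whichever .json matches your model) and your prompt is stored in input.txt, a typical invocation with top-k truncation would be:
python3 main.py --model PrettyBig.json --top_k 40 --predict_file input.txt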
Prediction on TPUs is not supported.
To train a model, define its parameters in a .json file (see examples) and then simply call:
python3 main.py --model Your-Model.json [--tpu Your-TPU-Name]
Using a TPU is optional; the model runs fine on GPUs without modification. (Note: Evaluation doesn't work on TPU pods and must be commented out.)
This assumes you have a version of the openwebtext corpus stored in an accessible location. If you don't, see below for how to generate your own version.
GPT2 is trained on the webtext corpus, which is basically all websites linked to from Reddit with at least 3 karma. Since the dataset is huge and contains a lot of copyrighted material, I can't provide a download here; instead I'll describe how I got it. Be aware that it cost me around 500€ in cloud compute resources to download and process the whole thing, but I'm not claiming I was optimally efficient.
Because passing two dozen parameters over the command line would be tedious, all model parameters are passed in a .json file. Note that all paths also support Google Cloud Storage (gs://) paths, and they must be gs:// paths if you're running on TPUs.
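For illustration, the path-related entries of such a file might look like the following when everything lives in a Google Cloud Storage bucket (a minimal sketch; the bucket name and file names are placeholders, and every other field, including the parameters listed below, is omitted; see the example .json files in the repository for the full set):

{
  "model_dir": "gs://<your-bucket>/PrettyBig",
  "encoder_path": "gs://<your-bucket>/encoder",
  "predict_path": "gs://<your-bucket>/predictions.txt"
}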
Values you'll definitely want to change:
Values you'll probably want to change:
Model parameters:
Training parameters: