twitter / torch-twrl

  • Sunday, September 18, 2016, 03:15:50
https://github.com/twitter/torch-twrl

Lua
Torch-twrl is a package that enables reinforcement learning in Torch.



torch-twrl: Reinforcement Learning in Torch

torch-twrl is an RL framework built in Lua/Torch by Twitter.

Installation

Install torch

git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh

Install torch-twrl

git clone --recursive https://github.com/twitter/torch-twrl.git
cd torch-twrl
luarocks make

Want to play in the gym?

  1. Start a virtual environment. This is optional, but it helps keep your installation clean

  2. Download and install OpenAI Gym, gym-http-api requirements, and ffmpeg

pip install virtualenv
virtualenv venv
source venv/bin/activate
pip install gym
pip install -r src/gym-http-api/requirements.txt
brew install ffmpeg  # macOS (Homebrew); on Linux, install ffmpeg with your package manager

Works so far?

You should have everything you need:

  • Start your gym_http_server with
python src/gym-http-api/gym_http_server.py
  • In a new console window (or tab), run the example script (policy gradient agent in environment CartPole-v0)
cd examples
chmod u+x cartpole-pg.sh
./cartpole-pg.sh

This script sets the parameters for the experiment. In detail, here is what it calls:

th run.lua \
    -env 'CartPole-v0' \
    -policy categorical \
    -learningUpdate reinforce \
    -model mlp \
    -optimAlpha 0.9 \
    -timestepsPerBatch 1000 \
    -stepsizeStart 0.3 \
    -gamma 1 \
    -nHiddenLayerSize 10 \
    -gradClip 5 \
    -baselineType padTimeDepAvReturn \
    -beta 0.01 \
    -weightDecay 0 \
    -windowSize 10 \
    -nSteps 1000 \
    -nIterations 1000 \
    -video 100 \
    -optimType rmsprop \
    -verboseUpdate true \
    -uploadResults false \
    -renderAllSteps false
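
The `-gamma` and `-baselineType` flags control how rewards become a learning signal. As a rough illustration (written in Python for brevity; this is not torch-twrl's Lua code), the sketch below computes discounted returns per trajectory and subtracts a per-timestep average return as a baseline. Reading `padTimeDepAvReturn` as "zero-pad shorter trajectories, then average the returns at each timestep" is an assumption based on the option's name, and all function names here are hypothetical.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one trajectory."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def time_dependent_baseline(all_returns):
    """Average return at each timestep across a batch of trajectories.

    Assumption: trajectories shorter than the longest one are treated as
    if they were padded with zeros before averaging.
    """
    horizon = max(len(r) for r in all_returns)
    n = len(all_returns)
    baseline = []
    for t in range(horizon):
        total = sum(r[t] if t < len(r) else 0.0 for r in all_returns)
        baseline.append(total / n)
    return baseline


def advantages(batch_rewards, gamma):
    """Discounted returns minus the time-dependent baseline, per trajectory."""
    all_returns = [discounted_returns(r, gamma) for r in batch_rewards]
    baseline = time_dependent_baseline(all_returns)
    return [[g - baseline[t] for t, g in enumerate(returns)]
            for returns in all_returns]
```

With `gamma` set to 1, as in the script above, the return at each step is simply the sum of the remaining rewards in the episode.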

Your results should look something like our results on the OpenAI Gym leaderboard.

Doesn't work?

1) Test the gym-http-api

cd src/gym-http-api/
nose2
cd ../..

2) Start a Gym HTTP server in your virtual environment

python src/gym-http-api/gym_http_server.py

3) In a new console window (or tab), run torch-twrl tests

luarocks make; th test/test.lua

Dependencies

Testing in RL development is tricky: it requires well-established, unified baselines and a large community of active developers. The OpenAI Gym provides a great set of example environments for this purpose. Link: https://github.com/openai/gym

The OpenAI Gym is written in Python and expects the algorithms that interact with its environments to be written in Python as well. torch-twrl is compatible with the OpenAI Gym through a modified Gym HTTP API, based on the original code from OpenAI; gym-http-api is a submodule of torch-twrl.
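
To make the HTTP bridge concrete, here is a minimal client sketch in Python. It assumes the routes and JSON fields of the upstream OpenAI gym-http-api (`/v1/envs/`, `/reset/`, `/step/`, with `instance_id`, `observation`, `reward`, and `done` fields) and a server at `127.0.0.1:5000`; the modified copy bundled with torch-twrl may differ. `post_json` and `run_episode` are hypothetical names, not part of either project.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # assumption: upstream default address


def post_json(url, payload):
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_episode(env_id, choose_action, post=post_json,
                base_url=BASE_URL, max_steps=200):
    """Create an environment, play one episode, and return the total reward."""
    # Assumed upstream route: creating an environment returns an instance id.
    instance_id = post(base_url + "/v1/envs/", {"env_id": env_id})["instance_id"]
    env_url = "{}/v1/envs/{}".format(base_url, instance_id)
    observation = post(env_url + "/reset/", {})["observation"]
    total_reward = 0.0
    for _ in range(max_steps):
        result = post(env_url + "/step/", {"action": choose_action(observation)})
        total_reward += result["reward"]
        observation = result["observation"]
        if result["done"]:
            break
    return total_reward
```

With the server from the steps above running, `run_episode("CartPole-v0", lambda obs: 0)` would play one episode that always picks action 0. The `post` argument exists only so the transport can be swapped out, for example in tests.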

All Lua dependencies should be installed on your first build.

Note: if you make changes, you will need to recompile with

luarocks make

Agents

torch-twrl implements several agents, located in src/agents. Each agent is defined by a model, a policy, and a learning update.

  • Random
    • model: noModel
    • policy: random
    • learningUpdate: noLearning
  • TD(Lambda)
    • model: qFunction
    • policy: egreedy
    • learningUpdate: tdLambda - implements temporal difference (Q-learning or SARSA) learning with eligibility traces (replacing or accumulating)
  • Policy Gradient (Williams, 1992)
    • model: mlp - multilayer perceptron, final layer: tanh for continuous, softmax for discrete
    • policy: stochasticModelPolicy, normal for continuous actions, categorical for discrete
    • learningUpdate: reinforce
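
As an illustration of the TD(Lambda) agent's ingredients, the following Python sketch shows a tabular epsilon-greedy policy and one TD(lambda) update with either replacing or accumulating eligibility traces. It is a textbook-style sketch under stated assumptions, not torch-twrl's Lua implementation: the tabular `q` dictionary, the function names, and the parameter names are all hypothetical, and the Q-learning branch does not reset traces after exploratory actions.

```python
import random
from collections import defaultdict


def egreedy(q, state, n_actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    return values.index(max(values))


def td_lambda_update(q, traces, transition, n_actions, alpha=0.1, gamma=1.0,
                     lam=0.9, method="sarsa", trace_type="replacing"):
    """Apply one tabular TD(lambda) step to q and traces in place.

    q and traces are defaultdict(float) tables keyed by (state, action).
    """
    state, action, reward, next_state, next_action, done = transition
    if done:
        target = reward
    elif method == "sarsa":
        # SARSA bootstraps from the action that will actually be taken.
        target = reward + gamma * q[(next_state, next_action)]
    else:
        # Q-learning bootstraps from the greedy next action.
        target = reward + gamma * max(q[(next_state, a)]
                                      for a in range(n_actions))
    delta = target - q[(state, action)]

    if trace_type == "replacing":
        traces[(state, action)] = 1.0   # reset the trace to one
    else:
        traces[(state, action)] += 1.0  # accumulate on every visit

    # Credit every recently visited pair, then decay its trace.
    for key in list(traces):
        q[key] += alpha * delta * traces[key]
        traces[key] *= gamma * lam
    return delta
```

A typical loop would create `q = defaultdict(float)` and `traces = defaultdict(float)`, call `egreedy` to act, call `td_lambda_update` after each step, and clear `traces` at the start of each episode.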

Important note about agent/environment compatibility:

The OpenAI Gym has many environments and is continuously growing. Some agents may be compatible with only a subset of environments. That is, an agent built for continuous action space environments may not work if the environment expects discrete action spaces.
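
One way to guard against such mismatches is to pick the policy from the environment's action space before building the agent. The Python sketch below assumes the action space is described by a dictionary such as `{"name": "Discrete", "n": 2}` or `{"name": "Box", "shape": [1]}`, which is how the upstream gym-http-api reports it; that format and the `choose_policy` helper are assumptions, while the policy names `categorical` and `normal` come from the agent list above.

```python
def choose_policy(action_space):
    """Map an action-space description to (policy name, number of outputs)."""
    name = action_space["name"]
    if name == "Discrete":
        # One output per discrete action, for a categorical policy.
        return "categorical", action_space["n"]
    if name == "Box":
        # One output per continuous action dimension, for a normal policy.
        size = 1
        for dim in action_space["shape"]:
            size *= dim
        return "normal", size
    raise ValueError("unsupported action space: " + name)
```

For CartPole-v0, which has two discrete actions, this would select the `categorical` policy used in the example script.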

Here is a useful table of the environments, with details on the different variables that may help to configure agents appropriately.

Future Work

References

  1. Boyan, J., & Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. Advances in neural information processing systems, 369-376.
  2. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine learning, 3(1), 9-44.
  3. Singh, S. P., & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine learning, 22(1-3), 123-158.
  4. Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. Systems, Man and Cybernetics, IEEE Transactions on, (5), 834-846.
  5. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction (Vol. 1, No. 1). Cambridge: MIT Press.
  6. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4), 229-256.

License

torch-twrl is released under the MIT License. Copyright (c) 2016 Twitter, Inc.