hungtraan / FacebookBot
- Tuesday, September 20, 2016, 03:15:10
JavaScript
A Facebook Messenger Bot that supports Voice Recognition, Natural Language Processing and features such as: search nearby restaurants, search trending news, transcribe and save memos to the cloud.
Optimist Prime is a Facebook Messenger Bot that supports Voice Recognition, Natural Language Processing and features such as: search nearby restaurants, search trending news, transcribe and save memos to the cloud. It also saves user data (with permission, of course) such as favorite locations, and can provide customized greetings (acknowledging the user's local time in any time zone, e.g. Good morning/Good evening), entertaining responses, etc.
(For a simpler "echo bot" proof-of-concept implementation of the Facebook Messenger Bot, check out this simplified project with a 10-minute tutorial)
Demo (click on Message to start chatting with it)
Note: Optimist Prime is implemented with different APIs for features like user management, voice recognition, restaurant search and trending news search, so it takes some time to configure and get it up and running. For a more basic "echo bot" that responds with whatever you say to it, use facebook-echobot.py, or head over to Facebook's own Messenger app Quick Start. The echo bot is useful for getting a quick look at the fundamental ideas behind a Facebook Messenger Bot.
In order to build your own bot with all features of Optimist Prime, you'll need a few set-ups:
pip install -r requirements.txt (preferably inside your virtual environment, virtualenv/venv; read all about pip and venv here)
Then a few configurations in config.py:
instance, and create another config.py file in it. (More on Flask configurations)
To run locally, as simple as:
python facebookbot.py 3000
Or with gunicorn (as I do on Heroku) (Flask and gunicorn tutorial):
gunicorn facebookbot:app -b localhost:3000
Now that you've got the bot running, you'll need to set up a webhook so that Facebook can forward the messages the bot receives to it. This can be done with ngrok, which will set up a secure tunnel to your localhost server:
./ngrok http 3000
Get the https URL (Facebook requires https-secured webhooks) and subscribe your Facebook App to this webhook. The verification token is your own token, defined in OWN_VERIFICATION_TOKEN in config.py.
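To illustrate what happens during that subscription step: Facebook sends a GET request to the webhook with hub.mode, hub.verify_token and hub.challenge query parameters, and expects the challenge echoed back when the token matches. Below is a minimal, framework-free sketch of that check. The function name verify_webhook is hypothetical; in the actual bot this logic would live inside a Flask route.

```python
import hmac


def verify_webhook(query_params, own_verification_token):
    """Return (body, status) for Facebook's webhook verification GET request.

    query_params: dict of the request's query-string parameters.
    own_verification_token: the value of OWN_VERIFICATION_TOKEN from config.py.
    """
    mode = query_params.get("hub.mode")
    token = query_params.get("hub.verify_token", "")
    challenge = query_params.get("hub.challenge", "")

    # compare_digest avoids leaking token contents through timing differences.
    token_ok = hmac.compare_digest(token.encode(), own_verification_token.encode())
    if mode == "subscribe" and token_ok:
        return challenge, 200
    return "Verification token mismatch", 403
```

With Flask, the dict would typically come from request.args, and the returned tuple can be handed back from the view function directly.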
I've provided the Procfile for deployment on Heroku. You can create a Heroku app, spin up a free dyno and deploy your own Optimist Prime with this tutorial.
For the voice recognition to work, we'll need to include ffmpeg on our Heroku dyno, which can be done by adding a Heroku Buildpack in your app's Settings tab on the Dashboard:
https://github.com/jonathanong/heroku-buildpack-ffmpeg-latest.git
Finally, set your environment variable for the path to ffmpeg:
heroku config:set FFMPEG_PATH=/app/vendor/ffmpeg/ffmpeg
Or on your app’s settings tab on Dashboard:
Now you're ready to deploy. Tutorial on how to deploy with Heroku and git.
In later iterations, all you need to do with Heroku is the glorious 3 lines:
git add .
git commit -am "Awesome commit"
git push heroku master
Amazon Web Services: I'm a fan of AWS and have had great experience with Beanstalk. However, if you want to use AWS, you'll need to go the extra mile of obtaining an SSL cert to have a secured webhook. For the purposes of Optimist Prime, I decided to go with Heroku instead, since it readily provides https connections.
The Voice Recognition is implemented with both IBM Watson's Speech-to-Text API and Google Cloud Speech API (defaulting to IBM Watson, as Google Cloud Speech is still in Beta and my tests showed Watson to be more accurate). The current implementation is based on their RESTful methods (both support real time processing with WebSocket and WebRTC, respectively). Both are available for free at development-level use.
To use IBM Watson's Speech-to-Text, you'll need to create an IBM Bluemix account and add the service to your account, then retrieve the API's username and password. Lastly, copy these credentials to Speech/credentials.py.
To use Google Cloud Speech API, the process is a little more complicated, as you'll need to export Google's credentials as an environment variable. However, the whole process is well-documented by Google over here. As soon as you have the Service Account key file (json) and have exported the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to the key file's location, you're set to go.
To switch between IBM Watson and Google for speech recognition, set the environment variable to one of the following:
export FB_BOT_STT_API_PROVIDER=GOOGLE
export FB_BOT_STT_API_PROVIDER=IBM
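On the Python side, reading that variable could look roughly like the sketch below. The helper name get_stt_provider and the validation step are assumptions for illustration, not the bot's actual code; only the variable name and the IBM default come from this README.

```python
import os

SUPPORTED_PROVIDERS = ("IBM", "GOOGLE")


def get_stt_provider(environ=None):
    """Pick the speech-to-text provider from FB_BOT_STT_API_PROVIDER, defaulting to IBM."""
    environ = os.environ if environ is None else environ
    provider = environ.get("FB_BOT_STT_API_PROVIDER", "IBM").strip().upper()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError("Unsupported speech-to-text provider: %r" % provider)
    return provider
```

Failing loudly on an unknown value makes a typo in the environment variable visible at start-up rather than on the first voice message.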
The result text processed by this Speech-to-Text API is then returned just like a text message the bot receives, which then goes through NLP for detecting commands/conversations.
Optimist Prime receives commands from users as both text and voice input, and understands commands in natural language.
This is done by using the pattern NLP library, which allows the bot to deconstruct the user's text input and recognize parts of speech. For now, the model for categorizing commands is simple, based on stopwords and sentence structures, but as our data grows, we can start building more complex machine learning categorization for each function.
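To give a feel for keyword-based categorization, here is a deliberately simplified sketch in plain Python. It does not use the pattern library or part-of-speech tagging, and the cue words and the categorize_command name are hypothetical, not the bot's actual lists.

```python
import re

# Hypothetical cue words per feature; the real bot relies on the pattern
# library and its own stopword lists.
COMMAND_CUES = {
    "memo": {"memorize", "memo"},
    "news": {"news"},
    "restaurant": {"restaurant", "restaurants", "food", "eat", "bar",
                   "coffee", "near", "nearby"},
}


def categorize_command(text):
    """Return the feature whose cue words best match the text, or None."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    best_feature, best_hits = None, 0
    for feature, cues in COMMAND_CUES.items():
        hits = len(words & cues)
        # Strict comparison: on a tie, the feature listed first wins.
        if hits > best_hits:
            best_feature, best_hits = feature, hits
    return best_feature
```

A scheme this simple misses commands that name only a brand (for example "find a Target."), which is one reason to combine it with sentence structure or, later, a trained classifier.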
The command system allows users to use the following features, all of which are under the Utils folder.
Example commands:
irish bar near by.
find chinese restaurants.
find me a good coffee shop around here.
show me Chinese food close by.
find mexican restaurants near here.
I want to have vietnamese food tonight.
is there a korean bbq nearby?
what are some cambodian grill close by?
find an ethiopian restaurant.
I want mediterranean food.
find a Target.
find me a KFC around.
I'd like to eat at McDonalds.
find me some fast food places in ohio city.
find me a brewery near downtown san francisco.
After receiving the command, Optimist Prime asks for your location. You can input either a text/voice-based location name or send your exact GPS location (with Facebook Messenger on mobile devices). The Yelp Search API requires coordinates for exact location search, so the lookup from location name to coordinates is handled by the geocoding capability of the Geopy library. An alternative (probably more up to date and smarter with complex names) would be the Google Maps Geocoding API. Optimist Prime currently uses Geopy. Optimist Prime also offers to save your location for future reference.
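A rough sketch of the name-to-coordinates lookup with Geopy follows. The README does not say which Geopy geocoder the bot uses, so Nominatim is an assumption here, as are the function names and the user_agent string. Geopy is imported lazily so the lookup logic can be exercised with any callable.

```python
def make_nominatim_geocoder(user_agent="optimist-prime-example"):
    """Build a geocode callable backed by geopy's Nominatim geocoder.

    Assumption: Nominatim is used for illustration only. geopy is imported
    lazily so the rest of the sketch works without it installed.
    """
    from geopy.geocoders import Nominatim

    return Nominatim(user_agent=user_agent).geocode


def to_coordinates(place_name, geocode):
    """Resolve a place name to a (latitude, longitude) tuple, or None if not found."""
    location = geocode(place_name)
    if location is None:
        return None
    return (location.latitude, location.longitude)
```

With Geopy installed and network access available, to_coordinates("downtown san francisco", make_nominatim_geocoder()) performs a live lookup; a None result is the cue to ask the user for a clearer location.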
Optimist Prime leverages Yelp's API. Included in the code are both APIv2 (stable) and APIv3 (developer preview). Both require you to acquire an API key.
After you've got your API keys, put them into config.py.
To switch between v2 and v3, change the import statement in facebookbot.py between yelp_search_v2 and yelp_search_v3:
from Utils.Yelp import yelp_search_v3 as yelp_search
Example commands:
get me news about Harvard.
find news about Zika.
get me some news about the US presidential elections.
Get trending news about the US in the Olympics.
look for latest news on the Olympics.
The Trending News Search leverages the Webhose.io API. The service crawls the web for news along with its social strength (Facebook likes, shares, Twitter posts). When a user searches for a not-so-trending or niche topic, Optimist Prime lowers its "trending" criteria and widens the search time frame to get the best results.
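The fallback behaviour described above can be sketched as a loop over progressively looser criteria. Everything below is illustrative: the search callable stands in for the actual Webhose.io query, and the threshold values, parameter names and min_results default are made-up placeholders.

```python
def search_with_relaxation(search, topic, min_results=3,
                           thresholds=((1000, 1), (100, 3), (0, 7))):
    """Try progressively looser (min_social_score, days_back) pairs.

    search: callable(topic, min_social_score, days_back) -> list of articles.
    Returns the first result list with at least min_results items, or the
    largest list seen if no attempt reaches that size.
    """
    best = []
    for min_social_score, days_back in thresholds:
        results = search(topic, min_social_score, days_back)
        if len(results) >= min_results:
            return results
        if len(results) > len(best):
            best = results
    return best
```

Stopping at the first attempt that returns enough articles keeps popular topics fast, while niche topics pay for the extra requests only when needed.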
Example commands:
memorize this for me: [continue speaking your memo]
memorize this: [continue speaking your memo]
memorize this (stop talking, Optimist Prime will prompt you to start your memo)
can you memorize this for me?
This feature is still in its infancy/concept. After the user saves a memo, s/he can access it on the web with the link provided by the bot.
The unfortunate catch I found was that Facebook uses different user_ids for the Facebook Profile (which is used for login) and Messenger. The same account has 2 different user_ids, and the bot only receives the Messenger user_id from the Messenger API, which makes a secured Facebook Login feature impossible to implement. My current solution is to allow users to access their own memos from inside the bot chat, using user_ids as URL paths for querying. Future: We might use Account Linking for this.
LICENSE.txt
https://developers.facebook.com/docs/messenger-platform/product-overview
{
  "object": "page",
  "entry": [
    {
      "id": "1384358948246110",
      "time": 1473197313689,
      "messaging": [
        {
          "sender": {
            "id": "1389166911110336"
          },
          "recipient": {
            "id": "1384358948246110"
          },
          "timestamp": 1473197313651,
          "message": {
            "mid": "mid.1473197313635:0a67934dfc4f04a629",
            "seq": 7651,
            "text": "Hey"
          }
        }
      ]
    }
  ]
}
{
  "object": "page",
  "entry": [
    {
      "id": "1384358948246110",
      "time": 1473197300200,
      "messaging": [
        {
          "sender": {
            "id": "1389166911110336"
          },
          "recipient": {
            "id": "1384358948246110"
          },
          "timestamp": 1473197300143,
          "message": {
            "mid": "mid.1473197298861:d6cf1fae1ad44ff234",
            "seq": 7650,
            "attachments": [
              {
                "type": "audio",
                "payload": {
                  "url": "https://cdn.fbsbx.com/v/t59.3654-21/14109832_10209906561878191_940661414_n.mp4/audioclip-1473197298000-2056.mp4?oh=85e027f68e17fa0b1c189c3d7f3164bf&oe=57D0B0F3"
                }
              }
            ]
          }
        }
      ]
    }
  ]
}
{
  "object": "page",
  "entry": [
    {
      "id": "1384358948246110",
      "time": 1473197244135,
      "messaging": [
        {
          "sender": {
            "id": "1389166911110336"
          },
          "recipient": {
            "id": "1384358948246110"
          },
          "timestamp": 1473197244008,
          "message": {
            "mid": "mid.1473197243814:3803076c5438a13036",
            "seq": 7646,
            "attachments": [
              {
                "title": "Hung's Location",
                "url": "https://www.facebook.com/l.php?u=https%3A%2F%2Fwww.bing.com%2Fmaps%2Fdefault.aspx%3Fv%3D2%26pc%3DFACEBK%26mid%3D8100%26where1%3D40.070706608101%252C%2B-82.525680894134%26FORM%3DFBKPL1%26mkt%3Den-US&h=mAQE9bbu3&s=1&enc=AZPC_QlKfUFl7dehzlPuSpsio7LMKtRwyM58oaqUtt89CfKBofXVoW48cYrASUdCm-MYSpFMI2ejgmTR90taFN4wyv0aCYNH_GG3MR5sEe62NQ",
                "type": "location",
                "payload": {
                  "coordinates": {
                    "lat": 40.070706608101,
                    "long": -82.525680894134
                  }
                }
              }
            ]
          }
        }
      ]
    }
  ]
}
There are also other useful types of message (also implemented in this bot), such as Quick Reply and Postback; see the Facebook Messenger API documentation.
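Based on the three sample payloads shown above (text, audio and location), telling the message types apart could be sketched as follows. The function names extract_message and iter_messages are hypothetical, and the sketch handles only these three shapes, not Quick Reply or Postback.

```python
def extract_message(event):
    """Classify one item of entry[].messaging[] as text, audio or location.

    Returns a (kind, value) tuple; kind is None for anything unrecognised.
    """
    message = event.get("message", {})
    if "text" in message:
        return "text", message["text"]
    for attachment in message.get("attachments", []):
        payload = attachment.get("payload", {})
        if attachment.get("type") == "audio":
            return "audio", payload["url"]
        if attachment.get("type") == "location":
            coords = payload["coordinates"]
            return "location", (coords["lat"], coords["long"])
    return None, None


def iter_messages(body):
    """Yield (sender_id, kind, value) for every messaging event in a webhook body."""
    for entry in body.get("entry", []):
        for event in entry.get("messaging", []):
            yield (event["sender"]["id"],) + extract_message(event)
```

Looping over both entry and messaging matters because Facebook may batch several events into a single webhook call.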
The catch for processing voice messages from the Facebook Messenger API is converting Facebook's compressed mp3 to a valid input format for the Speech-to-Text API. Neither IBM nor Google supports mp3; their input formats include common audio formats like WAV, FLAC, OGG, etc. Therefore, Optimist Prime actually has to download the mp3 audio, convert it to WAV, and upload it to the Speech API, a round trip that significantly increases response time for each audio command. In this project, I use ffmpeg, called through Python's subprocess, to convert the audio.
subprocess is a Python standard library module that lets you launch other programs, so what the bot does is equivalent to calling another program "by typing this command into the command line".
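As a small, self-contained illustration of that idea, the snippet below starts a child process and captures its output. The current Python interpreter stands in for ffmpeg so the example runs anywhere; subprocess.run needs Python 3.5 or later, which may differ from the version the bot was written for.

```python
import subprocess
import sys

# Run a child process and capture what it writes to standard output.
# sys.executable (the current Python interpreter) stands in for ffmpeg here
# so the example runs anywhere.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    check=True,  # raise CalledProcessError on a non-zero exit code
)
output = completed.stdout.decode().strip()
print(output)
```

Passing the command as a list rather than a single string avoids shell quoting issues.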
Under the hood, the bot does the following:
Uses ffmpeg to convert the audio
Uses subprocess to initiate a native ffmpeg command (just as you would do in the command shell)
This approach takes the output of ffmpeg directly from the pipe and uploads it, instead of saving it to a temp file and then uploading the file.
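A hedged sketch of the pipe-based conversion is given below. The function names are hypothetical and the exact ffmpeg options the bot passes are not shown in this README, so the flags here are one plausible choice: -i for the input, -f wav to force the output format, and pipe:1 to write to standard output. The FFMPEG_PATH environment variable is the one set during the Heroku setup.

```python
import os
import subprocess


def build_ffmpeg_command(source, ffmpeg_path=None):
    """Build an ffmpeg command that decodes source and writes WAV to stdout."""
    ffmpeg_path = ffmpeg_path or os.environ.get("FFMPEG_PATH", "ffmpeg")
    return [
        ffmpeg_path,
        "-loglevel", "error",  # keep stderr quiet unless something fails
        "-i", source,          # local path or URL of the audio to convert
        "-f", "wav",           # force WAV, as a pipe has no file extension
        "pipe:1",              # write the result to standard output
    ]


def convert_to_wav_bytes(source, ffmpeg_path=None):
    """Run ffmpeg and return the converted audio as bytes, without a temp file."""
    process = subprocess.Popen(
        build_ffmpeg_command(source, ffmpeg_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    wav_bytes, errors = process.communicate()
    if process.returncode != 0:
        raise RuntimeError("ffmpeg failed: " + errors.decode(errors="replace"))
    return wav_bytes
```

The bytes returned by convert_to_wav_bytes can be sent as the request body to the Speech-to-Text API. Because the whole clip is held in memory, this suits short voice messages rather than long recordings.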
The story behind it was a learning experience on Heroku: everything worked perfectly locally, but when deployed to Heroku, Python kept saying that the previously downloaded file was not found. I ssh-ed into the Heroku dyno and, fascinatingly, nothing had ever been downloaded. I suspect this has to do with either Heroku's ephemeral file system (which does not allow a program to save files because it is a distributed system, though I highly doubt this hypothesis), or a need to use absolute paths for any read/write operation on Heroku. I'm leaning towards the latter, as the same problem occurred when using subprocess to call ffmpeg: I had to explicitly declare the path to ffmpeg (pictured above).
This raises a question of scalability (in theory): multiple concurrent conversions could max out the memory, as this is done in the pipe. However, I believe this would not be the bottleneck at scale: most audio files tend to be less than 1MB, so for multiple users the bottleneck would lie in the connection used to download/upload files rather than in the memory needed to convert them. A file would finish converting before the next one finishes downloading. This hypothesis has yet to be tested.
When a user sends an audio message to the bot, the bot "receives" a URL to the file, which is processed by the code below in the main bot file, facebookbot.py: