kkyon / databot
- Wednesday, August 29, 2018, 00:16:35
Python
A high-performance, data-driven programming framework in Python for web crawlers, ETL, and data pipeline work.
Install and update using pip:
pip install -U databot
https://groups.google.com/forum/#!forum/databotpy
All functions are connected by pipes (queues) and communicate by data.
When data comes in, the function is called and returns the result.
Think of the pipeline operation in Unix: ls|grep|sed.
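To make the Unix analogy concrete, here is a minimal sketch in plain Python. It does not use Databot; the stage names `list_names`, `grep`, and `sed` are hypothetical helpers invented for illustration, and the file names are made up. Each stage consumes the data produced by the previous one, as commands in a shell pipeline do.

```python
import re
from typing import Iterable, Iterator


def list_names() -> Iterator[str]:
    """Stand-in for `ls`: emit a fixed, made-up list of file names."""
    yield from ["notes.txt", "crawler.py", "etl.py", "README.md"]


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Stand-in for `grep`: pass through only the lines matching the pattern."""
    regex = re.compile(pattern)
    return (line for line in lines if regex.search(line))


def sed(pattern: str, replacement: str, lines: Iterable[str]) -> Iterator[str]:
    """Stand-in for `sed s/pattern/replacement/`: rewrite each line."""
    regex = re.compile(pattern)
    return (regex.sub(replacement, line) for line in lines)


# Equivalent in spirit to: ls | grep '\.py$' | sed 's/\.py$/.bak/'
result = list(sed(r"\.py$", ".bak", grep(r"\.py$", list_names())))
print(result)
```

Because every stage is a generator, data flows through one item at a time and no stage needs to know what comes before or after it.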
Benefits:
Databot provides pipe and route. It makes data-driven programming and powerful data flow processes easier.
Databot is easy to use and maintain, does not need configuration files, and knows about asyncio and how to parallelize computation.
Here's one of the simple applications you can make:
Load the price of Bitcoin every 2 seconds. A more advanced price aggregator sample can be found in the repository.
from databot.flow import Pipe, Timer
from databot.botframe import BotFrame
from databot.http.http import HttpLoader


def main():
    Pipe(
        Timer(delay=2),  # send timer data to the pipe every 2 seconds
        "http://api.coindesk.com/v1/bpi/currentprice.json",  # send the URL to the pipe when the timer triggers
        HttpLoader(),  # read the URL and load the HTTP response
        lambda r: r.json['bpi']['USD']['rate_float'],  # parse the HTTP response as JSON and extract the USD rate
        print,  # print the result
    )
    BotFrame.render('simple_bitcoin_price')  # render the data flow graph
    BotFrame.run()  # start the flow


main()
Below is the flow graph generated by Databot.
Nodes will be run in parallel, and they will perform well when processing stream data.
With the render function, BotFrame.render('bitcoin_arbitrage'), Databot renders the data flow network into a Graphviz image: https://github.com/kkyon/databot/blob/master/examples/bitcoin_arbitrage.png
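As a rough illustration of what rendering a flow into Graphviz involves, the sketch below builds DOT source for a linear chain of stages. It is not how Databot implements render; `pipeline_to_dot` and the stage labels are hypothetical names used only for this example.

```python
from typing import Sequence


def pipeline_to_dot(name: str, stages: Sequence[str]) -> str:
    """Return Graphviz DOT source for a linear chain of stage labels."""
    lines = [f'digraph "{name}" {{', "    rankdir=LR;"]
    # Declare one node per stage, keyed by position so duplicate labels stay distinct.
    for index, label in enumerate(stages):
        lines.append(f'    n{index} [label="{label}"];')
    # Connect each stage to the next one.
    for index in range(len(stages) - 1):
        lines.append(f"    n{index} -> n{index + 1};")
    lines.append("}")
    return "\n".join(lines)


dot_source = pipeline_to_dot(
    "simple_bitcoin_price", ["Timer", "HttpLoader", "parse", "print"]
)
print(dot_source)
```

Saving the output to a file and running `dot -Tpng flow.dot -o flow.png` with Graphviz installed would produce an image of the chain.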
With replay mode enabled:
config.replay_mode=True
when an exception is raised at step N, you don't need to rerun steps 1 to N.
Databot will replay the data from the nearest completed node, usually step N-1.
It will save a lot of time in the development phase.
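The idea behind replay can be sketched in plain Python as follows. This is an assumption-laden illustration, not Databot's mechanism: the `completed` cache, `run_steps`, and the sample step functions are all hypothetical, and the cache lives only in memory.

```python
from typing import Any, Callable, Dict, List

# In-memory cache of each completed step's output (hypothetical; a real
# tool would likely persist this between runs).
completed: Dict[int, Any] = {}
calls: List[str] = []  # records which steps actually executed


def run_steps(steps: List[Callable[[Any], Any]], data: Any) -> Any:
    """Run steps in order, resuming after the last step with a cached output."""
    start = 0
    # Find the nearest completed step and reuse its output.
    for index in range(len(steps) - 1, -1, -1):
        if index in completed:
            start, data = index + 1, completed[index]
            break
    for index in range(start, len(steps)):
        data = steps[index](data)
        completed[index] = data  # record only after the step succeeded
    return data


def load(x):
    calls.append("load")
    return x + 1


def double(x):
    calls.append("double")
    return x * 2


def broken(x):
    calls.append("broken")
    raise ValueError("bug in step 3")


def fixed(x):
    calls.append("fixed")
    return x - 3


try:
    run_steps([load, double, broken], 1)
except ValueError:
    pass  # steps 1 and 2 are cached; step 3 failed

# After fixing step 3, only that step runs again.
result = run_steps([load, double, fixed], 1)
print(result, calls)
```

On the second run, `load` and `double` are skipped because their outputs are already cached, which is the time saving the replay mode aims for.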
Data-driven programming is a programming paradigm which describes the data to be matched and the processing required rather than defining a sequence of steps to be taken. Standard examples of data-driven languages are the text-processing languages sed and AWK, where the data is a sequence of lines in an input stream. Data-driven programming is typically applied to streams of structured data for filtering, transforming, aggregating (such as computing statistics), or calling other programs.
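A small AWK-style sketch in plain Python may help illustrate the paradigm: the program is a table of patterns and actions, and the input lines decide which action runs. The rules, the `process` helper, and the sample log lines are made up for this example.

```python
import re
from typing import Callable, List, Tuple

# Each rule pairs a pattern with the action to run on the captured text.
Rule = Tuple[str, Callable[[str], str]]

rules: List[Rule] = [
    (r"^ERROR (.+)$", lambda text: f"error: {text}"),
    (r"^WARN (.+)$", lambda text: f"warning: {text}"),
]


def process(lines: List[str], rules: List[Rule]) -> List[str]:
    """Apply the first matching rule to each line; unmatched lines are dropped."""
    output = []
    for line in lines:
        for pattern, action in rules:
            match = re.match(pattern, line)
            if match:
                output.append(action(match.group(1)))
                break
    return output


sample = ["INFO started", "WARN disk almost full", "ERROR connection lost"]
print(process(sample, rules))
```

Note that `process` contains no knowledge of errors or warnings; changing behavior means editing the rule table, not the control flow.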
Databot has a few basic concepts to implement data-driven programming (DDP).
Pipe: the main stream process of the program. All units work inside it.
Node: the processing logic node. It is driven by data. Custom functions work as Nodes. Databot also ships with some built-in nodes, such as the Timer and HttpLoader used in the example above.
Route: used to create a complex data flow network, not just one main process. Databot can nest Routes inside Routes, which makes it a powerful concept. Databot also ships with some built-in Routes.
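The nesting idea can be sketched generically as below. This is not Databot's Route API: the `Branch` class and `run_flow` helper are hypothetical, and the sketch assumes a route simply sends the same input through several sub-flows and collects their results.

```python
from typing import Any, Callable, List

Step = Callable[[Any], Any]


def run_flow(steps: List[Step], item: Any) -> Any:
    """Pass an item through each step in turn; a Branch is just another step."""
    for step in steps:
        item = step(item)
    return item


class Branch:
    """Hypothetical route: sends one input through several sub-flows."""

    def __init__(self, *flows: List[Step]) -> None:
        self.flows = flows

    def __call__(self, item: Any) -> List[Any]:
        # Each sub-flow receives the same input; results are collected in order.
        return [run_flow(flow, item) for flow in self.flows]


# A branch nested inside another branch.
inner = Branch([lambda x: x + 1], [lambda x: x - 1])
outer = Branch([lambda x: x * 10], [inner, sum])

result = run_flow([outer], 5)
print(result)
```

Because a `Branch` is callable like any other step, branches compose freely, which is what makes nesting possible in this sketch.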
All units (Pipe, Node, Route) communicate via queues and perform parallel computation in coroutines.
This is abstracted so that Databot can be used with only limited knowledge of asyncio.
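For readers curious about what is being abstracted away, here is a minimal sketch of stages connected by asyncio queues. It uses only the standard library and is not Databot's implementation; `stage`, `run_pipeline`, and the `STOP` sentinel are names chosen for this example.

```python
import asyncio
from typing import Any, Callable, List

STOP = object()  # sentinel marking the end of the stream


async def stage(
    func: Callable[[Any], Any], inbox: asyncio.Queue, outbox: asyncio.Queue
) -> None:
    """Read items from inbox, apply func, and forward results to outbox."""
    while True:
        item = await inbox.get()
        if item is STOP:
            await outbox.put(STOP)  # propagate shutdown downstream
            return
        await outbox.put(func(item))


async def run_pipeline(
    items: List[Any], funcs: List[Callable[[Any], Any]]
) -> List[Any]:
    # One queue in front of each stage, plus one for the final output.
    queues = [asyncio.Queue() for _ in range(len(funcs) + 1)]
    # Each stage runs as its own task, so stages interleave concurrently.
    tasks = [
        asyncio.create_task(stage(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for item in items:
        await queues[0].put(item)
    await queues[0].put(STOP)

    results = []
    while True:
        item = await queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    await asyncio.gather(*tasks)
    return results


results = asyncio.run(
    run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 10])
)
print(results)
```

Each stage only sees its own inbox and outbox, so stages can be added or reordered without touching the others.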