gaojiuli / gain
- воскресенье, 4 июня 2017 г. в 03:12:09
Python
Web crawling framework for everyone.
Web crawling framework for everyone. Written with asyncio, uvloop and aiohttp. Every could write their own web crawler easily with gain framework. Gain framework provide a pretty simple api.
pip install gain
Write spider.py:
from gain import Css, Item, Parser, Spider
class Post(Item):
title = Css('.entry-title')
content = Css('.entry-content')
async def save(self):
with open('scrapinghub.txt', 'a+') as f:
f.writelines(self.results['title'] + '\n')
class MySpider(Spider):
frequency = 5
start_url = 'https://blog.scrapinghub.com/'
parsers = [Parser('https://blog.scrapinghub.com/page/\d+/'),
Parser('https://blog.scrapinghub.com/\d{4}/\d{2}/\d{2}/[a-z0-9\-]+/', Post)]
MySpider.run()
run python spider.py
the examples are in the /example/
directory.
Just pull request or open issue.