
A light and fast Python web crawler framework based on asyncio.


Distributed🌍 - Asynchronous🏃 - Light☁️ - Fast⚡️ - Easy😄

AirSpider🕷️, a Light and Fast Python Web Crawler Framework Based on Redis🕷️


Overview👀

  • AirSpider is a high-performance asynchronous crawler framework for developers 🚀
  • Based on Redis: task distribution, task deduplication, and distributed crawling ☁️
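To make the Redis-based deduplication idea concrete, here is a minimal sketch, assuming requests are deduplicated by a SHA-1 fingerprint of their method and URL. It is not AirSpider's code: `fingerprint` and `InMemoryTaskQueue` are hypothetical names, and a Python set and deque stand in for the Redis set and list that would let several workers share one queue. (Redis `SADD` returns 1 when it adds a new member and 0 when the member already exists, which is what makes a set convenient for this check.)

```python
import hashlib
from collections import deque
from typing import Deque, Optional, Set, Tuple


def fingerprint(method: str, url: str) -> str:
    """Hash a request's method and URL into a stable 40-char hex fingerprint."""
    return hashlib.sha1("{} {}".format(method.upper(), url).encode("utf-8")).hexdigest()


class InMemoryTaskQueue:
    """Hypothetical stand-in for a Redis-backed task queue.

    `seen` plays the role of a Redis set of fingerprints and `pending`
    plays the role of a Redis list of queued requests.
    """

    def __init__(self) -> None:
        self.seen = set()  # type: Set[str]
        self.pending = deque()  # type: Deque[Tuple[str, str]]

    def push(self, method: str, url: str) -> bool:
        """Queue a request unless an identical one was already seen."""
        fp = fingerprint(method, url)
        if fp in self.seen:
            return False  # duplicate: drop it
        self.seen.add(fp)
        self.pending.append((method.upper(), url))
        return True

    def pop(self) -> Optional[Tuple[str, str]]:
        """Return the next queued request, or None when the queue is empty."""
        return self.pending.popleft() if self.pending else None
```

Because the fingerprint upper-cases the method, calling `push("get", url)` after `push("GET", url)` returns False and the request is not queued twice.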

Requirements☁️

  • Python 3.6➕
  • Works on Linux, Windows, macOS🍎

Features🌲

  • Quick to Start ☑️
  • Low Coupling ☑️
  • High Cohesion ☑️
  • Easy Expansion ☑️
  • Orderly Workflow ☑️

Installation🔨

# For Linux && macOS🔥
pip3 install airspider

# For Windows🔥
pip3 install airspider

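  • Documentation🔥

    Topics

    • Item: defines the target fields to crawl
    • Selector: extracts the target fields from HTML
    • Request: requests and fetches resources from the target website
    • Response: further wraps the response content
    • Middleware: lets the crawler support third-party extensions
    • Spider: the entry point of a crawler program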
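The sketch below shows one way the six roles in the Topics list could fit together on top of asyncio. Every class and function name here is hypothetical and this is not AirSpider's real API; field extraction uses plain regular expressions as a stand-in for a real selector engine, and `fake_fetch` replaces an actual HTTP client so the example runs without network access.

```python
import asyncio
import re
from typing import Dict

# Hypothetical names throughout: these classes mirror the roles in the
# Topics list, not AirSpider's actual API.


class Request:
    """Describes one resource to fetch (the Request role)."""

    def __init__(self, url: str, method: str = "GET") -> None:
        self.url = url
        self.method = method


class Response:
    """Wraps what came back from a fetch (the Response role)."""

    def __init__(self, url: str, status: int, text: str) -> None:
        self.url = url
        self.status = status
        self.text = text


class Item:
    """Declares target fields as name -> regex (Item and Selector roles)."""

    fields = {}  # type: Dict[str, str]

    @classmethod
    def extract(cls, html: str) -> Dict[str, str]:
        result = {}
        for name, pattern in cls.fields.items():
            match = re.search(pattern, html, re.S)
            result[name] = match.group(1).strip() if match else ""
        return result


class TitleItem(Item):
    fields = {"title": r"<title>(.*?)</title>"}


def strip_text(response: Response) -> Response:
    """Example middleware: trims surrounding whitespace from the body."""
    response.text = response.text.strip()
    return response


async def fake_fetch(request: Request) -> Response:
    """Stand-in for a real async HTTP client; performs no network access."""
    await asyncio.sleep(0)
    html = "  <html><title>Page {}</title></html>  ".format(request.url)
    return Response(request.url, 200, html)


class Spider:
    """Entry point: fetches start URLs concurrently, runs middleware, extracts items."""

    def __init__(self, start_urls, item_cls, fetch, middlewares=()):
        self.requests = [Request(url) for url in start_urls]
        self.item_cls = item_cls
        self.fetch = fetch  # async callable: Request -> Response
        self.middlewares = list(middlewares)

    async def crawl_one(self, request):
        response = await self.fetch(request)
        for middleware in self.middlewares:
            response = middleware(response)
        return self.item_cls.extract(response.text)

    async def run(self):
        # gather() runs all fetches concurrently and keeps the input order.
        return await asyncio.gather(*(self.crawl_one(r) for r in self.requests))

    def start(self):
        # A fresh event loop keeps this usable on Python 3.6 (no asyncio.run).
        loop = asyncio.new_event_loop()
        try:
            return loop.run_until_complete(self.run())
        finally:
            loop.close()
```

For example, `Spider(["https://example.com/a"], TitleItem, fake_fetch, [strip_text]).start()` returns `[{"title": "Page https://example.com/a"}]`. Keeping middleware as plain `Response -> Response` callables is one simple design that lets third-party extensions be chained without the Spider knowing about them.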

TODO✈️

  • Complete the Redis plugins🔥
  • Complete Distributed Architecture☁️

Contributing👬

AirSpider🕷️ is still under development🔨

Feel free to open issues💬 and pull requests💗

  • Report or Fix bugs🌈
  • Build Powerful plugins🔥
  • Improve the documentation📖
  • Add Examples of Crawling 🕷️



Download files


Source Distribution

AirSpider-2.0.1.tar.gz (16.6 kB)

Built Distribution

AirSpider-2.0.1-py2.py3-none-any.whl (19.1 kB), tagged for Python 2 and Python 3
