Open source project address: github.com/believedotc…

The project is still far from perfect. You are welcome to discuss it with us in the comments section or in Issues.

Note: this post on Juejin is updated slowly. Please go directly to the GitHub home page for the latest updates on the project.

An easy-to-use free proxy pool

Compatible systems: Windows, Linux, macOS

  • Automatically crawls free proxies from the web on a schedule
  • Validates proxies periodically, and the built-in API returns available proxies at any time
  • Requires no third-party database, starts with a single command, and is easy to use
  • Includes a web management interface for viewing proxy status and configuring the proxy pool
  • Comes with detailed comments, so the code is easy to learn from or modify

Screenshot of the web management page

Free proxy sources already integrated

Name                     Address
Kuaidaili                www.kuaidaili.com/
Quanwang Proxy           www.goubanjia.com/
66 Proxy                 www.66ip.cn/
Cloud Proxy              www.ip3366.net/
Free Proxy Library       ip.jiangxianli.com/
Xiaohuan HTTP Proxy      ip.ihuan.me/
89 Free Proxy            www.89ip.cn/

More useful proxy sources will be added later.

For installation and usage instructions, see the project's GitHub home page.

Project workflow flowchart

This project mainly consists of three parts:

  1. Crawl process: mainly the fetchers directory and the proc/run_fetcher.py file
  2. Validation process: mainly the proc/run_validator.py file
  3. Web and API: in the api directory

The general logical diagram of the project is as follows:

Note: To make the diagram easier to draw and understand, the logic shown is simplified. For the detailed process, see the code and its comments.

About the validation algorithm

  1. How do I verify that a proxy is available?

The current algorithm for verifying a proxy is fairly simple. The core idea is to use the Requests library to fetch a specified web page through the proxy and check whether the request succeeds.

The configuration parameters (including timeout, number of attempts, etc.) can be found in config.py, and the code logic is in proc/run_validator.py.
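The idea can be sketched roughly as follows. This is a minimal illustration, not the code in proc/run_validator.py: the function name, the target URL, and the default timeout and retry values are all assumptions made for the example, and the real parameter names in config.py may differ.

```python
# Hypothetical defaults for illustration; the project's real values live in config.py.
VALIDATE_URL = "https://www.example.com"
TIMEOUT_SECONDS = 5
MAX_TRIES = 2


def is_proxy_usable(protocol, ip, port, url=VALIDATE_URL,
                    timeout=TIMEOUT_SECONDS, tries=MAX_TRIES, get=None):
    """Return True if `url` can be fetched through the given proxy.

    `get` defaults to requests.get; it is a parameter only so the check
    can be exercised without real network access.
    """
    if get is None:
        import requests  # imported lazily so a stub can replace it
        get = requests.get

    proxy_url = f"{protocol}://{ip}:{port}"
    # Route both plain and TLS traffic through the same proxy.
    proxies = {"http": proxy_url, "https": proxy_url}

    for _ in range(tries):
        try:
            response = get(url, proxies=proxies, timeout=timeout)
        except OSError:
            # requests.RequestException subclasses OSError, so timeouts and
            # connection failures land here; count this as a failed attempt.
            continue
        if response.status_code == 200:
            return True
    return False
```

A call such as `is_proxy_usable("http", "127.0.0.1", 8080)` would then report whether that proxy could reach the test page within the allowed number of attempts.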

  2. Which proxy should be validated, and when?

This problem is more complex and hard to solve perfectly, so the current algorithm is relatively simple and only barely adequate. A description of the current algorithm can be found in the db directory.

If you have a better algorithm, please discuss it with us in Issues, or modify the code according to the README file in the db directory.
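As a starting point for that discussion, here is one possible scheduling approach. It is not the project's algorithm (that one is described in the db directory). It is a hypothetical sketch in which every proxy has a next-check time, working proxies are rechecked at a fixed interval, and failing proxies are rechecked less and less often until they are dropped. The class name and both constants are assumptions.

```python
import heapq

CHECK_INTERVAL = 300  # seconds between checks of a working proxy (assumed)
MAX_FAILURES = 3      # consecutive failures before a proxy is dropped (assumed)


class ValidationSchedule:
    """Decides which proxy is due for validation next."""

    def __init__(self):
        self._heap = []      # (next_check_time, proxy), earliest first
        self._failures = {}  # proxy -> consecutive failure count

    def add(self, proxy, now):
        """Register a newly crawled proxy; it is due immediately."""
        self._failures[proxy] = 0
        heapq.heappush(self._heap, (now, proxy))

    def next_due(self, now):
        """Pop and return the proxy whose check is most overdue, or None."""
        if self._heap and self._heap[0][0] <= now:
            return heapq.heappop(self._heap)[1]
        return None

    def report(self, proxy, ok, now):
        """Record a validation result and reschedule the proxy."""
        if ok:
            self._failures[proxy] = 0
            delay = CHECK_INTERVAL
        else:
            self._failures[proxy] += 1
            if self._failures[proxy] >= MAX_FAILURES:
                del self._failures[proxy]  # give up on this proxy
                return
            # Back off exponentially after each consecutive failure.
            delay = CHECK_INTERVAL * 2 ** self._failures[proxy]
        heapq.heappush(self._heap, (now + delay, proxy))
```

A validator loop would call `next_due(time.time())`, check the returned proxy, and pass the result to `report`. Passing `now` explicitly keeps the schedule easy to test.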