Open source project address: github.com/believedotc…
The project is still a work in progress; feel free to discuss it with us in the comments section or in Issues.
Note: this post is updated infrequently; for the latest changes, please go directly to the project's GitHub home page.
An easy-to-use free proxy pool
Compatible systems: Windows, Linux, macOS
- Automatically crawls free proxies from the web on a schedule
- Validates proxies periodically; the built-in API returns available proxies at any time
- Needs no third-party database; starts with one command and is easy to use
- Includes a web management interface for viewing proxy status and configuring the proxy pool
- Thoroughly commented, so the code is easy to learn from or modify
Screenshot of the WEB management page
Free proxy sources already integrated
Name | URL |
---|---|
Kuaidaili | www.kuaidaili.com/ |
Goubanjia | www.goubanjia.com/ |
66 Proxy | www.66ip.cn/ |
Cloud Proxy (ip3366) | www.ip3366.net/ |
Free Proxy Library | ip.jiangxianli.com/ |
Xiaohuan HTTP Proxy | ip.ihuan.me/ |
89 Free Proxy | www.89ip.cn/ |
More useful proxy sources will be added later.
For installation and usage instructions, see the project's GitHub home page.
Project workflow flowchart
This project mainly consists of three parts:
- Crawl process: mainly the fetchers directory and the proc/run_fetcher.py file
- Validation process: mainly in the proc/run_validator.py file
- WEB and API: in the api directory
The overall logic of the project is shown in the following diagram:
Note: To keep the diagram easy to draw and understand, the logic shown is simplified; for the detailed process, see the code and its comments.
About the validation algorithm
- How to verify that a proxy is available
The current verification algorithm is fairly simple: the core idea is to use the Requests library to fetch a specified web page through the proxy and check whether the request succeeds.
The configuration parameters (timeout, number of attempts, and so on) are in config.py, and the code logic is in proc/run_validator.py.
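The idea above can be sketched as follows. This is a minimal illustration, not the code in proc/run_validator.py: the function name validate_proxy, the parameter names, the default values, and the rule that only HTTP 200 counts as success are all assumptions made for this example. The get parameter exists only so the function can be exercised without real network access.

```python
import requests


def validate_proxy(proxy, test_url, timeout=5.0, max_attempts=2, get=requests.get):
    """Return True if test_url can be fetched through proxy ("host:port").

    Hypothetical helper: the name, parameters, and defaults are assumptions
    for illustration, not the project's real configuration values.
    """
    # Route both plain and TLS traffic through the same HTTP proxy.
    proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    for _ in range(max_attempts):
        try:
            response = get(test_url, proxies=proxies, timeout=timeout)
        except requests.RequestException:
            # Timeouts, refused connections, proxy errors: try again.
            continue
        if response.status_code == 200:
            return True
    return False
```

A call might look like validate_proxy("1.2.3.4:8080", "http://example.com"), where both the address and the test page are placeholders. A stricter check could also compare the response body with the expected page, since some free proxies return their own content with a 200 status.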
- Which proxy should be validated, and when
This problem is more complex and is unlikely to have a perfect solution, so the current algorithm is fairly simple and only just adequate. A description of the current algorithm can be found in the db directory.
If you have a better algorithm, please discuss it with us in Issues, or modify the code by following the README file in db.
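As a starting point for that discussion, here is one possible scheduling scheme. It is not the algorithm described in the db directory, which this post does not spell out; the class names, intervals, and failure limit are all hypothetical. The sketch keeps proxies in a min-heap ordered by their next check time, re-checks working proxies at a fixed interval, retries failing ones with exponential backoff, and drops a proxy after too many consecutive failures.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class ProxyEntry:
    next_check: float                      # the only field used for ordering
    address: str = field(compare=False)
    failures: int = field(default=0, compare=False)


class ValidationQueue:
    """Hypothetical scheduler; not the project's actual algorithm."""

    def __init__(self, ok_interval=300.0, retry_base=60.0, max_failures=3):
        self.ok_interval = ok_interval     # seconds between checks of a working proxy
        self.retry_base = retry_base       # first retry delay after a failure
        self.max_failures = max_failures   # consecutive failures before dropping
        self._heap = []

    def add(self, address, now):
        # A newly crawled proxy is due for validation immediately.
        heapq.heappush(self._heap, ProxyEntry(now, address))

    def pop_due(self, now):
        """Return the most overdue entry, or None if nothing is due yet."""
        if self._heap and self._heap[0].next_check <= now:
            return heapq.heappop(self._heap)
        return None

    def report(self, entry, success, now):
        """Reschedule entry after a check; return False if it was dropped."""
        if success:
            entry.failures = 0
            entry.next_check = now + self.ok_interval
        else:
            entry.failures += 1
            if entry.failures >= self.max_failures:
                return False
            # Delay doubles with each consecutive failure: base, 2*base, ...
            entry.next_check = now + self.retry_base * 2 ** (entry.failures - 1)
        heapq.heappush(self._heap, entry)
        return True
```

A validator loop would call pop_due with the current time, run the availability check on the returned entry, and pass the result to report. Passing the time in explicitly, rather than reading the clock inside the class, keeps the scheduling logic easy to test.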