Current conclusion: use a monthly tunnel proxy package.

If you're not interested in the pitfalls I ran into, you can stop reading here.

This article records the problems I hit when applying for budget and making technical changes (the proxy plan and the code changes it required). These were problems on both the company-process side and the technical side, and the company's complex internal approval process made them harder.

These are problems I actually ran into. Reviewing the experience, recording the problems and writing them out helps me analyze them better.

It’s always good to write it down.


The story begins: Company A wants to build a crawler business to support its big-data business, gathering external data as a supplement.

Stage 1: From 0 to 1

  • Budget threshold at this stage: 0-500 yuan/month

Why 500 yuan? Because tunnel proxy monthly packages generally start at 500 yuan. If your monthly budget exceeds 500 yuan, use a tunnel proxy from the start.

At this early stage, we chose a pay-per-volume proxy similar to Sesame HTTP proxy, at about 0.04 yuan per proxy IP.

Because crawler usage was low (no more than 500 yuan a month), a pay-as-you-go HTTP proxy was the most cost-effective option.
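The 500-yuan threshold comes down to simple arithmetic. At 0.04 yuan per IP, 500 yuan buys 12,500 extracted IPs a month, which is a little over 400 a day. Here is a minimal sketch of that comparison. It uses only the article's approximate prices, and the function names are illustrative.

```python
# Break-even between pay-per-volume proxies and an entry-level tunnel package,
# using the article's approximate prices.
PRICE_PER_IP = 0.04     # yuan per extracted proxy IP (pay-per-volume)
TUNNEL_MONTHLY = 500    # yuan per month, entry-level tunnel package


def monthly_pay_per_volume_cost(ips_per_month: int) -> float:
    """Monthly cost if every IP is bought individually."""
    return ips_per_month * PRICE_PER_IP


def break_even_ips() -> int:
    """IPs per month at which pay-per-volume costs as much as the tunnel package."""
    return round(TUNNEL_MONTHLY / PRICE_PER_IP)


def recommend(ips_per_month: int) -> str:
    """Pick the cheaper option for a given monthly IP consumption."""
    if monthly_pay_per_volume_cost(ips_per_month) < TUNNEL_MONTHLY:
        return "pay-per-volume"
    return "tunnel"
```

For example, `recommend(5000)` costs 200 yuan and favors pay-per-volume. `recommend(20000)` would cost 800 yuan, so the tunnel package wins.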

Pay-per-volume has a pitfall: you have to implement proxy reuse yourself, otherwise 500 yuan won't last long. See my discussion on reuse for details.
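Reuse means holding on to each paid IP for as long as it stays valid, instead of extracting a new one for every request. Below is a minimal sketch under stated assumptions:

- `fetch_new` is a hypothetical callable that wraps your provider's extraction API. Each call is billed.
- An extracted IP is assumed to stay valid for `ttl_seconds`.
- An IP is also retired after `max_failures` consecutive failures.

```python
import time
from typing import Callable, Optional


class ReusableProxyPool:
    """Reuse one paid proxy IP until it expires or fails too often."""

    def __init__(self, fetch_new: Callable[[], str], ttl_seconds: float = 180.0,
                 max_failures: int = 3, clock: Callable[[], float] = time.monotonic):
        self._fetch_new = fetch_new        # hypothetical: calls the provider's (billed) extraction API
        self._ttl = ttl_seconds            # assumed validity window of one extracted IP
        self._max_failures = max_failures  # consecutive failures before the IP is dropped
        self._clock = clock
        self._proxy: Optional[str] = None
        self._acquired_at = 0.0
        self._failures = 0
        self.extracted = 0                 # how many IPs we have paid for so far

    def get(self) -> str:
        """Return the current proxy, extracting a new one only when necessary."""
        expired = self._clock() - self._acquired_at >= self._ttl
        if self._proxy is None or expired or self._failures >= self._max_failures:
            self._proxy = self._fetch_new()
            self._acquired_at = self._clock()
            self._failures = 0
            self.extracted += 1
        return self._proxy

    def report_failure(self) -> None:
        self._failures += 1

    def report_success(self) -> None:
        self._failures = 0
```

The `extracted` counter is worth logging, because it maps directly to money spent.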

A pay-per-volume HTTP proxy is like a manual-transmission car. The unit price is cheap, which suits beginners who want to practice or experts who want to heavily customize their setup. But you have to deal with more problems yourself, such as proxy reuse and balance monitoring and alerting.
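Balance monitoring can be as simple as estimating how many days the remaining balance will last at the recent burn rate. Here is a minimal sketch. It assumes you can read the balance and the last 24 hours of spend from your provider; how you do that, and how you send the alert, is left out. The thresholds are illustrative.

```python
from dataclasses import dataclass


@dataclass
class BalanceStatus:
    balance: float    # yuan remaining in the proxy account
    days_left: float  # estimated days until the balance runs out
    alert: bool       # True if someone should top up soon


def check_balance(balance: float, spend_last_24h: float,
                  min_days: float = 3.0, min_balance: float = 50.0) -> BalanceStatus:
    """Flag an alert when the balance is low or will run out within `min_days`."""
    if spend_last_24h <= 0:
        days_left = float("inf")  # no recent spend, so no burn-rate estimate
    else:
        days_left = balance / spend_last_24h
    alert = balance < min_balance or days_left < min_days
    return BalanceStatus(balance, days_left, alert)
```

Run it from a scheduled job, for example once an hour, and push the result to whatever alerting channel the team already uses.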

Stage 2: From 1 to 100

  • Budget threshold at this stage: 2000-3000 yuan/month

At this stage, the number of crawler data tasks gradually increased, and I introduced tunnel proxies.

500 yuan for 5 concurrent requests is the standard entry price for a tunnel proxy. But 5 concurrent requests cannot cover much demand, so I combined a 500-yuan monthly tunnel proxy with pay-as-you-go HTTP proxies.

A tunnel proxy with 20 concurrent requests generally costs around 2000 yuan per month.

  • For high-concurrency tasks, use pay-per-volume HTTP proxies. For example, an app may have more than a dozen interfaces, that is, more than a dozen data dimensions. Its token is time-limited, so all of those requests must finish within 5 minutes.

  • In that case a large number of IP addresses must be rotated. Scheduled tasks must also be spread out so that the tasks running on the tunnel proxy never exceed the 5-concurrency limit of the 500-yuan monthly package.
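For the tasks that stay on the 500-yuan tunnel package, you can also enforce the 5-concurrency limit on the client side, in addition to scheduling. Here is a minimal asyncio sketch. It assumes each job is a coroutine function that sends its request through the tunnel proxy; the HTTP client itself is left out. `run_on_tunnel` is a hypothetical helper name.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")

TUNNEL_MAX_CONCURRENCY = 5  # concurrency limit of the 500-yuan monthly tunnel package


async def run_on_tunnel(jobs: List[Callable[[], Awaitable[T]]],
                        limit: int = TUNNEL_MAX_CONCURRENCY) -> List[T]:
    """Run all jobs, but never let more than `limit` use the tunnel at once."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with sem:  # blocks while `limit` jobs are already in flight
            return await job()

    # gather() returns results in the same order as `jobs`
    return await asyncio.gather(*(guarded(j) for j in jobs))
```

High-concurrency jobs, such as the time-limited token case, would bypass this limiter and go to the pay-per-volume pool instead.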

Stage 3: From 100 to 10000

  • Budget threshold at this stage: 3000-10000 yuan/month

Use tunnel proxies for everything and pay for a higher-tier plan. It is hassle-free, low-effort and easy to switch.
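The "easy" part is that client code points at one fixed endpoint, and the provider rotates the exit IP behind it. Here is a minimal standard-library sketch. The host, port, username and password below are placeholders, not any real provider's values. Credentials containing special characters would need URL-quoting.

```python
import urllib.request

# Placeholder tunnel endpoint and credentials (hypothetical; use what your provider issues).
TUNNEL_HOST = "tunnel.example.com"
TUNNEL_PORT = 15818
USERNAME = "user"
PASSWORD = "pass"


def tunnel_proxies(host: str = TUNNEL_HOST, port: int = TUNNEL_PORT,
                   user: str = USERNAME, password: str = PASSWORD) -> dict:
    """Build a scheme -> proxy URL mapping that points every request at the tunnel."""
    url = f"http://{user}:{password}@{host}:{port}"
    return {"http": url, "https": url}


def make_tunnel_opener() -> urllib.request.OpenerDirector:
    """Opener whose requests all go through the tunnel; the provider rotates exit IPs."""
    return urllib.request.build_opener(urllib.request.ProxyHandler(tunnel_proxies()))
```

The same mapping can be passed as the `proxies=` argument of the `requests` library. Switching providers then only means changing these constants.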

Stage 4: From 10000 to the future

  • Budget threshold at this stage: more than 10000 yuan/month

At this scale, I think it's worth negotiating a B2B cooperation with the proxy provider.

That could be the heavily customized "manual transmission" option charged by volume, a tunnel proxy, or any other product form. Any of these is fine.

The budget figures above are rough estimates meant to show an approximate range. Planning proxy costs for a project is essentially a linear-programming problem.
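A toy model makes the linear-programming idea concrete. The assumptions are mine, not the article's:

- Each tunnel package costs 500 yuan and covers 5 concurrent workers.
- Every worker not covered by a package uses pay-per-volume IPs at 0.04 yuan each.
- Each such worker consumes a hypothetical `ips_per_worker` IPs per month.

Because the problem is this small, the sketch simply enumerates the number of packages instead of calling an LP solver.

```python
import math
from typing import Optional, Tuple

PACKAGE_PRICE = 500.0     # yuan/month per tunnel package (article's entry price)
PACKAGE_CONCURRENCY = 5   # concurrent requests per package (article's figure)
PRICE_PER_IP = 0.04       # yuan per pay-per-volume IP (article's figure)


def cheapest_mix(peak_concurrency: int, ips_per_worker: int) -> Tuple[int, float]:
    """Return (tunnel packages to buy, total monthly cost) under the toy model."""
    max_packages = math.ceil(peak_concurrency / PACKAGE_CONCURRENCY)
    best: Optional[Tuple[int, float]] = None
    for k in range(max_packages + 1):
        # workers not covered by k packages fall back to pay-per-volume IPs
        uncovered = max(0, peak_concurrency - k * PACKAGE_CONCURRENCY)
        cost = k * PACKAGE_PRICE + uncovered * ips_per_worker * PRICE_PER_IP
        if best is None or cost < best[1]:
            best = (k, cost)
    assert best is not None
    return best
```

Consider 20 concurrent workers that each burn about 5000 IPs a month. The model picks 4 packages, about 2000 yuan, which matches the ~2000 yuan/month figure quoted for a 20-concurrency tunnel proxy. At 1000 IPs per worker, pure pay-per-volume (about 800 yuan) wins.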


Appendix:

1. This article touches on several of the following job requirements.

Technical requirements and system optimization:

   1. Responsible for the design, development, testing, operation and maintenance of a distributed data-acquisition system.
   2. Responsible for improving the design framework of data-processing programs, optimizing data-processing performance, increasing the system's data-processing capacity, and tackling key technical problems.
   3. Responsible for researching and implementing optimizations in acquisition algorithms, anti-crawling countermeasures, proxy IPs and CAPTCHA recognition, to improve crawl efficiency and success rate while balancing the invested budget against data output.
   4. Improving the monitoring system, with real-time monitoring of task progress and alarm feedback.

2. Proxy cost is really a linear-programming problem involving the product stage, the budget and usage volume.

3. First, read widely, for example comparison articles such as "Which crawler proxy is best? A detailed comparison and evaluation of ten paid proxies!". Most importantly, though, choose a plan based on your actual project.

4. If you use a framework such as Scrapy, write a ProxyMiddleware; that makes switching between proxy types relatively easy.
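A minimal sketch of such a middleware follows. Scrapy downloader middlewares are plain classes with `from_crawler` and `process_request` hooks. The built-in HttpProxyMiddleware then honors `request.meta["proxy"]`, including `user:pass@` credentials in the URL. The setting names `PROXY_MODE`, `TUNNEL_PROXY_URL` and `PAY_PER_VOLUME_PROXY_URL` are hypothetical. In pay-per-volume mode the URL would normally come from a reuse pool; a single setting keeps the sketch short.

```python
# middlewares.py (sketch; PROXY_MODE, TUNNEL_PROXY_URL and
# PAY_PER_VOLUME_PROXY_URL are hypothetical setting names)


class ProxyMiddleware:
    """Downloader middleware that attaches a proxy to each request."""

    def __init__(self, mode: str, tunnel_url: str, volume_url: str):
        self.mode = mode              # "tunnel" or "volume"
        self.tunnel_url = tunnel_url
        self.volume_url = volume_url

    @classmethod
    def from_crawler(cls, crawler):
        s = crawler.settings
        return cls(
            mode=s.get("PROXY_MODE", "tunnel"),
            tunnel_url=s.get("TUNNEL_PROXY_URL", ""),
            volume_url=s.get("PAY_PER_VOLUME_PROXY_URL", ""),
        )

    def process_request(self, request, spider):
        proxy = self.tunnel_url if self.mode == "tunnel" else self.volume_url
        # Respect a proxy the spider set explicitly; otherwise apply the configured one.
        if proxy and "proxy" not in request.meta:
            request.meta["proxy"] = proxy
        return None  # let Scrapy continue processing the request


# settings.py: a priority below 750 runs this before the built-in HttpProxyMiddleware
# DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.ProxyMiddleware": 350}
# PROXY_MODE = "tunnel"
```

With this in place, switching from pay-per-volume to a tunnel proxy is a one-line settings change rather than a code change.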

How do you save money on your mobile phone plan? It's worth thinking about too. The principle is the same, and we all face this problem.