Origin of crawlers
First, the “crawler” here refers to the web crawler, which originated in the early days of Internet search engines. It was created to automate the collection of web information.
Since crawlers emerged, they may not have looked mainstream, but they are actually one of the most important technologies in Internet applications. In addition to the well-known Google and Baidu crawlers, Internet giants old and new such as Toutiao, Dianping Meituan, Qunar and 58 are all information aggregation platforms built on crawlers, and they all have experienced crawler teams.
Beyond the narrow sense of a crawler that follows web page content as its trail, any program or script that obtains information in an automated way can be called a “crawler”.
Scale of crawlers
How much Internet traffic comes from crawlers? Conservative estimates put it at more than half on average, and in some industries it can reach 90%.
This is because human traffic is limited: the human population grows slowly and human reaction time is limited.
The scale of crawlers, however, grows directly with IT infrastructure, computing power, bandwidth and throughput. In essence it grows with the amount of information on the Internet, which is exponential.
And they are growing so fast that crawlers cannot be eliminated; they can only be managed.
Crawlers in black and white
A crawler is a tool that people created to simplify their work. It is neutral: the people who create and use it can use it to simplify work or to do harm.
Sometimes black and white cannot even be defined. Different parties with different business purposes fight each other on the battlefield of the Internet, and crawler technology has naturally become a weapon in this war.
Crawler attack and defense is a contest of scale, automation and intelligence. In essence it is also a confrontation between the people behind it.
Recently, I was fortunate enough to go head to head with a leading Internet company.
The crawler pincer attack
I don’t want to go into the technical details of crawlers here. Getting back to the subject, today I want to talk about the latest form of crawler confrontation, which you may not have heard of.
Generally, we know that crawlers are automated. To fight a crawler, we need to find the pattern in its automation and break it.
That is true, but what is the pattern? Can machine learning or deep learning solve this problem? Possibly.
We always talk about “attack and defense confrontation”, and the confrontation keeps escalating. There are people in command on both sides.
A “pincer movement” is a simultaneous attack on one target by two or more squads attacking from different angles.
One of them is a crawler with large-scale features, with heavy artillery and large numbers. It looks like the main force, and it is relatively easy for you or your system to find its pattern and contain it.
The other is a crawler whose features are dispersed over time. It acts like a guerrilla, constantly changing its features and frequency, which makes it hard to spot while it steals important information.
The goal of this offensive is to use the large-feature crawler to distract your automatic rules and machine learning systems, so that your anti-crawler system appears to be working smoothly, finding and containing large numbers of crawlers.
But critical information is still being lost. This kind of crawler offensive has not only weapon-level technical firepower, but also tactical experience and the ability to respond flexibly.
In the end, the key to the final outcome of the crawler war is resources (cost) and team size.
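The pincer movement described above can be illustrated with a toy example. This is only a minimal sketch under stated assumptions: the client ids, URLs, traffic volumes and the threshold of 100 requests per window are all hypothetical, and a simple per-client rate rule stands in for whatever rules or models a real anti-crawler system uses. It shows how a rule can catch the large-feature crawler while the dispersed one stays under the radar.

```python
from collections import Counter

def flag_by_rate(requests, threshold):
    """Return the client ids whose request count in this window exceeds threshold.

    requests: iterable of (client_id, url) tuples observed in one time window.
    """
    counts = Counter(client for client, _ in requests)
    return {client for client, n in counts.items() if n > threshold}

# Hypothetical traffic for a single time window.
# "Main force": one identity hammering the site with 500 requests.
heavy = [("bulk-1", f"/item/{i}") for i in range(500)]
# "Guerrilla": 50 rotating identities, 2 requests each, aimed at sensitive pages.
stealthy = [(f"guerrilla-{i % 50}", f"/price/{i}") for i in range(100)]
# Ordinary visitors, one request each.
humans = [(f"user-{i}", "/home") for i in range(20)]

traffic = heavy + stealthy + humans

# The rate rule flags only the high-volume client.
flagged = flag_by_rate(traffic, threshold=100)

# Sensitive pages still served to clients that the rule did not flag.
leaked = [url for client, url in traffic
          if client not in flagged and url.startswith("/price/")]

print(sorted(flagged))  # ['bulk-1']
print(len(leaked))      # 100
```

From the defender's dashboard, one crawler and 500 requests were caught, which looks like success, yet every one of the 100 sensitive pages was still served to the dispersed crawler.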
Conclusion
Internet giants monopolize technology and talent, and no one wants to take them on. Once their crawlers target small and medium-sized or non-technology-driven companies, those companies can be knocked out in a few rounds.
My team and I are here to help companies like this guard every inch of their information. Partners interested in crawler technology or anti-crawler products are also welcome to get in touch and exchange ideas.
Yang Dongdong donge.org