This is the 23rd day of my participation in the Gwen Challenge.

Python articles are everywhere; look up and you will run into this little blogger.

Compared with C, Python is simpler and easier to pick up and use. Python is the second programming language I learned, and as a self-taught learner, the more I study it the more I feel that a language is just a tool: getting started is easy, but truly mastering it takes far more time and energy, and even then you may not reach the ideal result. Anyway, you never know until you try.

This month the Python series will finally come to an end. Simply put, once you have learned C, Java, or another programming language, teaching yourself a new one is fairly easy: an online video course plus a book is enough to master the syntax. For my part, I watched Professor Song Tian's course (Beijing Institute of Technology) together with a complete e-book. Written articles are very useful for reviewing afterwards; after all, reviewing the old helps you learn the new.

After finishing the Python series, I also want to write a crawler series, so that you can conveniently scrape pictures, music, videos and so on. Small ** videos work too, hee hee.

A Python crawler, as the name implies, crawls for information. In the era of big data, acquiring information is very important; it can even determine a company's direction and future. If the Internet is compared to a huge net, then obtaining information means retrieving it from within this net. Doing this systematically is what search engines do, and Baidu and Sogou work in exactly this way.

To learn web crawling, we must first develop a crawler's way of thinking. For example, the text, pictures, videos and so on that you see on the web are actually stored by "something" and then returned to the user over the network.

Some of you may be curious about that "something". Here we introduce a concept called the URL, which can be understood simply as the route needed to find that "something", and which is commonly referred to as a website address or link.

URL: Uniform Resource Locator. A URL is the way the World Wide Web specifies the location of a piece of information, and it can also be called the address of a standard resource on the Internet. Every file on the Internet has a unique URL that indicates where the file is and what the browser should do with it. A URI (Uniform Resource Identifier) also identifies resources; a URL is in fact a kind of URI, but because URLs are the form most widely used in practice, "URL" is the term you will see most often. What a crawler ultimately fetches is the file that a URL points to, so it is not hard to understand why you need the exact URL in order to crawl a file.
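As a quick illustration (a minimal sketch using only the standard library; the example URL below is hypothetical), Python's urllib.parse can split a URL into the parts that tell a crawler where to go:

```python
from urllib.parse import urlparse

# A hypothetical URL used purely for illustration
url = "https://www.example.com/images/cat.jpg?size=large#top"

parts = urlparse(url)
print(parts.scheme)    # 'https'            -> the protocol to speak
print(parts.netloc)    # 'www.example.com'  -> which server holds the file
print(parts.path)      # '/images/cat.jpg'  -> where the file lives on that server
print(parts.query)     # 'size=large'       -> extra parameters
print(parts.fragment)  # 'top'              -> a position inside the page (browser-side)
```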

So a crawler is simply a little "bug" that follows this route to find what we want, then parses it and extracts the content.
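For example, a minimal "bug" that follows a URL, fetches the page, and extracts something from it might look like this (a sketch using only the standard library; the target URL is just a placeholder):

```python
from urllib.request import urlopen, Request
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text inside the <title> tag."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

# Placeholder target; send a User-Agent header so we look like a normal browser
req = Request("https://www.example.com/", headers={"User-Agent": "Mozilla/5.0"})
html = urlopen(req).read().decode("utf-8")

parser = TitleParser()
parser.feed(html)
print("Page title:", parser.title)
```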

In Chrome or Firefox, press F12 to open developer tools and switch to the Network tab. If nothing shows up under Name, refresh the page. Click any file and you will see its Request URL in the Headers panel; this is the real URL of that resource. Of course, encryption and other transformations can make URLs look different, which we will learn about later.
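Once you have copied a Request URL from the Network panel, downloading the file it points to is just a matter of requesting that exact URL and saving the bytes (a sketch; the URL and filename below are placeholders):

```python
from urllib.request import urlopen, Request

# Paste the Request URL copied from the browser's Network panel here
file_url = "https://www.example.com/static/picture.png"  # placeholder

req = Request(file_url, headers={"User-Agent": "Mozilla/5.0"})
data = urlopen(req).read()  # raw bytes of the file

with open("picture.png", "wb") as f:  # save it under a name of your choice
    f.write(data)

print("Saved", len(data), "bytes")
```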

(Python crawler series) To be continued…