Some time ago, the Shovel Excrement Officer published a long run of technical articles. Because there are so many of them, this post pulls them together and gives each one a one-line summary of its central idea, for easy reference. Taken together they form a systematic tutorial, but given the word count, I would bet 99% of readers never read them, or never finished them.
However, one very enthusiastic reader, a graduate student who is not a computer science major and is about to graduate, wanted to find a computer-related job and needed something to add to his resume. He followed the Shovel Excrement Officer’s tutorial articles step by step, hit no real problems, had the Shovel Excrement Officer look a few things over, and ended up building a school grade query system on a WeChat official account. You have to admit, that is a strong piece of work!
When he finally told me he had succeeded, I was genuinely happy: after roughly a week of reading my articles he had a working official-account system, which I found very impressive. And that experience goes straight onto his resume, which helps a lot in campus recruiting. So today I am going to lay out what my articles cover. This post is a combination of index and abstract.
Python series
Let’s go through what you can learn from this series, article by article.
Basics
“Use Python code to access website 1024”
This article covers the most common Python operations used in crawlers, without relying on any framework. From it you can learn:
- Making network requests with Requests.
- Reading and writing files in Python.
- Parsing HTML with BeautifulSoup4 (a minimal sketch follows this list).
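As a taste of what that combination looks like, here is a minimal sketch that fetches a page with Requests, parses it with BeautifulSoup4, and writes the result to a local file. The URL and the choice of tags are placeholders, not the site from the article.

```python
# Minimal sketch: fetch a page with Requests, parse it with BeautifulSoup4,
# and write the result to a local file. The URL is a placeholder, not the
# site from the article.
import requests
from bs4 import BeautifulSoup

url = "https://example.com"               # placeholder URL
headers = {"User-Agent": "Mozilla/5.0"}   # many sites reject the default UA

resp = requests.get(url, headers=headers, timeout=10)
resp.encoding = resp.apparent_encoding    # guard against mis-detected encodings

soup = BeautifulSoup(resp.text, "html.parser")
titles = [a.get_text(strip=True) for a in soup.find_all("a")]

# write what we parsed to a local text file
with open("titles.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(titles))
```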
“[Python] Use code to post automatically on the 1024 forum and level up fast”
This article focuses on using a Requests Session in Python to perform a POST login. This step is critical: if your target site requires a username and password, follow the instructions in this article.
- Using a Requests Session.
- Simulating a website login (a sketch follows).
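Here is a minimal sketch of that session-based login flow with Requests. The login URL and the form field names are assumptions for illustration; the real names depend on the site’s login form.

```python
# Sketch of a session-based POST login with Requests. The login URL and
# form field names are assumptions; check the site's actual login form.
import requests

session = requests.Session()
login_url = "https://example.com/login"           # placeholder
payload = {"username": "your_name", "password": "your_password"}

# the Session keeps the cookies set by the login response
resp = session.post(login_url, data=payload, timeout=10)
resp.raise_for_status()

# later requests through the same session carry those cookies,
# so pages that require login become reachable
profile = session.get("https://example.com/profile", timeout=10)
print(profile.status_code)
```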
Scrapy
“Create the ‘1024 seed devourer’ crawler with Scrapy”
This article uses the Scrapy framework to crawl the site and adds a pipeline that processes and saves the crawl results, storing the images and seeds (torrent files) locally.
- The Scrapy framework.
- A pipeline that saves images and seeds locally (see the sketch after this list).
- Parsing HTML with BeautifulSoup.
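For orientation, a bare-bones item pipeline that writes downloaded bytes to disk might look like the sketch below. The item fields (`name`, `content`) and the `downloads` folder are assumptions for illustration; the article’s own pipeline and item definitions may differ.

```python
# Sketch of a Scrapy item pipeline that writes downloaded bytes to disk.
# The item fields ("name", "content") and the "downloads" folder are
# assumptions for illustration; the article's item definition may differ.
import os


class SaveToDiskPipeline:
    def open_spider(self, spider):
        # make sure the output folder exists before the crawl starts
        os.makedirs("downloads", exist_ok=True)

    def process_item(self, item, spider):
        # write the raw bytes carried by the item to a local file
        path = os.path.join("downloads", item["name"])
        with open(path, "wb") as f:
            f.write(item["content"])
        return item  # return the item so any later pipeline still sees it


# Enable it in settings.py (the module path is an assumption):
# ITEM_PIPELINES = {"myproject.pipelines.SaveToDiskPipeline": 300}
```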
“This tutorial teaches you, step by step, how to crawl the Daguerre’s Flag community with Scrapy”
This article explains how to use Scrapy in great detail: it crawls the Daguerre’s Flag community, dissects the HTML format step by step, and ends with a guide to saving the images. If you are starting Scrapy from scratch, follow this article.
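If you just want to see the shape of a Scrapy spider before diving into that article, here is a bare-bones skeleton. The start URL and the CSS selectors are placeholders; the real ones depend on the forum’s HTML layout.

```python
# Bare-bones Scrapy spider skeleton. The start URL and CSS selectors
# are placeholders; the real ones depend on the forum's HTML layout.
import scrapy


class DemoSpider(scrapy.Spider):
    name = "demo"
    start_urls = ["https://example.com/forum"]  # placeholder URL

    def parse(self, response):
        # yield one item per post, carrying the image URLs for a pipeline
        for post in response.css("div.post"):
            yield {
                "title": post.css("h3::text").get(),
                "image_urls": post.css("img::attr(src)").getall(),
            }
        # follow the pagination link, if there is one
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```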
Advanced Scrapy
“[Python in Practice] Deploy a Scrapy crawler on Tencent Cloud, step by step”
This article describes how, once your crawler is written, you can deploy it to a cloud server and run it there on a schedule, so the crawler is genuinely in production. It walks through the deployment step by step, in great detail.
- Installing Python 3 on the cloud server.
- Scrapyd deployment steps (a scheduling sketch follows the commands below).
- How to purchase a cloud server.
- Cloud server coupons.
On the server, link the newly built Python 3 into your PATH:

```bash
ln -s /usr/local/python3/bin/python3 /usr/bin/python3
ln -s /usr/local/python3/bin/pip3 /usr/bin/pip3
```

After that the `python3` command just works; the same goes for pip, with `pip3` replacing the original `pip` command. Then install the deployment client:

```bash
pip install scrapyd-client
```
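Once scrapyd is running on the server (it listens on port 6800 by default) and the project has been pushed up with `scrapyd-deploy`, a spider can be started through scrapyd’s HTTP API. A minimal sketch, with placeholder project and spider names:

```python
# Schedule a crawl through scrapyd's HTTP API.
# "myproject" and "demo" are placeholders for your own project/spider names.
import requests

resp = requests.post(
    "http://localhost:6800/schedule.json",
    data={"project": "myproject", "spider": "demo"},
)
print(resp.json())  # on success the response includes the job id
```

Calling this from a cron job on the server is one simple way to get the "run it regularly" behaviour the article describes.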
Crawler Server
“[Python in Practice] Install MongoDB on an Alibaba Cloud server and connect to it remotely with a GUI client”
This article walks through installing MongoDB on Alibaba Cloud, with a screenshot and instructions for every step; it is a very detailed tutorial, and the same steps apply on Tencent Cloud. Remember to edit the configuration file, open the port in the security group, and start the service, and you will be able to connect remotely. Setting up MongoDB here prepares you for storing crawl results later.
- Alibaba Cloud coupons.
- Detailed steps for installing MongoDB on Alibaba Cloud.
- Connecting to MongoDB remotely with a GUI client.
- Some Python and MongoDB interaction code (a pymongo sketch follows this list).
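In the spirit of that "Python and MongoDB interaction" part, here is a small pymongo sketch. The connection string, database, and collection names are placeholders for your own server.

```python
# Small pymongo sketch: connect to the remote MongoDB and store a record.
# The connection string, database and collection names are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://user:password@your-server-ip:27017/")
db = client["crawler"]          # database for crawl results
collection = db["results"]      # collection holding one document per item

# insert one crawled record and read it back
collection.insert_one({"title": "demo", "url": "https://example.com"})
for doc in collection.find({"title": "demo"}):
    print(doc)
```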
“Follow me step by step and make your server dreams come true with Tornado”
This article describes how to start your Tornado server on Alibaba Cloud. It covers two kinds of responses, returning a web page and returning JSON, which is very practical. If you want to build an API, follow the approach explained here.
- Basic Tornado operations.
- Returning an HTML page versus returning JSON for an API (see the sketch after this list).
- How to upload local code to the Alibaba Cloud server.
- How to configure remote debugging in your local IDE.
- Deploying the Tornado service process on Alibaba Cloud.
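To make the "two kinds of responses" concrete, here is a minimal Tornado app with one handler that returns an HTML page and one that returns JSON. The port and routes are placeholders, not taken from the article.

```python
# Minimal Tornado app showing the two kinds of responses the article
# covers: an HTML page and a JSON API. The port and routes are placeholders.
import json

import tornado.ioloop
import tornado.web


class PageHandler(tornado.web.RequestHandler):
    def get(self):
        # return a plain HTML page
        self.write("<html><body><h1>Hello from Tornado</h1></body></html>")


class ApiHandler(tornado.web.RequestHandler):
    def get(self):
        # return JSON, the way an API endpoint would
        self.set_header("Content-Type", "application/json")
        self.write(json.dumps({"code": 0, "msg": "ok"}))


def make_app():
    return tornado.web.Application([
        (r"/", PageHandler),
        (r"/api", ApiHandler),
    ])


if __name__ == "__main__":
    make_app().listen(8000)   # placeholder port
    tornado.ioloop.IOLoop.current().start()
```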
“Helping you deploy Nginx, a domain name, and an SSL certificate on your server (Alibaba Cloud 100-yuan coupon included)”
Now that you have an Alibaba Cloud server and your service is running on it, you can buy a domain name, configure an SSL certificate, and enable HTTPS access. This article shows you how to configure Nginx and the domain name so your pages are reachable by domain name instead of IP. Detailed steps, clear screenshots, and ready-made configuration file text you can simply copy and paste.
- Domain name purchase process.
- Install and configure Nginx.
- How to obtain an SSL certificate.
- Configure HTTPS.
“Hands-on with an Alibaba Cloud server: build your own proxy tools, and never have to ask anyone again”
Plenty of websites can’t be reached directly; don’t worry, the Shovel Excrement Officer will walk you through the setup that opens up the outside world. The steps in this article are extremely detailed, really extremely detailed! Type in the commands step by step as instructed and it just works. The whole process is covered, from purchasing the server to the final successful access, with a full set of screenshots. Honestly, it can’t get any more detailed; any more detailed and you would have to pay the Shovel Excrement Officer to debug it for you.
- How to purchase an overseas server.
- Alibaba Cloud and Tencent Cloud coupons.
- How to configure SS.
Purely advanced tricks
“1024 Seed Devourer 2.0, built on Scrapy plus email, faster and better!”
This is an improved version of the earlier “seed devourer”: it replaces the Requests-based downloads from the original article with Scrapy’s FilesPipeline, and the speedup is frightening! It also adds an email feature, so every seed you download is saved locally and a copy is backed up to your mailbox. Super cool!
- Sending emails with attachments from Python (a sketch follows this list).
- Using FilesPipeline.
- Advanced Scrapy usage.
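Here is a sketch of the "email with attachment" part using Python’s standard smtplib and email modules. The SMTP host, port, account, password, and file name are all placeholders; note that some mail providers require an app-specific authorization code rather than your normal password for SMTP.

```python
# Send a mail with an attachment using the standard smtplib/email modules.
# SMTP host, port, account, password and file name are all placeholders.
import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

msg = MIMEMultipart()
msg["Subject"] = "Backup of downloaded file"
msg["From"] = "me@example.com"
msg["To"] = "me@example.com"
msg.attach(MIMEText("Attached is the file the crawler just saved."))

# attach the locally saved file
with open("demo.torrent", "rb") as f:
    part = MIMEApplication(f.read(), Name="demo.torrent")
part["Content-Disposition"] = 'attachment; filename="demo.torrent"'
msg.attach(part)

with smtplib.SMTP_SSL("smtp.example.com", 465) as server:
    server.login("me@example.com", "your_smtp_password")
    server.send_message(msg)
```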
“Truly hardcore: make it easy for your Scrapy crawler to do whatever it wants”
This article shows how, with an overseas server in the loop, your Scrapy crawler can reach websites you normally could not. Better still, the project can be deployed to the cloud and run automatically every day without anyone clicking anything. The Shovel Excrement Officer dishes out solid, practical material every day; are you convinced yet?
- Alibaba Cloud and Tencent Cloud server coupons.
- SS server and client configuration.
- Configuring Privoxy locally to provide an HTTP proxy.
- Adding an HTTP proxy to Scrapy (see the sketch after this list).
- Crawling information from foreign websites.
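A small sketch of what "adding an HTTP proxy to Scrapy" can look like: a downloader middleware that points every request at a local Privoxy instance. Port 8118 is Privoxy’s conventional default, and the module path in the settings comment is an assumption for illustration.

```python
# Downloader middleware that routes every request through a local HTTP
# proxy. Port 8118 is Privoxy's conventional default; the module path in
# the settings comment below is an assumption for illustration.


class LocalProxyMiddleware:
    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware honours request.meta["proxy"]
        request.meta["proxy"] = "http://127.0.0.1:8118"


# Enable it in settings.py:
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.LocalProxyMiddleware": 543,
# }
```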
END
OK, those are the articles the Shovel Excrement Officer has written so far. There is actually one more, on WeChat mini programs, “A hands-on, one-stop tutorial, dedicated to those of you who have never written a mini program”, meant to get you started with mini programs. There is an easter egg in it too, ha ha ha.
To get the code for all of the articles above, follow the WeChat official account “Pikepa excrement officer” and reply “code”; you will receive the download address for all of the code.
Finally, a plug for my own mini program, “64 hexagrams”: it is super easy to use, and when you have nothing to do you can give it a shake and try it.
Such a hardcore official account, and you still haven’t followed it?