This time we will scrape all of Liepin.com's job listings for Shenzhen, more than 400 positions in total, and save the results to a CSV file. No more preamble; let's get started. (If you are interested in crawlers, you can follow this article to scrape any website you want!)
First open the target website:
The information on the page is as follows (since the job listings are dynamic, the positions on your screen may vary)
Press F12 to open the browser's developer tools:
Click the element-picker button (the one with the mouse-pointer icon) as shown below:
We can then click on any element we want in the original page, and the HTML code for that element will be highlighted.
For example, click on the job title "Bilingual Commentator", and the right-hand side will jump to the corresponding source code.
Analyzing the code above and below that spot, we find that each position's code sits in its own <li> element, and all of these <li> elements live under one parent tag, <ul class="sojob-list">. So we can select that parent tag and grab every <li> it contains:
all_job = html.find("ul", class_="sojob-list").find_all("li")
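Putting this together, here is a minimal sketch of fetching one listing page and locating the job list (standard requests/bs4 usage; the real listing URL appears, truncated, in the complete code at the end):

import requests
import bs4

r = requests.get("https://www.liepin.com/zhaopin/")  # one listing page
html = bs4.BeautifulSoup(r.text, "html.parser")      # parse the raw HTML
all_job = html.find("ul", class_="sojob-list").find_all("li")  # one <li> per job card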
This positions us at the job list; every query below starts from there, and we loop through each <li> in turn.
The find method makes the parser search until it reaches the first matching tag, then stop.
Under that tag is the content we want to crawl. For example, the job title:
name = date.find("a", target="_blank").text.strip()
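One caveat worth knowing: find() returns None when nothing matches, so a missing tag makes .text raise AttributeError. A guarded variant (my addition, not in the original code):

a = date.find("a", target="_blank")
name = a.text.strip() if a is not None else ""  # fall back to an empty string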
Then open the <p class="condition clearfix"> tag and scrape the area, salary, job URL, and degree requirement.
So:

area = date.find("a", class_="area").text               # area
salary = date.find("span", class_="text-warning").text  # salary
url = date.find("a", class_="area")["href"]             # job URL (href of the area anchor)
edu = date.find("span", class_="edu").text              # degree requirement
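A related detail: indexing a tag, as in ["href"] above, raises KeyError if the attribute is absent, while .get("href") returns a default instead. A safer variant of the URL lookup (my addition):

area_tag = date.find("a", class_="area")
url = area_tag.get("href", "") if area_tag is not None else ""  # tolerate a missing tag or attribute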
Finally, we use a loop to change the page URL: the number at the end of the URL is the page number, as follows:
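As a sketch of just the pagination (the base URL is truncated with "…" in the original post, so only the pattern is shown; range(11) covers pages 0 through 10):

base = "https://www.liepin.com/zhaopin/?co…"  # query string truncated in the source
for i in range(11):
    page_url = base + str(i)  # the page number goes at the very end
    # fetch and parse page_url exactly as shown above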
Then two more lines of pandas save the result as a CSV file.
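On their own, those two lines look like this (result is the dict of lists filled in the loop; utf_8_sig writes a byte-order mark so Excel displays Chinese text correctly):

df = pd.DataFrame(result)  # one column per dict key
df.to_csv("shenzhen_Zhaopin.csv", encoding="utf_8_sig")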
Done crawling!!
View the results:
Attach the complete code:
import requests
import bs4
import pandas as pd

# One list per field; each job appends one value to every list.
result = {"jobname": [],   # job title
          "area": [],      # area
          "salary": [],    # salary
          "url": [],       # job URL
          "edu": []}       # degree requirement

for i in range(11):  # pages 0 through 10
    # The query string is truncated ("co…") in the original post;
    # the page number is appended at the very end.
    url = "https://www.liepin.com/zhaopin/?co…" + str(i)
    print(url)
    r = requests.get(url)
    html = bs4.BeautifulSoup(r.text, "html.parser")
    all_job = html.find("ul", class_="sojob-list").find_all("li")
    for date in all_job:
        name = date.find("a", target="_blank").text.strip()
        area = date.find("a", class_="area").text
        salary = date.find("span", class_="text-warning").text
        job_url = date.find("a", class_="area")["href"]
        edu = date.find("span", class_="edu").text
        result["jobname"].append(name)
        result["area"].append(area)
        result["salary"].append(salary)
        result["url"].append(job_url)
        result["edu"].append(edu)

df = pd.DataFrame(result)
df.to_csv("shenzhen_Zhaopin.csv", encoding="utf_8_sig")
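A final caveat: if Liepin changes its markup or renders listings client-side with JavaScript, html.find("ul", class_="sojob-list") returns None and the chained find_all raises AttributeError. A small guard that could sit inside the page loop (my addition):

job_list = html.find("ul", class_="sojob-list")
if job_list is None:
    print("unexpected page structure, skipping:", url)
    continue  # move on to the next page
all_job = job_list.find_all("li")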