This time we will scrape all of Liepin.com's job listings for Shenzhen, more than 400 positions in total, and save the results to a CSV file. No more preamble; let's get started. (If you are interested in crawlers, you can follow this article to scrape any website you want!)
First open the target website:
The information on the page is as follows (since the job listings are dynamic, the positions on your screen may vary)
Press F12 to open the browser's developer tools:
Click the element-picker button (the one with the mouse-pointer icon) as shown below:
We can then click on any element we want in the original page, and the HTML code for that element will be highlighted.
For example, click on the job title "Bilingual Commentator", and the right-hand side will jump to the corresponding source code.
Analyzing the code above and below that spot, we find that each position's code sits in its own <li> element, and all of these <li> elements live under one parent tag, <ul class="sojob-list">. So we can select that parent tag and grab every <li> it contains:
all_job = html.find("ul", class_="sojob-list").find_all("li")
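Putting this together, here is a minimal sketch of fetching one listing page and locating the job list (standard requests/bs4 usage; the real listing URL appears, truncated, in the complete code at the end):

import requests
import bs4

r = requests.get("https://www.liepin.com/zhaopin/")  # one listing page
html = bs4.BeautifulSoup(r.text, "html.parser")      # parse the raw HTML
all_job = html.find("ul", class_="sojob-list").find_all("li")  # one <li> per job card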
This positions us at the job list; every query below starts from there, and we loop through each <li> in turn.
The find method makes the parser search until it reaches the first matching tag, then stop.
Under that tag is the content we want to crawl. For example, the job title:
name = date.find("a", target="_blank").text.strip()
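One caveat worth knowing: find() returns None when nothing matches, so a missing tag makes .text raise AttributeError. A guarded variant (my addition, not in the original code):

a = date.find("a", target="_blank")
name = a.text.strip() if a is not None else ""  # fall back to an empty string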
Then open the <p class="condition clearfix"> tag and scrape the area, salary, job URL, and degree requirement.
So:

area = date.find("a", class_="area").text               # area
salary = date.find("span", class_="text-warning").text  # salary
url = date.find("a", class_="area")["href"]             # job URL (href of the area anchor)
edu = date.find("span", class_="edu").text              # degree requirement
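A related detail: indexing a tag, as in ["href"] above, raises KeyError if the attribute is absent, while .get("href") returns a default instead. A safer variant of the URL lookup (my addition):

area_tag = date.find("a", class_="area")
url = area_tag.get("href", "") if area_tag is not None else ""  # tolerate a missing tag or attribute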
Finally, we use a loop to change the page URL: the number at the end of the URL is the page number, as follows:
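As a sketch of just the pagination (the base URL is truncated with "…" in the original post, so only the pattern is shown; range(11) covers pages 0 through 10):

base = "https://www.liepin.com/zhaopin/?co…"  # query string truncated in the source
for i in range(11):
    page_url = base + str(i)  # the page number goes at the very end
    # fetch and parse page_url exactly as shown above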
Then two more lines of pandas save the result as a CSV file.
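On their own, those two lines look like this (result is the dict of lists filled in the loop; utf_8_sig writes a byte-order mark so Excel displays Chinese text correctly):

df = pd.DataFrame(result)  # one column per dict key
df.to_csv("shenzhen_Zhaopin.csv", encoding="utf_8_sig")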
Done crawling!!
View the results:
Attach the complete code:
import requests
import bs4
import pandas as pd

# One list per field; each job appends one value to every list.
result = {"jobname": [],   # job title
          "area": [],      # area
          "salary": [],    # salary
          "url": [],       # job URL
          "edu": []}       # degree requirement

for i in range(11):  # pages 0 through 10
    # The query string is truncated ("co…") in the original post;
    # the page number is appended at the very end.
    url = "https://www.liepin.com/zhaopin/?co…" + str(i)
    print(url)
    r = requests.get(url)
    html = bs4.BeautifulSoup(r.text, "html.parser")
    all_job = html.find("ul", class_="sojob-list").find_all("li")
    for date in all_job:
        name = date.find("a", target="_blank").text.strip()
        area = date.find("a", class_="area").text
        salary = date.find("span", class_="text-warning").text
        job_url = date.find("a", class_="area")["href"]
        edu = date.find("span", class_="edu").text
        result["jobname"].append(name)
        result["area"].append(area)
        result["salary"].append(salary)
        result["url"].append(job_url)
        result["edu"].append(edu)

df = pd.DataFrame(result)
df.to_csv("shenzhen_Zhaopin.csv", encoding="utf_8_sig")
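A final caveat: if Liepin changes its markup or renders listings client-side with JavaScript, html.find("ul", class_="sojob-list") returns None and the chained find_all raises AttributeError. A small guard that could sit inside the page loop (my addition):

job_list = html.find("ul", class_="sojob-list")
if job_list is None:
    print("unexpected page structure, skipping:", url)
    continue  # move on to the next page
all_job = job_list.find_all("li")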