This time we will scrape all of the Shenzhen job listings on Liepin.com, more than 400 positions in total, and save them to a CSV file. Without further ado, let's get started. (If you are interested in crawlers, this article will show you how to scrape any site you like!)

First open the target website:

The page looks like this (the listings are dynamic, so the positions on your screen may differ):

![](https://p1-tt-ipv6.byteimg.com/large/pgc-image/7088daaae7cd445caa7ef07043d30647)

Let's press F12 to open the developer tools:

Click the mouse-pointer (element picker) button as shown:

![](https://p1-tt-ipv6.byteimg.com/large/pgc-image/1787a955b9b74443ac73eddfb4497546)

We can then click any element on the original page, and the HTML code for that element will be highlighted.

For example, click the job title "Bilingual Commentator" and the panel on the right jumps to the corresponding source code.

![](https://p3-tt-ipv6.byteimg.com/large/pgc-image/da52442fd26b4fbdb7c890e974b00336)

We then examined the code above and below and found that each position's markup sits in its own `<li>` tag:

![](https://p6-tt-ipv6.byteimg.com/large/pgc-image/3e1743c7d82c4723a0a88ed6fb5b758e)

So we look for the enclosing tag that contains all of these `<li>` job entries, namely `<ul class="sojob-list">`:

![](https://p26-tt.byteimg.com/large/pgc-image/a1c5c869d38b42dbaf32ce942e73db31)

`html.find("ul", class_="sojob-list").find_all("li")`

This drops us onto the list of jobs; every lookup that follows starts from there, and we loop over each `<li>` entry in turn.
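As a minimal sketch of this step (run against a made-up HTML fragment rather than the live page), locating the list and counting its entries looks like this:

```python
import bs4

# Made-up fragment mimicking the job-list structure (not the real page source)
sample = """
<ul class="sojob-list">
  <li>first job entry</li>
  <li>second job entry</li>
</ul>
"""

html = bs4.BeautifulSoup(sample, "html.parser")
# Locate the <ul>, then collect every <li> job entry beneath it
all_job = html.find("ul", class_="sojob-list").find_all("li")
print(len(all_job))  # one element per job posting
```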

Next we crawl the job title, which sits in an `<a>` child tag of each `<li>`.

The find method makes the parser scan the page until it reaches the first matching tag, then stop.
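A quick illustration of that behavior on a throwaway snippet: `find` returns only the first match, while `find_all` keeps scanning and returns every match.

```python
import bs4

doc = bs4.BeautifulSoup("<div><span>first</span><span>second</span></div>",
                        "html.parser")

first = doc.find("span")      # stops at the first matching tag
every = doc.find_all("span")  # keeps going and collects all matches

print(first.text)  # first
print(len(every))  # 2
```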

Under each `<li>`, the `<a target="_blank">` tag holds the title we want to crawl:

![](https://p6-tt-ipv6.byteimg.com/large/pgc-image/9a7b7ee2f1db4ce2a3d4ea4cab14545a)

`name = date.find("a", target="_blank").text.strip()`

Then we open the `<p class="condition clearfix">` tag and crawl the area, salary, job URL, and degree requirement:

![](https://p9-tt-ipv6.byteimg.com/large/pgc-image/80ffd393519b45d2a58d6c91e8a75661)

so:

`area = date.find("a", class_="area").text`

`salary = date.find("span", class_="text-warning").text`

To crawl the job URL: `url = date.find("a", class_="area")["href"]`

To crawl the degree requirement: `edu = date.find("span", class_="edu").text`
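Putting the four lookups together, here they run against a hand-written `<li>` that mimics the structure in the screenshots (all values below are invented for the example):

```python
import bs4

# Hand-written <li> mimicking one job entry; every value is invented
li_html = """
<li>
  <a target="_blank" href="/job/123.shtml"> Bilingual Commentator </a>
  <p class="condition clearfix">
    <span class="text-warning">15-20k</span>
    <a class="area" href="www.liepin.com/sz/">Shenzhen</a>
    <span class="edu">Bachelor</span>
  </p>
</li>
"""

date = bs4.BeautifulSoup(li_html, "html.parser")
name = date.find("a", target="_blank").text.strip()   # job title
area = date.find("a", class_="area").text             # area
salary = date.find("span", class_="text-warning").text  # salary
url = date.find("a", class_="area")["href"]           # link on the area tag
edu = date.find("span", class_="edu").text            # degree requirement
print(name, area, salary, url, edu)
```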

Finally, we use a loop to change the page URL (the number at the end of the address is the page number), as follows:

![](https://p1.pstatp.com/origin/pgc-image/896710fa521f408fbe1e854ee72330a1)
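Building those page addresses needs no network access at all; with a hypothetical base URL standing in for the article's truncated Liepin query string, the loop generates one URL per page:

```python
# Hypothetical base URL standing in for the truncated
# "www.liepin.com/zhaopin/?co…" query string in the article
base = "https://example.com/jobs?page="

# Pages 0..10, matching range(11) in the article's code
urls = [base + str(i) for i in range(11)]

print(urls[0])
print(urls[-1])
```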

Finally, two more lines of code save the result as a CSV file.
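Those two lines in isolation, with a tiny stand-in result dict (one invented row) so the save step can be seen on its own:

```python
import pandas as pd

# Tiny stand-in for the crawl's result dict; values are invented
result = {"jobname": ["Bilingual Commentator"], "area": ["Shenzhen"]}

df = pd.DataFrame(result)
# utf_8_sig writes a BOM so Excel displays the Chinese text correctly
df.to_csv("demo.csv", encoding="utf_8_sig", index=False)
```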

The crawl is done!

    View the results:

![](https://p9-tt-ipv6.byteimg.com/large/pgc-image/6f333a3e37a3494599c850df3ec1f559)

Here is the complete code:

```python
import requests
import bs4
import pandas as pd

result = {"jobname": [],  # job title
          "area": [],     # area
          "salary": [],   # salary
          "url": [],      # job link
          "edu": []}      # degree

for i in range(11):
    # page number appended to the base URL (truncated here as in the article)
    page_url = "https://www.liepin.com/zhaopin/?co…" + str(i)
    print(page_url)
    r = requests.get(page_url)
    html = bs4.BeautifulSoup(r.text, "html.parser")
    all_job = html.find("ul", class_="sojob-list").find_all("li")
    for date in all_job:
        name = date.find("a", target="_blank").text.strip()
        area = date.find("a", class_="area").text
        salary = date.find("span", class_="text-warning").text
        # renamed so it does not shadow the page URL above
        job_url = date.find("a", class_="area")["href"]
        edu = date.find("span", class_="edu").text
        result["jobname"].append(name)
        result["area"].append(area)
        result["salary"].append(salary)
        result["url"].append(job_url)
        result["edu"].append(edu)

df = pd.DataFrame(result)
df.to_csv("shenzhen_Zhaopin.csv", encoding="utf_8_sig")
```
