Preface

Fresh graduates looking for their first job, employees looking for a change, people who have resigned and are hunting again… whoever you are, applying for a job starts with analyzing the positions on offer and whether they match you. The common recruitment platforms are BOSS Zhipin, Lagou, Zhaopin, and so on. The usual routine is to open the recruitment site, search for a keyword, and then page through the results one by one, submitting a resume or chatting with the recruiter when a job looks good. So is there a way to pull out a list of related jobs in one go, for quick and convenient analysis? Of course there is…

What I want to do

I have also been considering new job opportunities recently, so I built this for my own convenience. Let me show you the result first; open the link below: 100 PHP jobs on BOSS Zhipin.

As you can see, this is a table of 100 PHP job openings. Yes, these are the PHP positions I crawled from BOSS Zhipin. Why 100? Because the script only crawls the first 10 pages of results. The result is shared as a Youdao note, which is what you see above.

Runtime environment

pip install requests
pip install beautifulsoup4

Crawl BOSS Zhipin data

Here, I do not recommend using your own IP to crawl BOSS Zhipin, because you will be banned within minutes; instead, use a proxy IP. I covered proxy IPs in my previous article, so look back there if anything is unclear. Also, the cookie value in the request header is mandatory: refresh the BOSS Zhipin site in your browser, open F12 and check the Network tab, then copy the cookie over. You also need to replace it regularly; do not use the same cookie to crawl all the data. You get the idea…
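The advice about rotating cookies can be sketched as a tiny helper; the function name and the pool values below are my own illustration, not part of the original script:

```python
def pick_cookie(page, cookie_pool):
    """Cycle through a pool of cookie strings so that no single
    cookie is reused for every page request."""
    return cookie_pool[(page - 1) % len(cookie_pool)]


# Hypothetical pool: paste several freshly copied cookie values here
pool = ['cookie-a', 'cookie-b', 'cookie-c']
print(pick_cookie(1, pool))  # cookie-a
print(pick_cookie(4, pool))  # cookie-a again, after a full cycle
```

The actual script below does the same thing with a hard-coded if/elif chain over page ranges; a pool just makes it easier to add more cookies.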

def get_url_html(self, url, cookie):
    """ Request page HTML """
    ip_url = self.proxies_ip + ':' + str(self.proxies_port)
    proxies = {'http': 'http://' + ip_url, 'https': 'https://' + ip_url}
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
        'cookie': cookie
    }
    request = requests.get(url=url, headers=header, proxies=proxies, timeout=3)
    html = False
    if request.status_code == 200:
        html = request.content
    return html
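Once `get_url_html` returns the page HTML, BeautifulSoup does the extraction. As a quick offline sanity check of the CSS selectors used later in `run`, here is the same selection logic applied to a minimal mock of the expected markup (the real BOSS Zhipin page is far more complex and changes over time):

```python
from bs4 import BeautifulSoup

# Minimal mock of the listing structure the selectors assume
mock_html = """
<div class="job-list"><ul><li>
  <div class="job-title">
    <a href="/job_detail/xxx.html">PHP Engineer</a>
    <span class="job-area">Shenzhen</span>
  </div>
  <div class="job-limit"><span class="red">15-25K</span><p>3-5 years</p></div>
  <div class="info-company"><h3>Some Co.</h3><p>Internet</p></div>
</li></ul></div>
"""

soup = BeautifulSoup(mock_html, 'html.parser')
job_li = soup.select('.job-list ul li')[0]
title = job_li.select('.job-title a')[0].get_text()
area = job_li.select('.job-title .job-area')[0].get_text()
salary = job_li.select('.job-limit .red')[0].get_text()
print(title, area, salary)  # PHP Engineer Shenzhen 15-25K
```

If the site changes its markup, these selectors are the first thing to break, so a mock like this is handy to keep around.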

Complete source code

As usual, I have uploaded the code to GitHub (GitHub source address), but as a warm-hearted brick-mover, and to save a trip for anyone who would rather not click through, I am also posting the source here. If you run into any problem, though, it is best to come find me on the original site. Please take the code…

#! /usr/bin/env python
# -*- coding: utf-8 -*-

[""] [Requests + BS4] author: Gxcuizy Date: 2020-06-18 ""

import requests
from bs4 import BeautifulSoup


class GetBossData(object):
    """ Climb 10 pages of Boss direct Hire job data """
    domain = 'https://www.zhipin.com'
    base_url = 'https://www.zhipin.com/c101280600/?query='
    position = ''
    # Proxy IP address
    proxies_ip = '58.220.95.30'
    proxies_port = '10174'

    def __init__(self, position):
        self.position = position

    def get_url_html(self, url, cookie):
        """ Request page HTML """
        ip_url = self.proxies_ip + ':' + str(self.proxies_port)
        proxies = {'http': 'http://' + ip_url, 'https': 'https://' + ip_url}
        header = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
            'cookie': cookie
        }
        request = requests.get(url=url, headers=header, proxies=proxies, timeout=3)
        html = False
        if request.status_code == 200:
            html = request.content
        return html

    def run(self):
        """ Execution entry """
        page_list = range(1, 11)
        # Open the file, ready for writing
        dict_file = open('job.md', 'a', encoding='UTF-8')
        # Empty the file contents
        dict_file.seek(0)
        dict_file.truncate()
        dict_file.write('| Position | Area | Salary | Experience | Company | Industry | Link |')
        dict_file.write('\n| --- | --- | --- | --- | --- | --- | --- |')
        # page crawl data
        for page in page_list:
            print('Start crawling data for page ' + str(page))
            boss_url = self.base_url + str(self.position) + '&page=' + str(page) + '&ka=page-' + str(page)
            # F12 Open debug mode, manually refresh the page to get cookies, then replace
            if page < 4:
                cookie_val = 'lastCity=101280600; __zp_seo_uuid__=d59649f5-bc8a-4263-b4e1-d5fb1526ebbe; __c=1592469667; __g=-; Hm_lvt_194df3105ad7148dcf2b98a91b5e727a=1592469673; __l=l=%2Fwww.zhipin.com%2Fshenzhen%2F&r=https%3A%2F%2Fwww.google.com%2F&friend_source=0&friend_source=0; toUrl=https%3A%2F%2Fwww.zhipin.com%2F%2Fjob_detail%2F3f35305467e161991nJ429i4GA%7E%7E.html; __a = 43955211.1592469667.. 1592469667.39.1.39.39; Hm_lpvt_194df3105ad7148dcf2b98a91b5e727a=1592530438; __zp_stoken__=7f3aaPCVBFktLe0xkP21%2BJSFCLWILSwx7NEw4bVJkRx8pdBE3JGNmWjVwdx5PXC8rHmN%2BJB0hX1UvTz5VPyMmOhIVHBglVzoxJQIdL QtKR3ZFBFIeazwOByVndHwXBAN%2FXFo7W2BffFxtXSU%3D; __zp_sseed__=Ykg0aQ3ow1dZqyi9KmeVnWrqZXcZ32a4psiagwqme3M=; __zp_sname__=93bf4835; __zp_sts__=1592530479301'
            elif page < 7:
                cookie_val = 'lastCity=101280600; __zp_seo_uuid__=d59649f5-bc8a-4263-b4e1-d5fb1526ebbe; __c=1592469667; __g=-; Hm_lvt_194df3105ad7148dcf2b98a91b5e727a=1592469673; __l=l=%2Fwww.zhipin.com%2Fshenzhen%2F&r=https%3A%2F%2Fwww.google.com%2F&friend_source=0&friend_source=0; toUrl=https%3A%2F%2Fwww.zhipin.com%2F%2Fjob_detail%2F3f35305467e161991nJ429i4GA%7E%7E.html; __a = 43955211.1592469667.. 1592469667.39.1.39.39; Hm_lpvt_194df3105ad7148dcf2b98a91b5e727a=1592530438; __zp_stoken__=7f3aaPCVBFktLe0xkP21%2BJSFCLWILSwx7NEw4bVJkRx8pdBE3JGNmWjVwdx5PXC8rHmN%2BJB0hX1UvTz5VPyMmOhIVHBglVzoxJQIdL QtKR3ZFBFIeazwOByVndHwXBAN%2FXFo7W2BffFxtXSU%3D; __zp_sseed__=Ykg0aQ3ow1dZqyi9KmeVnWrqZXcZ32a4psiagwqme3M=; __zp_sname__=93bf4835; __zp_sts__=1592530514188'
            elif page < 10:
                cookie_val = 'lastCity=101280600; __zp_seo_uuid__=d59649f5-bc8a-4263-b4e1-d5fb1526ebbe; __c=1592469667; __g=-; Hm_lvt_194df3105ad7148dcf2b98a91b5e727a=1592469673; __l=l=%2Fwww.zhipin.com%2Fshenzhen%2F&r=https%3A%2F%2Fwww.google.com%2F&friend_source=0&friend_source=0; toUrl=https%3A%2F%2Fwww.zhipin.com%2F%2Fjob_detail%2F3f35305467e161991nJ429i4GA%7E%7E.html; __a = 43955211.1592469667.. 1592469667.40.1.40.40; Hm_lpvt_194df3105ad7148dcf2b98a91b5e727a=1592530479; __zp_stoken__=7f3aaPCVBFktLCT4uVVV%2BJSFCLWIVPWZyNUk4bVJkR25XXHVeZWNmWjVwd286Sm83HmN%2BJB0hX1UvBiBVRyt9IWQOcRtWSk83fAsfJ AtKR3ZFBE5efUl%2FByVndHwXRQN%2FXFo7W2BffFxtXSU%3D; __zp_sseed__=Ykg0aQ3ow1dZqyi9KmeVnd/9vyiSRHrJFoMai+azsb8=; __zp_sname__=93bf4835; __zp_sts__=1592530496863'
            else:
                cookie_val = 'lastCity=101280600; __zp_seo_uuid__=d59649f5-bc8a-4263-b4e1-d5fb1526ebbe; __c=1592469667; __g=-; Hm_lvt_194df3105ad7148dcf2b98a91b5e727a=1592469673; __l=l=%2Fwww.zhipin.com%2Fshenzhen%2F&r=https%3A%2F%2Fwww.google.com%2F&friend_source=0&friend_source=0; toUrl=https%3A%2F%2Fwww.zhipin.com%2F%2Fjob_detail%2F3f35305467e161991nJ429i4GA%7E%7E.html; __a = 43955211.1592469667.. 1592469667.41.1.41.41; __zp_stoken__=7f3aaPCVBFktLc1t4VTp%2BJSFCLWJscnlxSgw4bVJkRw9tLB4pb2NmWjVwdwwgc2l7HmN%2BJB0hX1UvGFZVTH0OdhQQfwxfOyoieW8cO gtKR3ZFBAJYRFMcByVndHwXTwN%2FXFo7W2BffFxtXSU%3D; Hm_lpvt_194df3105ad7148dcf2b98a91b5e727a=1592530497; __zp_sseed__=Ykg0aQ3ow1dZqyi9KmeVnSZKsrhFUU/CYntJcRoFki4=; __zp_sname__=93bf4835; __zp_sts__=1592530514188'
            html = self.get_url_html(boss_url, cookie_val)
            soup = BeautifulSoup(html, 'html.parser')
            # Job listings
            job_list = soup.select('.job-list ul li')
            for job_li in job_list:
                # Single job information
                url = self.domain + job_li.select('.job-title a')[0].attrs['href']
                title = job_li.select('.job-title a')[0].get_text()
                area = job_li.select('.job-title .job-area')[0].get_text()
                salary = job_li.select('.job-limit .red')[0].get_text()
                year = job_li.select('.job-limit p')[0].get_text()
                company = job_li.select('.info-company h3')[0].get_text()
                industry = job_li.select('.info-company p')[0].get_text()
                info = {
                    'title': title,
                    'area': area,
                    'salary': salary,
                    'year': year,
                    'company': company,
                    'industry': industry,
                    'url': url
                }
                print(info)
                # Write position information
                info_demo = '\n| %s | %s | %s | %s | %s | %s | %s |'
                dict_file.write(info_demo % (title, area, salary, year, company, industry, url))
        dict_file.close()


# Program main entry
if __name__ == '__main__':
    # instantiation
    job_name = input('Please enter the job keyword:').strip()
    if job_name == '':
        print('Keyword is empty, please try again')
        exit(0)
    gl = GetBossData(job_name)
    # execute script
    gl.run()
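For reference, the `info_demo` template in `run` emits one Markdown table row per job; with made-up values it renders like this:

```python
# The same row template used in run(); the values here are made up
info_demo = '\n| %s | %s | %s | %s | %s | %s | %s |'
row = info_demo % ('PHP Engineer', 'Shenzhen', '15-25K', '3-5 years',
                   'Some Co.', 'Internet',
                   'https://www.zhipin.com/job_detail/xxx.html')
print(row)
```

Together with the header and separator rows written first, this produces a Markdown table that renders directly in Youdao or any other Markdown viewer.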

Finally

If you have any questions, feel free to leave me a message, and we can learn from each other…

I hope every young friend who is looking for a job receives a satisfactory offer right away, the high-pay, low-workload kind!

Oh right, I am also job hunting online: I am "shooting (P) H (H) films (P)" in Shenzhen, as the self-deprecating pun goes, that is, a PHP developer. If you think there is a position suitable for me, please recommend it to me, thank you very much.