Never get a reply? HR doesn't even give you a chance to perform? Python crawler: Resume Template Collection

Resume Template Download

- Tools to prepare
- Project idea analysis
- Easy source sharing

Tools to prepare

Development environment: Win10, Python 3.7. Development tools: PyCharm, Chrome.

Project idea analysis

Find the hyperlink to each details page and the name of each resume template

Extract parameter information

When writing XPath expressions, note that the raw source of the web page may differ from the page the browser renders. Data must be extracted according to the raw source, because that is what the crawler receives.

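    html_data = etree.HTML(page)  # parse the raw page source
    # Each <a> inside a template box links to one details page
    a_list = html_data.xpath("//div[@class='box col3 ws_block']/a")
    for a in a_list:
        resume_href = 'https:' + a.xpath('./@href')[0]  # prepend the scheme to the href
        resume_name = a.xpath('./img/@alt')[0]  # template name from the thumbnail's alt text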
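The snippet above builds the details-page address by prepending `'https:'` to the `href`, which only works when the link has the form `//host/path`. As a minimal sketch, assuming you want to handle other link forms too, the standard library's `urljoin` can resolve the `href` against the listing page URL. The `absolute_url` helper and the `example.htm` paths are hypothetical and used only for illustration.

```python
from urllib.parse import urljoin

# Listing page URL, used as the base for resolving links found on that page.
LISTING_URL = "https://sc.chinaz.com/jianli/free_2.html"


def absolute_url(href, base=LISTING_URL):
    """Resolve href against base.

    Handles protocol-relative links (//host/path), root-relative
    links (/path) and links that are already absolute.
    """
    return urljoin(base, href)


# Hypothetical hrefs; the real values come from a.xpath('./@href')[0].
print(absolute_url("//sc.chinaz.com/jianli/example.htm"))
print(absolute_url("/jianli/example.htm"))
print(absolute_url("https://sc.chinaz.com/jianli/example.htm"))
```

All three calls resolve to the same absolute address, so the loop no longer depends on how the site happens to write its links.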

Enter the Details page

Find the address of the corresponding details page

Extract the download address of the .rar archive

        resume_tree = etree.HTML(resume_page)  # parse the details page source
        # Take the first download link on the details page
        resume_link = resume_tree.xpath('//ul[@class="clearfix"]/a/@href')[0]
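Both extraction steps index the XPath result with `[0]`, which raises `IndexError` when the expression matches nothing (for example, if a page layout differs from what was expected). A minimal sketch of a guard, assuming you would rather skip such a page than stop the crawl, follows. The `first` helper is hypothetical, and plain lists stand in for the lists that `xpath()` returns for these expressions.

```python
def first(results, default=None):
    """Return the first item of a result list, or default if the list is empty."""
    return results[0] if results else default


# Plain lists stand in for XPath results in this illustration.
found = first(["https://example.com/resume.rar"])
missing = first([])

if missing is None:
    print("no download link found, skipping this template")
print(found)
```

In the crawler loop, the call would look like `first(resume_tree.xpath(...))`, followed by a `continue` when the result is `None`.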

Easy source sharing

    import requests
    from lxml import etree

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0',
    }

    for i in range(2, 10):
        url = f'https://sc.chinaz.com/jianli/free_{str(i)}.html'  # set the corresponding route
        response = requests.get(url=url, headers=headers)
        html_data = etree.HTML(response.text)
        a_list = html_data.xpath("//div[@class='box col3 ws_block']/a")
        for a in a_list:
            new_url = 'https:' + a.xpath('./@href')[0]
            name = a.xpath('./img/@alt')[0]
            res = requests.get(url=new_url)
            resume_tree = etree.HTML(res.text)
            resume_url = resume_tree.xpath('//ul[@class="clearfix"]/a/@href')[0]
            result = requests.get(url=resume_url, headers=headers).content  # obtain binary data
            path = './moban/' + name + '.rar'  # the ./moban/ directory must already exist
            with open(path, 'wb') as fp:
                fp.write(result)  # save file
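The script writes each archive to `./moban/` under the template's name. Two things can make that `open()` call fail: the directory may not exist, and a template name may contain characters that Windows does not allow in file names. The sketch below shows one way to handle both, assuming it is acceptable to replace forbidden characters with underscores. The `build_save_path` helper and the sample name are hypothetical, not part of the original script.

```python
import os
import re
import tempfile


def build_save_path(name, directory="./moban", extension=".rar"):
    """Create directory if needed and return a safe file path inside it."""
    # Replace characters that Windows forbids in file names: \ / : * ? " < > |
    safe_name = re.sub(r'[\\/:*?"<>|]', "_", name).strip() or "untitled"
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, safe_name + extension)


if __name__ == "__main__":
    # A temporary directory keeps this demonstration from touching ./moban.
    demo_dir = tempfile.mkdtemp()
    print(build_save_path("Sales resume: 2021/A4", demo_dir))
```

In the crawler, `path = build_save_path(name)` would replace the string concatenation that builds `path`.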
