Preface
The text and images in this article come from the internet and are for learning and exchange only, not for any commercial purpose. If you have any questions, please contact us.
Development tools
- Python 3.6.5
- PyCharm
```python
import requests
import parsel
import csv
import time
```
The third-party modules can be installed with the pip command, e.g. `pip install requests parsel`.
1. Analyze the web page data
As shown in the figure, this is the data we will be scraping today.
Open the developer tools
We can inspect the data returned by the web page: copy a piece of the target data and search for it in the Response tab to check whether the requested page actually contains the data we need.
```python
import requests  # pip install requests

url = 'https://www.zhipin.com/c100010000/?query=python&page=1&ka=page-1'
# The original request headers were not preserved; at minimum a browser
# User-Agent is usually required here.
headers = {
    'User-Agent': 'Mozilla/5.0',
}
response = requests.get(url=url, headers=headers)
print(response.text)
```
2. Analyze the web data structure
Use the selection arrow in the developer tools' Elements tab to pick data on the page; it will jump to the corresponding place in the HTML and show you exactly where that data lives.
As shown in the figure above, each company's job posting is contained in an `li` tag. We only need to extract the required fields from it with CSS selectors.
```python
import parsel  # pip install parsel

response.encoding = response.apparent_encoding
selector = parsel.Selector(response.text)
# The li selector was garbled in the original; '#main .job-list ul li' is a
# reconstruction and may need adjusting to the current page structure.
lis = selector.css('#main .job-list ul li')
for li in lis:
    dit = {}  # dictionary to hold one job posting
    dit['Title'] = li.css('.job-name a::attr(title)').get()
    dit['Region'] = li.css('.job-area::text').get()
    dit['Salary'] = li.css('.red::text').get()
    xl_list = li.css('.job-limit p::text').getall()  # education and experience
    dit['Education and experience'] = '|'.join(xl_list)
    js_list = li.css('.tags span::text').getall()  # required skills
    dit['Skill requirements'] = '|'.join(js_list)
    dit['Company name'] = li.css('.company-text .name a::attr(title)').get()
    gz_info = li.css('.company-text p::text').getall()  # industry and size
    dit['Job type'] = '|'.join(gz_info)
    dit['Welfare'] = li.css('.info-desc::text').get()
```
3. Save the data
```python
import csv

# The original file name was truncated; 'recruitment.csv' is a placeholder.
f = open('recruitment.csv', mode='a', encoding='utf-8-sig', newline='')
csv_writer = csv.DictWriter(f, fieldnames=[
    'Title', 'Region', 'Salary', 'Education and experience',
    'Skill requirements', 'Company name', 'Job type', 'Welfare',
])
csv_writer.writeheader()  # write the header row once

# Inside the scraping loop, write one row per job posting:
csv_writer.writerow(dit)
```
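As a quick, self-contained illustration of how `csv.DictWriter` maps dictionary keys to CSV columns (written to an in-memory buffer here rather than a file):

```python
import csv
import io

fieldnames = ['Title', 'Region', 'Salary']
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()  # header row: Title,Region,Salary
writer.writerow({'Title': 'Python Developer', 'Region': 'Beijing', 'Salary': '15-25K'})
lines = buf.getvalue().splitlines()
print(lines)
```

In the real script, `encoding='utf-8-sig'` and `newline=''` on `open()` keep Excel happy with non-ASCII text and prevent blank rows on Windows.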
4. Multi-page crawling
'''
https://www.zhipin.com/c100010000/?query=python&page=1&ka=page-1
https://www.zhipin.com/c100010000/?query=python&page=2&ka=page-2
https://www.zhipin.com/c100010000/?query=python&page=3&ka=page-3
'''
Only the page parameter changes from one page to the next.
```python
for page in range(1, 10):
    # format() needs a value for both {} placeholders
    url = 'https://www.zhipin.com/c100010000/?query=python&page={}&ka=page-{}'.format(page, page)
```
This achieves multi-page crawling.
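The URL construction can be verified without any network access; this sketch uses a positional index `{0}` so a single argument fills both placeholders:

```python
# Generate the first nine page URLs, matching the pattern above
url_template = 'https://www.zhipin.com/c100010000/?query=python&page={0}&ka=page-{0}'
urls = [url_template.format(page) for page in range(1, 10)]
print(urls[0])
print(urls[-1])
```

When actually requesting these pages, a short `time.sleep(1)` between requests is a sensible precaution against being blocked, which is presumably why `time` is imported at the top.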
Implementation effect
That's all for this article.
If you found it helpful, feel free to like it.
If anything is unclear, you can message me privately or leave a comment.