Preparation
In the previous chapter we looked at how to get better access to video resources and how to extract the information we want from static pages. This time we will fetch job information through a dynamic API interface instead.
The local runtime environment is again based on Docker; the setup details were covered in the previous article, so check that out first if you need them.
The code
Requirements analysis
When we open the Tencent recruitment page and inspect the page elements, the data we want is nowhere to be found in the HTML. Checking the network requests shows that the information is loaded through an API interface, so this time we will fetch the data we want directly from that interface.
- Open the search page, search by keyword, and find the URL of the paging interface along with its page-turning rule.
- Look at the paging interface's return value and work out which fields to extract (a minimal probe script follows this list).
- Look at the detail-page interface and work out which fields to extract from its return value.
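If you want to confirm the interface structure before writing the spider, a quick probe like the one below is enough. It is only a sketch: the URL is the paging interface captured from the browser's network panel (with pageIndex=1 for the first page), and the Data / Posts / PostId field names are the ones observed in its JSON response.
import requests
import json

# Probe the paging interface once and inspect its JSON structure.
probe_url = ('https://careers.tencent.com/tencentcareer/api/post/Query'
             '?timestamp=1625731961957&countryId=&cityId=&bgIds=&productId=&categoryId='
             '&parentCategoryId=&attrId=&keyword=&pageIndex=1&pageSize=10&language=zh-cn&area=cn')
response = requests.get(probe_url, headers={'User-Agent': 'Mozilla/5.0'})
json_obj = json.loads(response.text)

# Each entry in Data.Posts carries a PostId, which the detail interface needs later.
for item in json_obj['Data']['Posts']:
    print(item['PostId'])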
Write the code
- First, define the entry function that loops over the pages and extracts the key information from each page of paging data.
for page_num in range(1, 2):
    print('Start crawling data from page {}'.format(page_num))
    # 1. Build the paging URL for the current page
    url = 'https://careers.tencent.com/tencentcareer/api/post/Query?timestamp=1625731961957&countryId=&cityId=&bgIds=&productId=&categoryId=&parentCategoryId=&attrId=&keyword=&pageIndex={}&pageSize=10&language=zh-cn&area=cn'.format(page_num)
    # 2. Get the detail-page IDs of all positions on the current page
    detail_urls = get_jo_detail_urls(url)
    # 3. Parse the detail-page data one by one
    for detail_url in detail_urls:
        position = get_detail_msg(detail_url)
        positions.append(position)
        time.sleep(1)
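As a side note, the long query string above is easy to mistype. Below is a sketch of the same request built with the params argument of requests.get; it assumes it sits inside the loop above (so page_num and HEADERS exist) and that the empty filter fields (countryId, cityId, and so on) can simply be omitted.
# Alternative: build the paging request from a params dict instead of one long string.
base_url = 'https://careers.tencent.com/tencentcareer/api/post/Query'
params = {
    'timestamp': '1625731961957',
    'keyword': '',
    'pageIndex': page_num,
    'pageSize': 10,
    'language': 'zh-cn',
    'area': 'cn',
}
response = requests.get(base_url, headers=HEADERS, params=params)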
- Then parse the paging return value, request the detail-page interface for each position, and assemble the fields we want from its return value.
def get_detail_msg(detail_id):
    position = {}
    detail_url = 'https://careers.tencent.com/tencentcareer/api/post/ByPostId?timestamp=1625794375072&postId={}&language=zh-cn'.format(detail_id)
    # print('Requesting detail address: ' + detail_url)
    response = requests.get(detail_url, headers=HEADERS)
    json_obj = json.loads(response.text)
    # [Data] Job title
    position['title'] = json_obj['Data']['RecruitPostName']
    # [Data] Location / job category
    position['location'] = json_obj['Data']['LocationName']
    position['category'] = json_obj['Data']['CategoryName']
    # [Data] Job responsibilities
    position['duty'] = json_obj['Data']['Responsibility']
    # [Data] Job requirements
    position['ask'] = json_obj['Data']['Requirement']
    return position

def get_jo_detail_urls(page_url):
    post_ids = set()
    response = requests.get(page_url, headers=HEADERS)
    json_obj = json.loads(response.text)
    for item in json_obj['Data']['Posts']:
        post_ids.add(item['PostId'])
    print(post_ids)
    return post_ids
Result screenshot
The complete code
import requests
import time
import json
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
    'Referer': 'https://careers.tencent.com/search.html?keywords=python&lid=0&tid=0&start=1',
    'Cookie': 'pgv_pvi=9905274880; _ga=GA1.2.134754307.1606182211; pgv_pvid=3632371128; pgv_info=ssid=s598319774; _gcl_au=1.1.1062400509.1622338581; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%22100019473226%22%2C%22first_id%22%3A%226ab28e9051a5f99e96cec737ad4367a7%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_referrer%22%3A%22%22%7D%2C%22%24device_id%22%3A%2217a5f65aa69497-0a4a94eb345f15-34657601-1296000-17a5f65aa6ad9e%22%7D; loading=agree'
}
def get_detail_msg(detail_id):
    position = {}
    detail_url = 'https://careers.tencent.com/tencentcareer/api/post/ByPostId?timestamp=1625794375072&postId={}&language=zh-cn'.format(detail_id)
    # print('Requesting detail address: ' + detail_url)
    response = requests.get(detail_url, headers=HEADERS)
    json_obj = json.loads(response.text)
    # [Data] Job title
    position['title'] = json_obj['Data']['RecruitPostName']
    # [Data] Location / job category
    position['location'] = json_obj['Data']['LocationName']
    position['category'] = json_obj['Data']['CategoryName']
    # [Data] Job responsibilities
    position['duty'] = json_obj['Data']['Responsibility']
    # [Data] Job requirements
    position['ask'] = json_obj['Data']['Requirement']
    return position

def get_jo_detail_urls(page_url):
    post_ids = set()
    response = requests.get(page_url, headers=HEADERS)
    json_obj = json.loads(response.text)
    for item in json_obj['Data']['Posts']:
        post_ids.add(item['PostId'])
    return post_ids
def spider():
    # 0. Job data to be returned
    positions = []
    for page_num in range(1, 2):
        print('Start crawling data from page {}'.format(page_num))
        # 1. Build the paging URL for the current page
        url = 'https://careers.tencent.com/tencentcareer/api/post/Query?timestamp=1625731961957&countryId=&cityId=&bgIds=&productId=&categoryId=&parentCategoryId=&attrId=&keyword=&pageIndex={}&pageSize=10&language=zh-cn&area=cn'.format(page_num)
        # 2. Get the detail-page IDs of all positions on the current page
        detail_urls = get_jo_detail_urls(url)
        # 3. Parse the detail-page data one by one
        for detail_url in detail_urls:
            position = get_detail_msg(detail_url)
            positions.append(position)
            time.sleep(1)
    print(positions)
    print('Crawl complete!')

if __name__ == '__main__':
    spider()
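spider() only prints the result at the end. If you want to keep the data, a minimal sketch is to dump positions to a JSON file instead; the save_positions helper and the jobs.json filename below are just examples, not part of the original code.
def save_positions(positions, path='jobs.json'):
    # Write the collected job dicts to a file, keeping Chinese text readable.
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(positions, f, ensure_ascii=False, indent=2)

# e.g. call save_positions(positions) at the end of spider() instead of print(positions)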