I. Analyzing the site to crawl

① Open the official King of Glory wallpaper website

  • Website address: pvp.qq.com/web201605/w…

② Press F12 to open the browser developer console and capture the network requests

③ Find the request that returns the wallpaper data and analyze its link

④ Check the format of the returned data

⑤ Decode the (percent-encoded) URL links in the data
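The names and image links in each returned item are percent-encoded, which is why the crawler below passes them through `urllib.parse.unquote`. A minimal illustration; the item is hypothetical (the field names are the ones the crawler reads, but the values are made up):

```python
from urllib import parse

# Hypothetical list item; field names match those the crawler reads,
# but the encoded values are made up for illustration
item = {
    'sProdName': 'Example%20Hero-Example%20Skin',
    'sProdImgNo_6': 'http%3A%2F%2Fshp.qpic.cn%2Fishow%2Fexample_id%2F200',
}

name = parse.unquote(item['sProdName'])
url = parse.unquote(item['sProdImgNo_6'])
print(name.split('-')[0])  # Example Hero
print(url)                 # http://shp.qpic.cn/ishow/example_id/200
```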

⑥ Open a decoded URL to check whether it is the image we want: it turns out to be only a thumbnail

⑦ Go back to the website, click any wallpaper, and look at the link for one specific format

⑧ Find the target (full-size image) address

⑨ Compare the target link with the thumbnail link

  • Thumbnail: shp.qpic.cn/ishow/27350…
  • Target map: shp.qpic.cn/ishow/27350…
  • As you can see, replacing the trailing 200 in a thumbnail address (for a given format) with 0 gives the actual full-size target image
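The rule above can be sketched as a small helper. This is a hedged variant, not the article's code: the crawler itself uses `str.replace('/200', '/0')`, which would also rewrite any earlier path segment starting with `/200`, so this sketch only touches the last segment. The URL is a made-up example because the real ones are truncated above.

```python
def thumbnail_to_full(thumb_url: str) -> str:
    """Turn a thumbnail URL ending in '/200' into the full-size URL ending in '/0'.

    Only the last path segment is replaced, unlike str.replace('/200', '/0'),
    which would also change any earlier segment that starts with '200'.
    """
    head, _, last = thumb_url.rpartition('/')
    if last != '200':
        raise ValueError('expected a URL ending in /200: ' + thumb_url)
    return head + '/0'


# Hypothetical URL; the real thumbnail links are truncated in the article
print(thumbnail_to_full('http://shp.qpic.cn/ishow/example_id/200'))
# http://shp.qpic.cn/ishow/example_id/0
```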

II. Crawler code

① With the analysis done, the complete crawler code is as follows

```python
#!/usr/bin/env python
# encoding: utf-8
'''
# -------------------------------------------------------------------
# @Project Name : king glory download wallpaper
# @Programmer   : Felix
# @Start Date   : 2021/9/8 14:42
# @Last Update  : 2021/9/8 14:42
# -------------------------------------------------------------------
'''
import json
import os
import re
import time
from urllib import parse

import requests
from retrying import retry


class HonorOfKings:
    '''Download King of Glory wallpapers, one folder per hero.'''

    def __init__(self, save_path='./heros'):
        self.save_path = save_path
        # Whole-second part of the current timestamp, sent as the "_" query parameter
        self.time = str(time.time()).split('.')
        self.url = ('https://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/query/workList_inc.cgi'
                    '?activityId=2735&sVerifyCode=ABCD&sDataType=JSON&iListNum=20&totalpage=0'
                    '&page={}&iOrder=0&iSortNumClose=1&iAMSActivityId=51991&_everyRead=true'
                    '&iTypeId=2&iFlowId=267733&iActId=2735&iModuleId=2735&_=%s') % self.time[0]

    def hello(self):
        '''Print a welcome banner and return self so calls can be chained.'''
        print('*' * 50)
        print('*' + ' ' * 18 + 'King of Glory wallpaper download')
        print('*' + ' ' * 5 + 'Felix  2021-09-08 13:14')
        print('*' * 50)
        return self

    def run(self):
        '''Program entry point.'''
        print('↓' * 20 + ' format ' + '↓' * 20)
        print('1.thumbnail 2.1024x768 3.1280x720 4.1280x1024 5.1440x900 '
              '6.1920x1080 7.1920x1200 8.1920x1440')
        size = input('Please enter the format number you want to download, default 6: ')
        # Fall back to 6 (1920x1080) on empty or invalid input
        size = size if size in ('1', '2', '3', '4', '5', '6', '7', '8') else 6
        print('-- Download start...')
        page = 0
        offset = 0
        total_response = self.request(self.url.format(page)).text
        total_res = json.loads(total_response)
        # Pages start at 0, so the last page index is iTotalPages - 1
        total_page = int(total_res['iTotalPages']) - 1
        print('-- A total of {} pages...'.format(total_page + 1))
        while True:
            if offset > total_page:
                break
            url = self.url.format(offset)
            result = json.loads(self.request(url).text)
            now = 0
            for item in result['List']:
                now += 1
                hero_name = parse.unquote(item['sProdName']).split('-')[0]
                # Remove special characters that could break the save path
                hero_name = re.sub(r'[【】:<>|·@#$%^&()]', '', hero_name)
                print('-- Downloading page {}: {} (progress {}/{})...'.format(
                    offset, hero_name, now, len(result['List'])))
                hero_url = parse.unquote(item['sProdImgNo_{}'.format(size)])
                save_path = os.path.join(self.save_path, hero_name)
                # Thumbnail URLs end in ".../<image id>/200": use the image id as the file name
                save_name = os.path.join(save_path, hero_url.split('/')[-2])
                os.makedirs(save_path, exist_ok=True)
                if not os.path.exists(save_name):
                    # Swap the thumbnail's /200 for /0 to get the full-size image,
                    # and only create the file once the download has succeeded
                    content = self.request(hero_url.replace('/200', '/0')).content
                    with open(save_name, 'wb') as f:
                        f.write(content)
            offset += 1
        print('-- Download finished.')

    @retry(stop_max_attempt_number=3)
    def request(self, url):
        '''
        Send a GET request, retrying up to 3 times on any error.
        :param url: the URL to request
        :return: the requests.Response object
        '''
        response = requests.get(url, timeout=10)
        assert response.status_code == 200
        return response


if __name__ == '__main__':
    HonorOfKings().hello().run()
```

② Detailed analysis of the link

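  • The front end actually sends JSONP requests, which are awkward to handle in Python because the response is a JavaScript function call rather than standard JSON
  • The request carries a callback parameter (the captured value contains 1710418919222), and the server wraps the JSON data in a call to that function
  • So I removed this parameter in the Python code, which leaves a plain JSON response that json.loads can parse
  • The link has many parameters; I suspect many of them can be deleted, but I haven't bothered to test them one by one
  • The paging parameters are iListNum=20&totalpage=0&page={}; iListNum appears to be the number of wallpapers per page
  • totalpage is not useful. Page numbering turns out to start from 0, so the download code below must subtract 1 from the total number of pages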
```python
self.url = ('https://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/query/workList_inc.cgi'
            '?activityId=2735&sVerifyCode=ABCD&sDataType=JSON&iListNum=20&totalpage=0'
            '&page={}&iOrder=0&iSortNumClose=1&iAMSActivityId=51991&_everyRead=true'
            '&iTypeId=2&iFlowId=267733&iActId=2735&iModuleId=2735&_=%s') % self.time[0]
```
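Two hedged helpers related to this link, neither of which is the author's code. `page_url` builds the same query from a dict with `urllib.parse.urlencode`, which makes it easier to drop parameters one at a time when testing which ones are really required. `parse_jsonp` shows what you would need if you kept the callback parameter instead of removing it; the callback name in the example is made up, and the regex assumes the response is a single `name(...)` call.

```python
import json
import re
import time
from urllib.parse import urlencode

BASE = 'https://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/query/workList_inc.cgi'

# Parameters copied from the captured link (callback removed); which are required is untested
PARAMS = {
    'activityId': 2735, 'sVerifyCode': 'ABCD', 'sDataType': 'JSON',
    'iListNum': 20, 'totalpage': 0, 'iOrder': 0, 'iSortNumClose': 1,
    'iAMSActivityId': 51991, '_everyRead': 'true', 'iTypeId': 2,
    'iFlowId': 267733, 'iActId': 2735, 'iModuleId': 2735,
}


def page_url(page: int) -> str:
    """Build the list URL for a 0-based page number, with a fresh '_' timestamp."""
    return BASE + '?' + urlencode(dict(PARAMS, page=page, _=int(time.time())))


# Matches callbackName(...) with an optional trailing semicolon
_JSONP = re.compile(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', re.S)


def parse_jsonp(text: str):
    """Strip a JSONP callback wrapper and parse the JSON inside."""
    match = _JSONP.match(text)
    if match is None:
        raise ValueError('not a JSONP response')
    return json.loads(match.group(1))


print(page_url(0))
# Hypothetical wrapped response; the callback name is made up
print(parse_jsonp('jQuery_example_123({"iTotalPages": "5", "List": []});'))
```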

③ Format selection

  • When the program starts, it asks for the number of the format you want to download. There are 8 formats because the original page offers each wallpaper in 8 different resolutions
  • As the screenshot above shows, the thumbnail links are numbered 1-8 (fields sProdImgNo_1 to sProdImgNo_8), one thumbnail per resolution, so there must also be 8 full-size images
  • I default to format 6 (1920x1080), which suits most computer screens
  • The "original" image for format 1 is actually still a thumbnail (try it yourself), so choose 2-8 for normal downloads
```python
print('↓' * 20 + ' format ' + '↓' * 20)
print('1.thumbnail 2.1024x768 3.1280x720 4.1280x1024 5.1440x900 '
      '6.1920x1080 7.1920x1200 8.1920x1440')
size = input('Please enter the format number you want to download, default 6: ')
# Fall back to 6 (1920x1080) on empty or invalid input
size = size if size in ('1', '2', '3', '4', '5', '6', '7', '8') else 6
```
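If you prefer the choice as a reusable function that returns an integer, here is a minimal sketch (the name `choose_size` is hypothetical). It tolerates surrounding spaces and never raises on text such as `abc`, which a bare `int(size)` on raw input would.

```python
def choose_size(raw: str, default: int = 6) -> int:
    """Map the user's input to a format number 1-8, falling back to default."""
    try:
        value = int(raw)  # int() ignores surrounding whitespace
    except ValueError:
        return default
    return value if 1 <= value <= 8 else default


for raw in ['', '3', 'abc', '9']:
    print(repr(raw), '->', choose_size(raw))
```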

④ Download code analysis

  • The first request only fetches the total number of pages; since page numbering starts at 0, subtract 1 from that total
  • The while True loop requests pages starting from 0: for each item it reads the thumbnail address, then replaces /200 in the link with /0 to get the full-size image
  • If a hero name contains special characters, remove them with re.sub; otherwise they may break the save path
```python
print('-- Download start...')
page = 0
offset = 0
total_response = self.request(self.url.format(page)).text
total_res = json.loads(total_response)
# Pages start at 0, so the last page index is iTotalPages - 1
total_page = int(total_res['iTotalPages']) - 1
print('-- A total of {} pages...'.format(total_page + 1))
while True:
    if offset > total_page:
        break
    url = self.url.format(offset)
    result = json.loads(self.request(url).text)
    now = 0
    for item in result['List']:
        now += 1
        hero_name = parse.unquote(item['sProdName']).split('-')[0]
        # Remove special characters that could break the save path
        hero_name = re.sub(r'[【】:<>|·@#$%^&()]', '', hero_name)
        print('-- Downloading page {}: {} (progress {}/{})...'.format(
            offset, hero_name, now, len(result['List'])))
        hero_url = parse.unquote(item['sProdImgNo_{}'.format(size)])
        save_path = os.path.join(self.save_path, hero_name)
        # Thumbnail URLs end in ".../<image id>/200": use the image id as the file name
        save_name = os.path.join(save_path, hero_url.split('/')[-2])
        os.makedirs(save_path, exist_ok=True)
        if not os.path.exists(save_name):
            # Swap /200 for /0 to get the full-size image; write only after a successful download
            content = self.request(hero_url.replace('/200', '/0')).content
            with open(save_name, 'wb') as f:
                f.write(content)
    offset += 1
print('-- Download finished.')
```
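The paging and name-cleaning logic can also be separated from the network code so it is easy to test. This is a sketch under assumptions, not the author's method: `fetch_page` is a hypothetical callable returning the parsed JSON for one page, and the extra characters `\ / ? * "` in the regex are an added assumption for Windows-safe folder names. Unlike the loop above, it reuses the first response instead of requesting page 0 twice.

```python
import re
from typing import Callable, Iterator

# Characters stripped from hero names: the article's set plus \ / ? * " (an added assumption)
_BAD_CHARS = re.compile(r'[【】:<>|·@#$%^&()\\/?*"]')


def safe_dirname(raw_name: str) -> str:
    """Keep the part before '-' and drop characters that are unsafe in paths."""
    return _BAD_CHARS.sub('', raw_name.split('-')[0]).strip()


def iter_items(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every item across all pages; pages are numbered from 0."""
    first = fetch_page(0)
    last_page = int(first['iTotalPages']) - 1  # 0-based numbering, hence the -1
    for page in range(last_page + 1):
        data = first if page == 0 else fetch_page(page)
        yield from data['List']


# Hypothetical stand-in for the real API: 2 pages of 2 items each
fake = {0: {'iTotalPages': '2', 'List': [{'n': 'a'}, {'n': 'b'}]},
        1: {'iTotalPages': '2', 'List': [{'n': 'c'}, {'n': 'd'}]}}
print([item['n'] for item in iter_items(fake.__getitem__)])  # ['a', 'b', 'c', 'd']
print(safe_dirname('Hero<1>-Skin'))  # Hero1
```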

⑤ Result of running the crawler: wallpapers of the same hero are saved in the same folder
