This crawler is shared for learning and exchanging experience. If you repost this article, please credit the source and include a link.
If you like this post, please follow me. Bookmark this column for more good content; I add new material from time to time.
Code: if you subscribe to this column, email me and I'll send you the full code. If you haven't subscribed but would like the source code for study, you can add me on WeChat (ID: Guprogram).
- Preface
- Crawl late at night, and crawl gently
- Use someone else's User-Agent
- The crawler process
- Analyzing the web page
- Getting each hero's ID
- Analyzing the skin artwork data
- Conclusion
Preface
I've been learning Python for quite a while, but I keep forgetting to write blog posts, which is frustrating! I'm not even living up to my nickname as a crazy coder.
Seeing so many experts on CSDN keep writing for years without stopping has inspired me a lot. I want to take you as my role models, and I'm proud to follow your example!
Besides reading, I also like playing games. I've seen people scrape the skins of King of Glory before, but I'm a long-time League of Legends player, so I decided to scrape League of Legends instead.
Hahaha, what a surprise!
Along the way in this tutorial, I'll also share some simple, practical crawling tips.
Crawl late at night, and crawl gently
Don't rush when crawling; the server can't take it.
Learn to pause, throttle yourself, and sleep between requests when you need to.
Restrictions are loosest while people are asleep, so crawl as late at night as you can. You may never have seen Los Angeles at 4 a.m., but you can still watch your crawler running at 4 a.m.
That way your IP address is less likely to be blocked.
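A minimal sketch of this advice using only the standard library: it pauses for a random interval between requests and checks whether the local time falls inside a late-night window. The 1 to 3 second delay and the 0:00 to 6:00 window are illustrative assumptions, not values from this post, and `fetch` stands for whatever function you use to download a page.

```python
import random
import time
from datetime import datetime


def is_late_night(now=None, start_hour=0, end_hour=6):
    """Return True if `now` falls inside the assumed quiet window [start_hour, end_hour)."""
    if now is None:
        now = datetime.now()
    return start_hour <= now.hour < end_hour


def polite_sleep(min_delay=1.0, max_delay=3.0):
    """Sleep for a random interval so requests don't arrive as a steady burst."""
    delay = random.uniform(min_delay, max_delay)
    time.sleep(delay)
    return delay


def crawl_politely(urls, fetch, min_delay=1.0, max_delay=3.0):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            polite_sleep(min_delay, max_delay)
        results.append(fetch(url))
    return results
```

For example, you might only start `crawl_politely(url_list, fetch=my_fetch)` when `is_late_night()` is true, where `my_fetch` is a hypothetical wrapper around `requests.get`.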
Use someone else's User-Agent
If you look at a site's robots.txt, you'll see its rules about what may and may not be crawled. But don't overlook who those rules are written for: sites want search engines to crawl them, and robots.txt often grants access to specific search-engine crawlers.
So when you build the request headers in Python, set the User-Agent to one of the crawlers that robots.txt names, such as Baidu's, Google's, or Sogou's. Crawl with that UA and the site treats you much more kindly.
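As a sketch of this tip, the code below uses the standard library's `urllib.robotparser` to check which search-engine crawler a robots.txt admits, then builds the headers with that crawler's User-Agent. The robots.txt lines are a made-up example, and the two UA strings are the ones Google and Baidu publish for their crawlers. Note that `can_fetch` matches on the short product token (such as `Googlebot`), not the full UA string, so the code passes the token.

```python
from urllib.robotparser import RobotFileParser

# User-Agent strings published by Google and Baidu for their crawlers
SEARCH_ENGINE_UAS = {
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Baiduspider": "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
}


def headers_for(robots_lines, url):
    """Return headers using the first search-engine UA that robots.txt allows for url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    for token, user_agent in SEARCH_ENGINE_UAS.items():
        # can_fetch compares against the product token, e.g. "Googlebot"
        if parser.can_fetch(token, url):
            return {"User-Agent": user_agent}
    return None  # None of the listed search engines is allowed


# Hypothetical robots.txt: search engines allowed, everyone else blocked
SAMPLE_ROBOTS = [
    "User-agent: Baiduspider",
    "Allow: /",
    "",
    "User-agent: Googlebot",
    "Allow: /",
    "",
    "User-agent: *",
    "Disallow: /",
]

headers = headers_for(SAMPLE_ROBOTS, "https://example.com/some/page")
print(headers)
```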
The crawler process
Analyzing the web page
Open the developer tools with F12 and look at the network requests. The file you want is hero_list.js, which holds the hero list; if you don't see it, refresh the page.
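```python
import requests

# Assumed header: a search-engine UA, as suggested in the previous section
headers = {'User-Agent': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'}
url0 = 'https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js'
try:
    response = requests.get(url0, headers=headers, timeout=10)
    response.raise_for_status()
    response.encoding = response.apparent_encoding  # Set the encoding
    hreolist = response.json()  # Parse the response body as JSON
    print(hreolist)  # Print the list of heroes
    print(len(hreolist['hero']))  # Print the number of heroes: 151
except Exception as e:
    print(e)
```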
With the code above, I got the full list of heroes and the total number of heroes.
Here is part of the printed output:
```
{'hero': [{'heroId': '1', 'name': 'the daughter of darkness', 'alias': 'Annie', 'title': 'Annie', 'roles': ['mage'], 'isWeekFree': '0', 'attack': '2', 'defense': '3', 'magic': '10', 'difficulty': '6', 'selectAudio': 'https://game.gtimg.cn/images/lol/act/img/vo/choose/1.ogg', 'banAudio': 'https://game.gtimg.cn/images/lol/act/img/vo/ban/1.ogg', 'isARAMweekfree': '0', 'ispermanentweekfree': '0', 'changeLabel': 'no change', 'goldPrice': '4800', 'couponPrice': '2000', 'camp': '', 'campId': '', 'keywords': 'Annie, the daughter of darkness, fire girl, Annie, anni, heianzhinv, huonv, an, hazn, hn'}, ...
```
As the JSON above shows, the list of heroes is stored under the 'hero' key.
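To make the structure concrete, here is a small sketch that reads the list under the 'hero' key and builds a lookup from heroId to alias, which can be handy later for naming saved images. The sample dict is trimmed by hand to two heroes and a few fields; with the real data you would pass in `hreolist` instead. The helper name `alias_by_id` is hypothetical.

```python
# Trimmed sample shaped like the printed output (most fields omitted)
hero_list_sample = {
    "hero": [
        {"heroId": "1", "alias": "Annie", "roles": ["mage"]},
        {"heroId": "2", "alias": "Olaf"},
    ]
}


def alias_by_id(hero_list_json):
    """Map each hero's heroId to its alias, reading the list under the 'hero' key."""
    return {hero["heroId"]: hero["alias"] for hero in hero_list_json["hero"]}


print(alias_by_id(hero_list_sample))  # {'1': 'Annie', '2': 'Olaf'}
```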
Getting each hero's ID
Looking at the JSON you just fetched, each hero has a key called 'heroId'. What is this 'heroId' for?
I didn't know at first, but as soon as I opened the skin artwork pages on the official site, it suddenly became clear:
```
https://lol.qq.com/data/info-defail.shtml?id=1     Annie
https://lol.qq.com/data/info-defail.shtml?id=2     Olaf
https://lol.qq.com/data/info-defail.shtml?id=876   Lillia
```
From these three URLs you can see that heroId is the value of the id query parameter.
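In other words, each page URL is just the base address plus `?id=<heroId>`. The sketch below expresses that with `urllib.parse`; the base address is copied from the URLs above, and the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

DETAIL_PAGE = "https://lol.qq.com/data/info-defail.shtml"


def detail_url(hero_id):
    """Build a hero's artwork page URL by passing heroId as the id query parameter."""
    return DETAIL_PAGE + "?" + urlencode({"id": hero_id})


def hero_id_from_url(url):
    """Read the id query parameter back out of a detail-page URL."""
    return parse_qs(urlparse(url).query)["id"][0]


print(detail_url("876"))  # https://lol.qq.com/data/info-defail.shtml?id=876
```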
But there's a catch: there are only 151 heroes, yet Lillia's ID is 876. The first 100 or so heroes have regular, sequential IDs with no problems, but after that the IDs jump around a lot. Since each hero's artwork page can only be reached through the correct URL, collecting every hero's real ID is an essential step.
```python
# hreolist is the JSON fetched above from
# https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js
hero_list_json = hreolist
hero_lists = hero_list_json['hero']  # Get the list of heroes
heros_id = [hero['heroId'] for hero in hero_lists]  # Collect every hero's ID
```
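To see why guessing IDs is a trap, here is a small check (a hypothetical helper, not part of the original code) that compares the real IDs with what a naive `range(1, N + 1)` loop would request. It runs on the three IDs from the URLs above; with the real data you would pass `heros_id`.

```python
def ids_missed_by_range(hero_ids):
    """Return the real hero IDs that a naive range(1, len(hero_ids) + 1) would never visit."""
    guessed = {str(n) for n in range(1, len(hero_ids) + 1)}
    return sorted(set(hero_ids) - guessed, key=int)


# Toy input: the three IDs from the URLs above
print(ids_missed_by_range(["1", "2", "876"]))  # ['876']
```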
Analyzing the skin artwork data
Open the developer tools on a hero's page and you'll find a file named after the hero's ID, for example 876.js for Lillia.
In the figure above, you can see that skins has 10 entries. Expand the first one and you'll find the key loadingImg, whose value is the URL of the skin's artwork.
As a long-time player, I know Lillia has only two skins, so why does skins contain 10 entries? Expanding the third through the tenth shows that their loadingImg values are empty.
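Since only some entries carry an image URL, it makes sense to skip the empty ones before downloading. A minimal filter; the sample entries and their URLs are made-up placeholders shaped like the skins list.

```python
def skins_with_art(skins):
    """Keep only skin entries whose loadingImg URL is non-empty."""
    return [skin for skin in skins if skin.get("loadingImg")]


# Hypothetical entries; the URLs are placeholders
sample_skins = [
    {"name": "first skin", "loadingImg": "https://example.com/a.jpg"},
    {"name": "second skin", "loadingImg": "https://example.com/b.jpg"},
    {"name": "empty entry", "loadingImg": ""},
]
print([s["name"] for s in skins_with_art(sample_skins)])  # ['first skin', 'second skin']
```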
```python
url_list = []  # Stores each hero's info URL
for hero_id in heros_id:
    url = 'https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js'.format(hero_id)
    url_list.append(url)
```
```python
import requests

# Lillia's info file; headers is the dict defined earlier
url1 = 'https://game.gtimg.cn/images/lol/act/img/js/hero/876.js'
try:
    response = requests.get(url1, headers=headers, timeout=10)
    response.raise_for_status()
    response.encoding = response.apparent_encoding  # Set the encoding
    hero_info = response.json()
    skins = hero_info['skins']  # Get the hero's skin info
    # Print each skin's loadingImg URL and name
    for skin in skins:
        print(skin['loadingImg'])
        print(skin['name'])
except Exception as e:
    print(e)
```
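The snippet above only prints each skin's URL and name. To save the images you also need file names, and a skin name can contain characters such as / or : that some file systems reject. A hypothetical helper that sanitizes the name; the alias prefix and the jpg fallback are my assumptions.

```python
import re


def skin_filename(hero_alias, skin_name, img_url):
    """Build a local file name such as 'Ahri_K_DA Ahri.jpg' from a skin entry."""
    last_segment = img_url.rsplit("/", 1)[-1]
    ext = last_segment.rsplit(".", 1)[-1] if "." in last_segment else "jpg"
    # Replace characters that Windows forbids in file names (including the path separator)
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", skin_name).strip()
    return f"{hero_alias}_{cleaned}.{ext}"


print(skin_filename("Ahri", "K/DA Ahri", "https://example.com/skin.jpg"))  # Ahri_K_DA Ahri.jpg
```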
With the two snippets above, you can already scrape the skin artwork of a single hero. Getting the artwork for every hero is nothing more than a loop over url_list.
Once you can scrape the first hero's artwork, why worry about the rest?
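Putting the pieces together, that loop might look like the sketch below. It is one possible structure, not the author's full code: the fetching functions are passed in (so the loop itself does not depend on requests), files are named by hero ID and skin index, and the random pause of up to two seconds is an assumed value.

```python
import os
import random
import time

# Per-hero info URL, as used in the snippets above
HERO_INFO_URL = "https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js"


def download_all_skins(hero_ids, fetch_json, fetch_bytes, out_dir="skins", max_delay=2.0):
    """Save every non-empty loadingImg for each hero ID and return the saved paths.

    fetch_json(url) must return the parsed JSON dict; fetch_bytes(url) must return
    the raw image bytes. Both are supplied by the caller.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for hero_id in hero_ids:
        info = fetch_json(HERO_INFO_URL.format(hero_id))
        for index, skin in enumerate(info.get("skins", [])):
            img_url = skin.get("loadingImg")
            if not img_url:
                continue  # Skip entries whose loadingImg is empty
            ext = os.path.splitext(img_url)[1] or ".jpg"
            path = os.path.join(out_dir, f"{hero_id}_{index}{ext}")
            with open(path, "wb") as f:
                f.write(fetch_bytes(img_url))
            saved.append(path)
            time.sleep(random.uniform(0, max_delay))  # Pause between downloads
    return saved
```

With requests, one way to call it is `download_all_skins(heros_id, lambda u: requests.get(u, headers=headers, timeout=10).json(), lambda u: requests.get(u, headers=headers, timeout=10).content)`.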
Conclusion
That's my approach to scraping the hero skin artwork of League of Legends.
Now, dear readers, can you scrape all the hero skins of King of Glory?
I'm sure it's no problem for you. Go for it!