Preface

I had actually set my sights on Youdao Translate a while ago but never found the time to study it (more tricks are coming later, so stay tuned). This article explains how to call Youdao Translate from Python and walks through the crawler's "struggle" with Youdao's JS.

Of course, this article is for learning and exchange only. It is suitable for small personal projects done for fun; commercial use is prohibited! When reposting, please credit the WeChat official account: Bigsai. GitHub address: github.com/javasmall/p…

Analysis

When approaching a website, the first step is analysis: work out the rules the page follows.

Analyzing the URL

Open Youdao Translate and you will notice that its URL never changes, which means the requests are made asynchronously via Ajax. Press F12 and you can easily find the request under XHR. Click it and you will see the list of parameters, some of which, such as salt, are encrypted. Keep this rough picture in mind for now.

Analyzing the parameters (1)

Take a bold guess: the key parameters are probably all generated in the same place. Search for salt, open the matching JS file, pretty-print it, and search for salt again inside the JS. The goal is to find code near salt where breakpoints can be set for debugging. In the end you will find 11 matches that you can set breakpoints around, and among them you will happily locate the fields and functions responsible for the key encryption.

Analyzing the parameters (2)

This time, use the browser's call stack feature to look at the JS execution stack. Click into the corresponding module, set a breakpoint, and observe. Eventually you will arrive at the same place: the function generateSaltSign(n), where the main encryption is performed.

Encryption analysis

Youdao Translate's encryption is actually fairly simple. Take a look:

  • Not sure what navigator.appVersion is? Print it out: it is the browser's version header. bv is simply the MD5 hash of that string, so it can be held constant. In other words, the bv (t) parameter can be fixed.
  • ts is the current timestamp, 13 digits long (milliseconds).
  • salt is the timestamp followed by a random number below 100; any number will do.
  • sign is the MD5 hash of "fanyideskweb" + the string to translate + salt + "n%A-rKaT5fb[Gy?;N5@Tj", all concatenated.

Further analysis shows that the remaining parameters do not change, so these are the only ones that need to be computed. Note that the text must stay within 5000 characters; anything beyond that is cut down to the first 5000 characters.
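As a rough sketch, the rules above can be reproduced in Python as follows. The helper names md5_hex and build_params, and the option to pass in a fixed timestamp, are assumptions made for illustration. The appVersion string and the secret are the ones used in this article's request code and may stop working whenever Youdao changes its JS.

```python
import hashlib
import random
import time

# Values taken from this article; not guaranteed to stay valid.
APP_VERSION = ('5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
               '(KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36')
SECRET = 'n%A-rKaT5fb[Gy?;N5@Tj'
MAX_CHARS = 5000


def md5_hex(text):
    """Return the hex MD5 digest of a UTF-8 encoded string."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def build_params(text, now_ms=None):
    """Hypothetical helper: compute bv, ts, salt and sign for one request."""
    text = text[:MAX_CHARS]                    # only the first 5000 characters count
    if now_ms is None:
        now_ms = round(time.time() * 1000)
    ts = str(now_ms)                           # millisecond timestamp (13 digits today)
    salt = ts + str(random.randrange(100))     # timestamp plus a random number below 100
    sign = md5_hex('fanyideskweb' + text + salt + SECRET)
    return {'i': text, 'bv': md5_hex(APP_VERSION),
            'ts': ts, 'salt': salt, 'sign': sign}


params = build_params('hello', now_ms=1564395185984)
print(params['ts'], params['salt'], len(params['sign']))
```

Passing now_ms explicitly makes the timestamp reproducible, which is handy when comparing the computed sign against a value captured in the browser.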

Simulating the request

Points to note

With the rules above and the captured request information, we can use Python to reproduce what the JS does and send the request ourselves. There are a few caveats.

  • First, Python needs an MD5 module and a time module that can perform the equivalent transformations. Fortunately, Python's hashlib and time modules cover this. Problem solved.
  • Next, the data dictionary, i.e. the body of the POST request, must be URL-encoded before it is sent as data.
  • Last but not least, the headers must not be taken lightly. In my experience, an incorrect Content-Length causes an error, while packet capture shows that it is generated automatically when left out. So do not compute the body length yourself; omit this header. An error is also reported if no cookies are sent. Testing shows that some cookie fields can be fixed, some are optional, and some must follow a pattern. In particular, the cookie must contain a field of the form OUTFOX_SEARCH_USER_ID=-1053218418@117.136.67.240, that is, a number + @ + an IP address. It is probably used for some kind of check, and it can be simulated directly.
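The last two caveats can be illustrated with a small sketch. The helper names make_user_id_cookie and encode_body are hypothetical, and the integer range used for the cookie number is an assumption; the article only states that the value is a number + @ + an IP address.

```python
import random
import urllib.parse


def make_user_id_cookie(ip='117.136.67.240'):
    """Hypothetical helper: build OUTFOX_SEARCH_USER_ID as number + '@' + IP."""
    number = random.randint(-2**31, 2**31 - 1)   # arbitrary integer, assumed range
    return 'OUTFOX_SEARCH_USER_ID={}@{}'.format(number, ip)


def encode_body(form):
    """URL-encode a dict of form fields into a POST body string."""
    return urllib.parse.urlencode(form)


body = encode_body({'i': '你好', 'from': 'AUTO', 'to': 'AUTO'})
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Cookie': make_user_id_cookie(),
    # No Content-Length here: the HTTP client computes it from the body.
}
print(body)
```

Non-ASCII text such as Chinese is percent-encoded as UTF-8 by urlencode, so the body is safe to send as form data.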

Request code

The response is a JSON string; just pick out the field you need!

import hashlib
import time
import urllib.parse

import requests


def nmd5(text):
    """Return the hexadecimal MD5 digest of a string."""
    m = hashlib.md5()
    # Tip: the string must be encoded first. In Python 3 the default str is
    # Unicode, and hashing it directly raises
    # "Unicode-objects must be encoded before hashing".
    # Equivalent: b = bytes(text, encoding='utf-8')
    b = text.encode(encoding='utf-8')
    m.update(b)
    return m.hexdigest()


def formdata(transtr):
    """Build the form fields for translating transtr."""
    # navigator.appVersion string whose MD5 hash becomes bv
    headerstr = '5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
    bv = nmd5(headerstr)
    ts = str(round(time.time() * 1000))   # millisecond timestamp
    salt = ts + '90'                      # timestamp plus an arbitrary number
    strexample = 'fanyideskweb' + transtr + salt + 'n%A-rKaT5fb[Gy?;N5@Tj'
    sign = nmd5(strexample)
    return {
        'i': transtr,
        'from': 'AUTO',
        'to': 'AUTO',
        'smartresult': 'dict',
        'client': 'fanyideskweb',
        'salt': salt,
        'sign': sign,
        'ts': ts,
        'bv': bv,
        'doctype': 'json',
        'version': '2.1',
        'keyfrom': 'fanyi.web',
        'action': 'FY_BY_REALTlME',
    }


url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
# Content-Length is deliberately omitted; requests fills it in.
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
    'Referer': 'http://fanyi.youdao.com/',
    'Origin': 'http://fanyi.youdao.com',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'X-Requested-With': 'XMLHttpRequest',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Host': 'fanyi.youdao.com',
    'Cookie': '_ntes_nnid=937f1c788f1e087cf91d616319dc536a,1564395185984; OUTFOX_SEARCH_USER_ID_NCOO=; OUTFOX_SEARCH_USER_ID=-10218418@11.136.67.24; JSESSIONID=; ___rl__test__cookies=1',
}

text = input("Please input translation content: ")
# URL-encode the form fields to build the POST body
data = urllib.parse.urlencode(formdata(text))

req = requests.post(url, data=data, headers=header)
val = req.json()
print(val['translateResult'][0][0]['tgt'])
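The request code prints only the first translated segment. As a hedged sketch, a slightly more defensive way to read the response might look like this. The sample payload is made up for illustration and only mirrors the translateResult[0][0]['tgt'] shape indexed above; treating the outer list as paragraphs and the inner list as segments is an assumption, and extract_translation is a hypothetical helper.

```python
import json

# Hypothetical response body, shaped like the one the request code indexes into.
sample = ('{"errorCode": 0, "translateResult": '
          '[[{"src": "hello", "tgt": "你好"}], [{"src": "world", "tgt": "世界"}]]}')


def extract_translation(payload):
    """Join the 'tgt' field of every segment; return None if the shape is unexpected."""
    result = payload.get('translateResult')
    if not result:
        return None
    lines = []
    for paragraph in result:
        # Concatenate all segments that belong to the same paragraph
        lines.append(''.join(seg.get('tgt', '') for seg in paragraph))
    return '\n'.join(lines)


print(extract_translation(json.loads(sample)))
```

Returning None instead of raising a KeyError makes it easier to notice when the server rejects the request, for example because the sign or the cookie is wrong.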

The execution result

Conclusion

And so, starting from zero, we have gracefully lifted the veil on Youdao Translate! You can build some interesting things on top of this (to be continued).

This may not be difficult, and experienced developers will find it very simple (please be kind), but it makes particularly good practice for beginners. If you spot a problem or something is unclear, feel free to get in touch via the official account! There is no telling how long this code will keep working, so try it soon. If you found it useful, a like would be appreciated! This is only the beginning of what I have in mind; the fun part is still to come.

See the GitHub address above for the project and the crawler repository. Stars and forks are welcome!

Follow the WeChat official account Bigsai to study and improve together! More fun content will be shared over time.