Main reference
If you want to understand the rules behind the video parsing, you can read the following blog post. Note that its example is written in Python, and the way the video id is obtained has since changed, so that code can no longer parse the video correctly.
Refer to the blog
Video parsing
1. Read the HTML content and obtain the videoId
Here we read the HTML of the video page, for example:
Toutiao.com/group/66310…
Looking at the page source, we can see that the videoId is embedded in the JavaScript.
How do we get the videoId value? Here we extract it from the page with a regular expression:
Pattern pattern = Pattern.compile("videoId: '(.+)'");
Matcher matcher = pattern.matcher(response);
if (matcher.find()) {
    String videoId = matcher.group(1);
    // ...
}
Note that "videoId:" is followed by a space in the pattern. Without that space, nothing matches. This held me up for a long time before I finally noticed the missing space.
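The missing-space problem described above can be avoided by making the pattern tolerant of whitespace. The following is a minimal sketch, not the project's actual code: the class name VideoIdExtractor and the method extractVideoId are hypothetical, and it assumes the id is always wrapped in single quotes as in the pattern above. It also uses [^']+ instead of .+, because a greedy .+ would run to the last quote on the line if other quoted values follow the id.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the original project. */
final class VideoIdExtractor {
    // \s* accepts zero or more whitespace characters around the colon.
    // [^']+ stops at the first closing quote rather than the last one on the line.
    private static final Pattern VIDEO_ID =
            Pattern.compile("videoId\\s*:\\s*'([^']+)'");

    /** Returns the videoId found in the page, or an empty Optional if none matches. */
    static Optional<String> extractVideoId(String html) {
        Matcher matcher = VIDEO_ID.matcher(html);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

Calling VideoIdExtractor.extractVideoId(response) returns an Optional, so the caller has to handle the case where the page layout has changed and no id is found.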
2. Construct the r and s parameters
The r parameter is a random number and can have any number of digits. Here we generate a 16-digit random number:
String r = getRandom(); // e.g. 7805700526977788

// Generate a 16-digit random number
private String getRandom() {
Random random = new Random();
StringBuilder result = new StringBuilder();
for (int i = 0; i < 16; i++) {
result.append(random.nextInt(10));
}
return result.toString();
}
The s parameter is the CRC32 checksum of the following string:
/video/urls/v/1/toutiao/mp4/{videoId}?r={random number}
In the example above, the videoId is v02004040000bg31ot72gddgigkg7kvg
and r is 7805700526977788,
so the string to be hashed is:
/video/urls/v/1/toutiao/mp4/v02004040000bg31ot72gddgigkg7kvg?r=7805700526977788
The code that generates the s parameter is as follows:
CRC32 crc32 = new CRC32();
String s = String.format(ApiConstant.URL_VIDEO, videoId, r);
// Compute the CRC32 checksum
crc32.update(s.getBytes());
String crcString = crc32.getValue() + ""; // 38456043

where URL_VIDEO is defined in ApiConstant as:
public static final String URL_VIDEO = "/video/urls/v/1/toutiao/mp4/%s?r=%s";
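For readers who want to try the checksum step outside the project, here is a self-contained sketch of the same computation. The class name SignatureSketch and the method computeS are hypothetical; the path template is the URL_VIDEO value shown above. Two details differ from the snippet above and are choices of this sketch, not the author's: the charset is stated explicitly, and Long.toString replaces string concatenation.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical helper for illustration; names are not from the original project. */
final class SignatureSketch {
    // Same template as ApiConstant.URL_VIDEO in the article.
    private static final String URL_VIDEO = "/video/urls/v/1/toutiao/mp4/%s?r=%s";

    /** Returns the CRC32 checksum of the formatted path as a decimal string. */
    static String computeS(String videoId, String r) {
        String path = String.format(URL_VIDEO, videoId, r);
        CRC32 crc32 = new CRC32();
        // An explicit charset keeps the result independent of the platform default.
        crc32.update(path.getBytes(StandardCharsets.UTF_8));
        // getValue() returns the 32-bit checksum in a long, so it is never negative.
        return Long.toString(crc32.getValue());
    }
}
```

Because CRC32 is a checksum rather than encryption, the same videoId and r always produce the same s, which makes the step easy to check against a known-good request.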
3. Initiate a request to obtain the video address
With the videoId and the r and s parameters from above, we can request the real address of the video as follows:
I.snssdk.com/video/urls/…
For the example above, the resulting link is:
I.snssdk.com/video/urls/…
The JSON returned by the request is shown below:
The response contains a video_list node, which can hold video_1, video_2, and video_3; in this case there is only video_1. The main_url field of the video node is the real address of the video, but it is Base64-encoded, so we need to decode it:
private String getRealPath(String base64) {
return new String(Base64.decode(base64.getBytes(), Base64.DEFAULT));
}
After decoding, the real address of the video is:
V6-tt.ixigua.com/video/m/220…
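The getRealPath method above relies on android.util.Base64, which is only available on Android. To check the decoding step on a plain JVM, a sketch along the following lines may help. The class name MainUrlDecoder and the method decodeMainUrl are hypothetical, and the use of java.util.Base64 with the MIME decoder is an assumption of this sketch, chosen because it skips line breaks in the input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper for illustration; a JVM counterpart to getRealPath. */
final class MainUrlDecoder {
    /** Decodes a Base64-encoded main_url value into the plain video address. */
    static String decodeMainUrl(String mainUrl) {
        // The MIME decoder ignores line separators and other characters
        // outside the Base64 alphabet instead of throwing on them.
        byte[] raw = Base64.getMimeDecoder().decode(mainUrl);
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

Base64 is an encoding, not encryption, so no key is involved: any Base64 decoder recovers the address from main_url.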
That completes the analysis of how Toutiao videos are parsed. If you want to see the source code, you can refer to my project (linked at the end of the article). If it helped you, please give it a star. Thank you.
The code that parses the video is in the decodePath method of the VideoPathDecoder class:
The source address
Toutiao clone project