Preface
While reading about recent anti-crawler techniques, I came across one I had not encountered before: WebSocket handshake verification. I found a website to try it on. The write-up on recent anti-crawler techniques: blog.csdn.net/qq_26079939…
What is WebSocket?
WebSocket is a protocol, introduced alongside HTML5, that provides full-duplex communication over a single TCP connection.
WebSocket makes it easier to exchange data between the client and the server, and it allows the server to actively push data to the client. In the WebSocket API, the browser and the server only need to complete one handshake; after that, a persistent connection exists between them and data can flow directly in both directions.
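To make the handshake concrete, here is a minimal sketch of the one computation it hinges on, as defined in RFC 6455: the server hashes the client's Sec-WebSocket-Key together with a fixed GUID and returns the result as Sec-WebSocket-Accept. The function name accept_key is an illustrative choice, not part of any library.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; every WebSocket server uses the same value.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key taken from the example handshake in RFC 6455.
    print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

A client compares this value with the header in the 101 response; if they differ, the connection is not treated as a WebSocket.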
WebSocket handshake verification anti-crawler
1. Target site
Le Yu Sports: live.611.com/zq
2. Website analysis
1. The address used to establish the WebSocket connection contains a parameter (9394adf88ece4ff08f9ac6e82949f3a1 in this capture) whose value changes.
2. The token is obtained as follows:
import json

import requests


def getToken():
    # Request a fresh token from the site
    url = "https://live.611.com/Live/GetToken"
    response = requests.get(url)
    if response.status_code == 200:
        data = json.loads(response.text)
        token = data["Data"]
        return token
    else:
        print("Request error")
        return None
3. In the figure below, green arrows mark data sent by the client to the server, and red arrows mark data returned by the server.
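The analysis steps above can be sketched as two small helpers: one that extracts the token from the GetToken response body and one that builds the socket address from it. This is a sketch under assumptions: the response is assumed to be a JSON object with a "Data" field (as the request code above reads it), the address format wss://push.611.com:6119/<token> is the one used by the scripts in the next section, and the names parse_token and build_push_uri are hypothetical.

```python
import json

PUSH_HOST = "wss://push.611.com:6119"  # host and port as used in this article


def parse_token(body: str) -> str:
    """Extract the token from a GetToken response body, failing loudly if absent."""
    data = json.loads(body)
    token = data.get("Data") if isinstance(data, dict) else None
    if not isinstance(token, str) or not token:
        raise ValueError("GetToken response has no usable 'Data' field")
    return token


def build_push_uri(token: str) -> str:
    """Append the per-session token to the push server address."""
    return "{}/{}".format(PUSH_HOST, token)


if __name__ == "__main__":
    # Illustrative response body only; a real token comes from the GetToken request.
    sample_body = '{"Data": "9394adf88ece4ff08f9ac6e82949f3a1"}'
    print(build_push_uri(parse_token(sample_body)))
```

Raising on a missing token, instead of returning None, keeps a failed token request from silently turning into a connection attempt to an invalid address.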
3. Obtain data
There are many Python libraries for connecting to WebSocket servers; the easy-to-use, stable ones are websocket-client (synchronous), websockets (asynchronous), and aiowebsocket (asynchronous). The following uses websocket-client and aiowebsocket.
websocket-client method:
import json
import time

import requests
import websocket


def getToken():
    # Get the token parameter
    url = "https://live.611.com/Live/GetToken"
    response = requests.get(url)
    if response.status_code == 200:
        data = json.loads(response.text)
        token = data["Data"]
        return token
    else:
        print("Request error")
        return None


def get_message():
    # Data to send
    timestamp = int(time.time()) * 1000
    info = {'chrome': 'true', 'version': '80.0.3987.122', 'webkit': 'true'}
    message1 = {
        "command": "RegisterInfo", "action": "Web", "ids": [], "UserInfo": {
            "Version": str([timestamp]) + json.dumps(info),
            "Url": "https://live.611.com/zq"
        }
    }
    message2 = {
        "command": "JoinGroup", "action": "SoccerLiveOdd", "ids": []
    }
    message3 = {
        "command": "JoinGroup", "action": "SoccerLive", "ids": []
    }
    return json.dumps(message1), json.dumps(message2), json.dumps(message3)


def Download(token, message1, message2, message3):
    uri = "wss://push.611.com:6119/{}".format(token)
    ws = websocket.create_connection(uri, timeout=10)
    ws.send(message1)
    ws.send(message2)
    ws.send(message3)
    while True:
        result = ws.recv()
        print(result)


if __name__ == '__main__':
    token = getToken()  # Get the token string
    if token:  # Skip the connection if the token request failed
        message1, message2, message3 = get_message()  # Construct the request messages
        Download(token, message1, message2, message3)  # Fetch data
The results
aiowebsocket method:
import asyncio
import json
import logging
import time

import requests
from aiowebsocket.converses import AioWebSocket


def getToken():
    # Get the token parameter
    url = "https://live.611.com/Live/GetToken"
    response = requests.get(url)
    if response.status_code == 200:
        data = json.loads(response.text)
        token = data["Data"]
        return token
    else:
        print("Request error")
        return None


def get_message():
    # Data to send
    timestamp = int(time.time()) * 1000
    info = {'chrome': 'true', 'version': '80.0.3987.122', 'webkit': 'true'}
    message1 = {
        "command": "RegisterInfo", "action": "Web", "ids": [], "UserInfo": {
            "Version": str([timestamp]) + json.dumps(info),
            "Url": "https://live.611.com/zq"
        }
    }
    message2 = {
        "command": "JoinGroup", "action": "SoccerLiveOdd", "ids": []
    }
    message3 = {
        "command": "JoinGroup", "action": "SoccerLive", "ids": []
    }
    return message1, message2, message3


async def startup():
    token = getToken()  # Get the token string
    if not token:  # Nothing to connect to if the token request failed
        return
    uri = "wss://push.611.com:6119/{}".format(token)
    message1, message2, message3 = get_message()  # Construct the request messages
    async with AioWebSocket(uri) as aws:
        converse = aws.manipulator
        await converse.send(json.dumps(message1))
        await converse.send(json.dumps(message2))
        await converse.send(json.dumps(message3))
        while True:
            mes = await converse.receive()
            if mes:
                msg = json.loads(str(mes, encoding="utf-8"))
                print(msg)


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)  # Make the 'Quit.' message visible
    try:
        asyncio.run(startup())
    except KeyboardInterrupt:
        logging.info('Quit.')
The results
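The aiowebsocket receive loop above assumes every frame is valid UTF-8 JSON, so a single plain-text frame would raise and end the loop. A minimal sketch of a more tolerant decoder follows. The helper name decode_frame is hypothetical, and which frame types the push server actually sends is not known from this article, so the non-JSON handling is an assumption.

```python
import json


def decode_frame(raw):
    """Decode one received frame into a Python object.

    Accepts bytes or str. Returns the parsed JSON value, the raw text
    when the frame is not valid JSON, or None for an empty frame.
    """
    if not raw:
        return None
    text = raw.decode("utf-8", errors="replace") if isinstance(raw, bytes) else raw
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text  # keep non-JSON frames instead of crashing the loop


if __name__ == "__main__":
    print(decode_frame(b'{"command": "JoinGroup"}'))
    print(decode_frame("ping"))
```

In the loops above, this would replace the direct json.loads call, for example msg = decode_frame(mes).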
Summary
In the Web world, polling and WebSocket are two ways to update data in "real time". With polling, the client calls a server interface at fixed intervals (for example, every second). The data appears to update in real time, but it is only refreshed once per interval, so it is not truly real-time. Polling is a pull model: the client actively pulls data from the server.
WebSocket is a push model: the server actively pushes data to the client as soon as it changes, which is true real-time updating.
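The difference between the two models can be illustrated with a small timing sketch. It is a simplified model, not a measurement: network latency is ignored, the event times are made up, and the function names are hypothetical.

```python
def polling_seen_at(event_times_ms, interval_ms):
    """Time (ms) at which a client polling every interval_ms first sees each event."""
    # Round each event time up to the next poll tick (integer ceiling division).
    return [-(-t // interval_ms) * interval_ms for t in event_times_ms]


def push_seen_at(event_times_ms):
    """With push, the client sees each event when it happens (latency ignored)."""
    return list(event_times_ms)


if __name__ == "__main__":
    events = [200, 1100, 1300]  # made-up update times in milliseconds
    polled = polling_seen_at(events, 1000)
    pushed = push_seen_at(events)
    for event, poll_time, push_time in zip(events, polled, pushed):
        print(event, "polling delay:", poll_time - event, "push delay:", push_time - event)
```

With a one-second interval, two updates that happen 200 ms apart within the same interval both reach a polling client at the same tick, while a push client sees them separately.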
After the server creates the socket service, it listens for clients and reads the messages they send in a while True loop.
The server then verifies the handshake request sent by the client. If verification succeeds, it returns a response header with status code 101; otherwise, it returns a response header with status code 403.
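A minimal sketch of that server-side decision follows. How the target site actually validates the handshake is not known; this sketch assumes the token travels in the request path (as in the address used in this article) and that the server keeps a set of issued tokens. The names verify_handshake and valid_tokens are hypothetical, and a real 101 response would also carry the Sec-WebSocket-Accept header.

```python
from http import HTTPStatus


def verify_handshake(path, headers, valid_tokens):
    """Decide the HTTP status for a WebSocket upgrade request.

    path:         request path, e.g. "/<token>"
    headers:      dict of request headers (names compared case-insensitively)
    valid_tokens: set of tokens the server has issued (hypothetical store)
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    is_upgrade = (
        normalized.get("upgrade", "").lower() == "websocket"
        and "upgrade" in normalized.get("connection", "").lower()
        and bool(normalized.get("sec-websocket-key"))
    )
    token = path.lstrip("/")
    if is_upgrade and token in valid_tokens:
        return HTTPStatus.SWITCHING_PROTOCOLS  # 101: handshake accepted
    return HTTPStatus.FORBIDDEN  # 403: handshake rejected


if __name__ == "__main__":
    request_headers = {
        "Upgrade": "websocket",
        "Connection": "Upgrade",
        "Sec-WebSocket-Key": "dGhlIHNhbXBsZSBub25jZQ==",
    }
    print(int(verify_handshake("/abc", request_headers, {"abc"})))
    print(int(verify_handshake("/stale", request_headers, {"abc"})))
```

This is why a crawler that skips the token request, or reuses an old token, is rejected before any data is exchanged.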