This article uses Python socket programming to build a simple Web server from scratch. It is deliberately minimal: it can serve static web pages (such as a resume), but it is far from a real production server. Let's first look at what happens on the server side as the client (browser) interacts with the server:

  1. Create a connection socket when a client (browser) contacts the server;
  2. Receive the HTTP request over this connection;
  3. Parse the request to determine the specific file requested;
  4. Get the requested file from the server's file system;
  5. Create an HTTP response message consisting of header lines followed by the requested file;
  6. Send the response to the requesting browser over the TCP connection. If the file does not exist, return a 404 Not Found error message instead.
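Steps 3 and 5 can be sketched in isolation, without any sockets. This is only a minimal illustration: the helper names `parse_request_target` and `build_response` are hypothetical and do not appear in the server code of this article, and the header set is an assumption about what a minimal response needs.

```python
def parse_request_target(request_text):
    """Return the path from a request line such as 'GET /helloworld.html HTTP/1.1'."""
    lines = request_text.splitlines()
    if not lines:
        raise ValueError("empty request")
    parts = lines[0].split()
    if len(parts) != 3:
        raise ValueError("malformed request line: %r" % lines[0])
    method, target, version = parts
    return target


def build_response(status_line, body=b""):
    """Assemble status line, header lines, a blank line, and the body (bytes)."""
    head = (
        "%s\r\n"
        "Connection: close\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        "Content-Length: %d\r\n"
        "\r\n" % (status_line, len(body))
    )
    return head.encode() + body


request = "GET /helloworld.html HTTP/1.1\r\nHost: localhost:6789\r\n\r\n"
print(parse_request_target(request))  # /helloworld.html
print(build_response("HTTP/1.1 200 OK", b"<p>hi</p>"))
```

Keeping parsing and response building in small functions like these makes them easy to check without starting a server.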

1. Basic Web server

Suppose the file the browser requests from the server is helloworld.html. Its content is up to you (I simply wrote: Great, the server is working!). Put the file in the same directory as the server script, then send a request to the server from the browser; the server responds by following the steps above. The full server-side code is as follows:

    from socket import *

    serverSocket = socket(AF_INET, SOCK_STREAM)
    serverSocket.bind(('', 6789))  # '' means all local interfaces
    serverSocket.listen(1)

    while True:
        print('server in place')
        connectionSocket, addr = serverSocket.accept()
        try:
            message = connectionSocket.recv(1024).decode()
            # The request line looks like "GET /helloworld.html HTTP/1.1"
            filename = message.split()[1]
            # Drop the leading "/" so the path is relative to the server directory
            with open(filename[1:], encoding='utf-8') as f:
                outputdata = f.read()
            # Encode first: Content-Length counts bytes, not characters
            body = outputdata.encode()
            header = ('HTTP/1.1 200 OK\r\n'
                      'Connection: close\r\n'
                      'Content-Type: text/html; charset=utf-8\r\n'
                      'Content-Length: %d\r\n\r\n' % len(body))
            connectionSocket.sendall(header.encode() + body)
        except (IOError, IndexError):
            # Missing file, or an empty or malformed request
            header = 'HTTP/1.1 404 Not Found\r\nConnection: close\r\n\r\n'
            connectionSocket.sendall(header.encode())
        finally:
            connectionSocket.close()

bind(('', 6789)) binds the socket to port 6789 on all of the machine's network interfaces. If the code runs locally, type http://localhost:6789/helloworld.html in your browser to access the page. If the code is deployed on a cloud server, replace localhost with the server's public IP address. This way, you can easily deploy static web pages such as a profile on the server.

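listen(1) puts the socket into listening mode with a backlog of 1, that is, the number of pending connections allowed to wait in the queue. Because the loop serves one connection to completion before calling accept() again, this server handles only one request at a time; later we will use multithreading to process multiple requests at the same time.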

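When constructing the header, Content-Length specifies the length of the entity, that is, the number of bytes in the message body that follows the blank line. It has nothing to do with TCP: the TCP header (normally 20 bytes, more when options are present) belongs to the transport layer and is never counted in an HTTP header field. A fixed offset such as len(outputdata) + 24, which I first arrived at by trial and error while inspecting the page source, therefore only happens to work for one particular file. The offset most likely compensated for two things: len() on a str counts characters, whereas non-ASCII characters occupy several bytes once encoded as UTF-8, and my first version also sent an extra \r\n after the file content. The reliable approach is to encode the body first and use the length of the resulting bytes. We do not even have to specify the length ourselves: with Connection: close, a bare HTTP/1.1 200 OK status line followed by a blank line is enough, because the browser reads the body until the server closes the connection.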
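The difference between counting characters and counting bytes is easy to see in a few lines. This is a small sketch; the helper name `content_length` and the second sample string are made up for illustration and are not taken from the server code.

```python
def content_length(text, encoding="utf-8"):
    """Return the Content-Length for `text`: the size of its encoded bytes."""
    return len(text.encode(encoding))


ascii_page = "Great, the server is working!"
mixed_page = "服务器 is working!"  # hypothetical content with 3 non-ASCII characters

print(len(ascii_page), content_length(ascii_page))  # 29 29
print(len(mixed_page), content_length(mixed_page))  # 15 21
```

For pure ASCII content the two numbers agree, which is why a character count can appear to work until the page contains other characters.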

As an aside, I found that there are four cases for Content-Length:

  1. Content-Length is not specified: the page displays correctly and the data is complete;
  2. Content-Length is specified and is less than the length of the entity: the page does not display correctly and data is lost;
  3. Content-Length is specified and equals the length of the entity: the page displays correctly and the data is complete;
  4. Content-Length is specified and is greater than the length of the entity: the page is not displayed, and the browser console reports ERR_CONTENT_LENGTH_MISMATCH.

That is, the worst case is a specified length greater than the length of the entity: the browser waits for bytes that never arrive, reports a length mismatch once the connection closes, and the page displays nothing. If the specified length is less than the entity length, the browser reads only the first part of the message entity and the page is displayed incompletely. In effect, leaving Content-Length out gives the same result as specifying it correctly, because with Connection: close the browser simply reads until the server closes the connection.
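The four cases can be mimicked with a toy model of how a receiver frames the body. This is a deliberate simplification, not how any particular browser is implemented; the function name `frame_body` and the verdict strings are invented for this sketch.

```python
def frame_body(declared_length, sent_body):
    """Return (body_seen, verdict) for a body sent before the connection closes.

    declared_length is the Content-Length value, or None if the header is absent.
    """
    if declared_length is None:
        # No length given: read everything until the connection closes
        return sent_body, "complete"
    if declared_length == len(sent_body):
        return sent_body, "complete"
    if declared_length < len(sent_body):
        # Receiver stops after the declared number of bytes
        return sent_body[:declared_length], "truncated"
    # Receiver expected more bytes than ever arrived
    return sent_body, "mismatch"


body = b"Great, the server is working!"
for declared in (None, len(body) - 5, len(body), len(body) + 5):
    seen, verdict = frame_body(declared, body)
    print(declared, verdict, seen)
```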

2. Multi-threaded Web server

With the code above, a basic Web server is up and running, but it can only handle one request at a time. To serve several clients at once, we hand each accepted connection to its own thread. The specific code is as follows:

    from socket import *
    import threading


    def tcp_process(connectionSocket):
        # Show which thread is serving this connection
        print(threading.current_thread())
        try:
            message = connectionSocket.recv(1024).decode()
            print(repr(message))
            print(message)
            filename = message.split()[1]
            with open(filename[1:], encoding='utf-8') as f:
                outputdata = f.read()
            # Encode first: Content-Length counts bytes, not characters
            body = outputdata.encode()
            header = ('HTTP/1.1 200 OK\r\n'
                      'Connection: close\r\n'
                      'Content-Type: text/html; charset=utf-8\r\n'
                      'Content-Length: %d\r\n\r\n' % len(body))
            connectionSocket.sendall(header.encode() + body)
        except (IOError, IndexError):
            # Missing file, or an empty or malformed request
            header = 'HTTP/1.1 404 Not Found\r\nConnection: close\r\n\r\n'
            connectionSocket.sendall(header.encode())
        finally:
            connectionSocket.close()


    if __name__ == "__main__":
        serverSocket = socket(AF_INET, SOCK_STREAM)
        serverSocket.bind(('', 6789))
        serverSocket.listen(10)
        while True:
            print('server in place')
            connectionSocket, addr = serverSocket.accept()
            # One thread per connection; the main thread returns to accept()
            thread = threading.Thread(target=tcp_process, args=(connectionSocket,))
            thread.start()

As you can see, each connectionSocket is handled by its own thread, which serves one specific client. The main thread does not have to wait for one client's service to finish before accepting the next client's request, which greatly improves the server's throughput.
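For comparison, Python's standard library offers the same thread-per-connection model in the socketserver module. The sketch below is an alternative to the hand-written loop, not the code this article uses; the class names `HelloHandler` and `ThreadedServer` are hypothetical, and the handler returns a fixed body instead of reading a file to keep the example short.

```python
import socketserver


class HelloHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # self.request is the connected socket, like connectionSocket above
        self.request.recv(1024)
        body = "Great, the server is working!".encode()
        header = ("HTTP/1.1 200 OK\r\n"
                  "Connection: close\r\n"
                  "Content-Type: text/html; charset=utf-8\r\n"
                  "Content-Length: %d\r\n\r\n" % len(body))
        self.request.sendall(header.encode() + body)
        # socketserver closes the socket after handle() returns


class ThreadedServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True  # allow quick restarts on the same port
    daemon_threads = True       # worker threads do not block interpreter exit


def run(port=6789):
    """Serve until interrupted; each connection gets its own thread."""
    with ThreadedServer(("", port), HelloHandler) as server:
        server.serve_forever()
```

Calling run() starts the server; the module takes care of accept(), thread creation, and closing each connection.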

3. The client

Finally, let’s look at the code on the client side, through which requests can be made directly to the server without going through the browser.

    from socket import *

    clientSocket = socket(AF_INET, SOCK_STREAM)
    clientSocket.connect(('localhost', 6789))

    # Send the request and print the reply, repeatedly, on the same connection
    while True:
        header = ('GET /helloworld.html HTTP/1.1\r\n'
                  'Host: localhost:6789\r\n'
                  'Connection: keep-alive\r\n'
                  'User-Agent: Mozilla/5.0\r\n\r\n')
        clientSocket.send(header.encode())
        message = clientSocket.recv(1024)
        print(message.decode())

When I made a request to the server by running the client code, although the data was successfully retrieved, I also received the following error on the client side:

ConnectionResetError: [WinError 10054] The remote host forced an existing connection to close.

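This error is often seen when writing crawlers, where requests are so frequent that the server treats them as a malicious attack and resets the connection. That is probably not what happens here, though. The more likely cause is the client's while True loop: the server sends its response and then calls connectionSocket.close(), but the client goes on to send another request over the same, now closed, connection, so the next send or recv fails with a reset. When I commented out connectionSocket.close() in the server code, the error went away, which is consistent with this explanation, although the server then never closes its connections. If you know of another cause, please let me know in the comments section.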
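A client that sends a single request and then reads until the server closes the connection avoids writing to a closed socket. The following is a minimal sketch under that assumption; the function name `fetch` is hypothetical, and it asks for Connection: close because that is what the server in this article does anyway.

```python
from socket import socket, AF_INET, SOCK_STREAM


def fetch(host, port, path):
    """Send one GET request and return the whole response as text."""
    request = ("GET %s HTTP/1.1\r\n"
               "Host: %s:%d\r\n"
               "Connection: close\r\n"
               "\r\n" % (path, host, port))
    with socket(AF_INET, SOCK_STREAM) as clientSocket:
        clientSocket.connect((host, port))
        clientSocket.sendall(request.encode())
        chunks = []
        while True:
            data = clientSocket.recv(1024)
            if not data:  # empty bytes: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode()


# Example, assuming the server from this article is running locally:
# print(fetch("localhost", 6789, "/helloworld.html"))
```

Looping on recv until it returns empty bytes also handles responses larger than 1024 bytes, which a single recv call would cut short.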