As web engineers, we deal with ports and sockets every day. Many people use them, but when asked what they really are, they may not be able to answer.

In this article, we will explore the nature of ports and sockets.

port

Networking is layered: the OSI model has 7 layers, and the TCP/IP model simplifies this to 5 or 4 layers.

The network layer is mainly the IP protocol, the protocol that routers deal with, and its job is to deliver data from one host to another.

What happens once the data reaches the other host? Each host runs many processes, so how do we know which process to hand the data to? This is what the transport-layer protocols TCP and UDP take care of.

How do you locate a host process?

Could we just specify the process ID directly, for example with a field X that holds the process ID?

This design would work, but a process ID is dynamic, not fixed. The next time a service process restarts, its process ID may change. So we have to keep thinking.

What about adding a middle layer? In computing, there are few problems that cannot be solved by adding a middle layer. Instead of giving the data directly to the process, the data is placed in a certain region of memory, called a port, on which the process listens.

Instead of depending on a fixed process ID, the process binds to that memory (the port) and listens for changes to it.

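The idea that both sides depend on a layer of abstraction, instead of depending directly on the concrete implementation, is called IoC (inversion of control). Strictly speaking, this wording describes the dependency inversion principle, a close relative of IoC.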

Why are they called ports? Because hardware also has the concept of ports, as shown in the figure:

A hardware port is the entry point for communication between a device and the outside world. A software port plays the same role, so the same name was adopted.

In this way, we need IP + port + protocol to locate a process on the network. These are the three elements of a process's network address. TCP, IP, and the other protocols work together, which is why they are called the TCP/IP protocol family.

In essence, a port is an in-memory data structure that a process can listen on, receiving messages as data is written to it.
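The indirection described above can be sketched as a small lookup table. This is only a toy model under stated assumptions: the PortTable class, its bind, unbind, and deliver methods, and the process IDs 101 and 202 are all hypothetical names made up for illustration, and a real operating system keeps far more state per socket in the kernel.

```javascript
// Toy model only: a table that maps port numbers to listeners.
// PortTable and its methods are hypothetical, not an operating system API.
class PortTable {
  constructor() {
    this.listeners = new Map(); // port number -> handler function
  }

  // A process "binds" by registering a handler for a port.
  bind(port, handler) {
    if (this.listeners.has(port)) {
      throw new Error(`port ${port} is already in use`);
    }
    this.listeners.set(port, handler);
  }

  // Release the port, for example when the process exits.
  unbind(port) {
    this.listeners.delete(port);
  }

  // The sender addresses a port, never a process ID.
  deliver(port, data) {
    const handler = this.listeners.get(port);
    if (!handler) return false; // nothing is listening on this port
    handler(data);
    return true;
  }
}

const table = new PortTable();
const received = [];

// First "process" (hypothetical pid 101) binds port 8124.
table.bind(8124, (data) => received.push(`pid 101 got ${data}`));
table.deliver(8124, 'hello');

// The service restarts with a new pid but binds the same port.
table.unbind(8124);
table.bind(8124, (data) => received.push(`pid 202 got ${data}`));
table.deliver(8124, 'hello again');

console.log(received);
```

The sender only ever mentions port 8124, so the restart with a different process ID is invisible to it.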

But specifying a port for every process is tedious. Could we agree on which port each protocol must use, so that protocol + IP is enough and the port is filled in automatically?

So there is a dedicated organization that coordinates this, called IANA (the Internet Assigned Numbers Authority). Because the network is decentralized, an intermediary is needed to coordinate the parties, and that is what this organization does for domain names, ports, protocols, and so on.

A port number is a 16-bit unsigned integer (two bytes), so it ranges from 0 to 65535, which IANA divides into three segments:

  • 0 to 1023 are well-known ports, where a protocol is bound to a fixed port, such as HTTP on port 80 and HTTPS on port 443.

  • 1024 to 49151 are registered ports, from which we can choose a port to bind our own process to.

  • 49152 to 65535 are dynamic ports, allocated on demand to processes that need a port.
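The three segments can be expressed as a small classifier. This is a minimal sketch: the function name classifyPort and the labels it returns are chosen here for illustration, and the first range is taken to be 0 to 1023, which is how IANA defines it.

```javascript
// Classify a port number into one of IANA's three ranges.
// classifyPort and its return labels are illustrative names.
function classifyPort(port) {
  if (!Number.isInteger(port) || port < 0 || port > 65535) {
    throw new RangeError(`not a valid 16-bit port number: ${port}`);
  }
  if (port <= 1023) return 'well-known'; // fixed protocol ports, e.g. 80 and 443
  if (port <= 49151) return 'registered'; // ports we pick for our own services
  return 'dynamic'; // 49152 to 65535, allocated on demand
}

console.log(classifyPort(443));   // well-known
console.log(classifyPort(8124));  // registered
console.log(classifyPort(50000)); // dynamic
```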

With fixed protocol ports, we can locate a process on the network using only protocol + IP. Of course, when a service does not run on its protocol's default port, you still need to specify protocol + IP + port.
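Filling in the port automatically can be sketched with the standard URL class. The helper endpointOf and the DEFAULT_PORTS table are hypothetical names introduced here, and the table only covers the two protocols mentioned in the text; example.com is a placeholder host.

```javascript
// Default ports for the two protocols named in the text.
// DEFAULT_PORTS and endpointOf are illustrative, not a standard API.
const DEFAULT_PORTS = { 'http:': 80, 'https:': 443 };

// Resolve a URL into the three elements: protocol, host, and port.
function endpointOf(urlString) {
  const url = new URL(urlString);
  // url.port is an empty string when the URL does not name a port
  // (or names the protocol's default port).
  const port = url.port !== '' ? Number(url.port) : DEFAULT_PORTS[url.protocol];
  if (port === undefined) {
    throw new Error(`no default port known for ${url.protocol}`);
  }
  return { protocol: url.protocol, host: url.hostname, port };
}

console.log(endpointOf('https://example.com/index.html'));
console.log(endpointOf('http://example.com:8124/'));
```

The first call relies on the protocol's default port, while the second has to spell the port out because 8124 is not the default for HTTP.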

socket

With ports, we can locate processes on the network and exchange data with them. However, different protocols have different data formats and therefore require different handling. Operating directly on raw network data is complicated, so the operating system should encapsulate it. POSIX therefore defines a standard socket API, through which we can easily work with data from different protocols. (For POSIX, see this article: Source of API design for Node.js: POSIX)

The socket API has two sides, server and client:

Server: bind, listen, accept, read, write, close

Client: connect, write, read, close

POSIX follows the idea that everything is a file, so the socket API for network communication is also designed around read and write.

The server uses bind to attach the process to a port and listen to start accepting connections on it. The client connects to that port on the server, and then both sides read and write data over the connection.

Various languages encapsulate the Socket API, and Node.js is no exception.

Sockets in Node.js

Node.js reads and writes files through streams, and POSIX treats socket operations as file reads and writes, so Node.js sockets are stream APIs.
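A quick way to check this claim is to inspect a socket object before it is connected. This is a minimal sketch that only looks at the object's type and methods; it does not open any network connection.

```javascript
const net = require('net');
const { Duplex } = require('stream');

// Create a socket without connecting it anywhere.
const socket = new net.Socket();

// A net.Socket is a duplex stream: readable and writable at once.
console.log(socket instanceof Duplex); // true

// So the usual stream methods are available on it.
console.log(typeof socket.write, typeof socket.pipe);

// Release the unused socket.
socket.destroy();
```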

Server socket API:

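const net = require('net');

// Create a TCP server; the callback runs once for each client connection.
const server = new net.Server((socket) => {
  console.log('client connected');

  // The socket is a stream, so incoming bytes arrive as 'data' events.
  socket.on('data', (data) => {
    console.log(data.toString('utf8'));
  });
  socket.on('end', () => {
    console.log('client disconnected');
  });

  socket.write('hello\r\n');
});

server.on('error', (err) => {
  throw err;
});

// Bind to port 8124 and start listening.
server.listen(8124, () => {
  console.log('server bound');
});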

A Node.js socket is a stream, so we listen for its data events. (For more on streams, see my article: Mastering Node.js’s Four Streams once and for all, and solving the “back pressure” problem with burst buffers)

Client socket API:

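const net = require('net');

// Create the socket first, then connect it; 'xxxx' stands for the server host.
const socket = new net.Socket();
socket.connect({ host: 'xxxx', port: 8124 }, () => {
  console.log('connected to server!');
  socket.write('world!\r\n');
});

// Print whatever the server sends, then close our side of the connection.
socket.on('data', (data) => {
  console.log(data.toString());
  socket.end();
});

socket.on('end', () => {
  console.log('disconnected from server');
});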

Creating these objects directly with new is more cumbersome, so Node.js goes one step further and provides factory methods:

new net.Server can be replaced with net.createServer

new net.Socket can be replaced with net.createConnection

This simplifies things even further.
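The two factory methods can be put together in one round trip on the local machine. This is a sketch under a few assumptions: the helper name roundTrip and the "echo: " reply format are invented for illustration, the message is assumed small enough to arrive in a single data event, and port 0 is passed to listen so that the operating system picks a free port.

```javascript
const net = require('net');

// Send one message to a local echo server and resolve with its reply.
// roundTrip is a hypothetical helper, not part of the net module.
function roundTrip(message) {
  return new Promise((resolve, reject) => {
    // Factory method instead of new net.Server(...).
    const server = net.createServer((socket) => {
      // Reply once, then close the server's side of the connection.
      socket.on('data', (data) => socket.end(`echo: ${data}`));
      socket.on('error', reject);
    });
    server.on('error', reject);

    // Port 0 asks the operating system for any free port.
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();

      // Factory method instead of new net.Socket() plus connect().
      const client = net.createConnection({ host: '127.0.0.1', port }, () => {
        client.write(message);
      });

      let reply = '';
      client.setEncoding('utf8');
      client.on('data', (chunk) => { reply += chunk; });
      client.on('end', () => {
        server.close(); // stop listening so the process can exit
        resolve(reply);
      });
      client.on('error', reject);
    });
  });
}

roundTrip('ping').then((reply) => console.log(reply));
```

Because the server listens on port 0, the client has to read the actual port from server.address() before connecting.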

conclusion

Two processes on the network communicate via IP + port, with the protocol specifying the format of the data. Ports embody the IoC idea: data is not addressed directly to a process ID but written to the port that the process has bound to.

The port number is a 16-bit number that ranges from 0 to 65535, and IANA classifies it into three categories:

Ports 0 to 1023 are protocol ports, ports 1024 to 49151 are registered ports for processes, and ports 49152 to 65535 are dynamically allocated ports.

A process on the network can be located by the three elements of protocol, IP, and port. Each protocol has its own data format, so POSIX provides a set of socket APIs: bind, listen, accept, read, write, and close on the server, and connect, read, write, and close on the client, all modeled on reading and writing files.

Various languages wrap these operating system APIs, and Node.js is no exception. Node.js reads and writes files through streams, so the net.Socket and net.Server APIs are built around streams. To simplify creation, the factory methods net.createConnection and net.createServer are provided, respectively.

Hopefully this article has helped you understand the nature of ports (the in-memory data structures used to receive network data), the nature of sockets (the POSIX-defined network communication APIs), and the API of the Node.js net module.