There are several main ways for processes to communicate with each other: pipes, message queues, shared memory, semaphores, signals and sockets. This article introduces each of them in detail, together with Python implementations for some of them, so that the theory can be connected with practice at the code level.

Preface

The user address space of each process is independent, and processes generally cannot access each other's user space. The kernel space, however, is shared by all processes, so communication between processes must go through the kernel.

1. Pipes

If you have used Linux commands, you are surely familiar with the vertical bar “|”.

$ ps auxf | grep mysql

The vertical bar “|” on the command line is a pipe: it takes the output of the preceding command (ps auxf) and passes it as the input of the following command (grep mysql). From this description, we can see that a pipe transfers data in one direction only; for two-way communication, we need to create two pipes.

Moreover, the pipe above has no name, so the pipe represented by “|” is called an anonymous pipe, and it is destroyed as soon as it has been used.

The other type of pipe is the named pipe, also called a FIFO because data is transferred first in, first out.

To create a named pipe, use the mkfifo command and specify the pipe name:

$ mkfifo myPipe

myPipe is the name of this pipe. Following the Linux principle that “everything is a file”, the pipe also exists as a file, which we can see with ls (the leading p in the file mode marks a pipe):

$ ls -l
prw-r--r--. 1 root    root         0 Jul 17 02:45 myPipe

Next, we write data to the pipe myPipe:

$ echo "hello" > myPipe

After running it, you will notice that the command hangs instead of returning. This is because nothing has read the contents of the pipe yet; the command exits normally only after the data in the pipe has been read.

So, we execute another command to read the data in the pipe:

$ cat < myPipe
hello

As you can see, the contents of the pipe are read and printed on the terminal, and at the same time the echo command exits normally.

This shows that pipes are an inefficient means of communication and are not suitable for frequent data exchange between processes. Their advantage, of course, is simplicity, and it is easy to know when the data in the pipe has been read by another process.
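The same experiment can be reproduced from Python with os.mkfifo. In the sketch below, the FIFO path inside a temporary directory and the helper name fifo_roundtrip are assumptions for illustration: a forked child plays the role of echo and the parent plays cat. In Python the waiting shows up in open(): opening a FIFO for writing blocks until some process opens it for reading.

```python
import os
import tempfile


def fifo_roundtrip(message: str) -> str:
    """Send `message` through a named pipe from a forked child to the parent."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "myPipe")   # hypothetical location of the FIFO
        os.mkfifo(path)                       # same as `mkfifo myPipe`
        pid = os.fork()
        if pid == 0:
            # Child = writer, like `echo "hello" > myPipe`.
            # open() blocks here until a reader opens the FIFO.
            try:
                with open(path, "w") as fifo:
                    fifo.write(message)
            finally:
                os._exit(0)                   # skip the parent's cleanup code
        # Parent = reader, like `cat < myPipe`; read() returns at EOF,
        # i.e. once the writer has closed its end.
        with open(path, "r") as fifo:
            data = fifo.read()
        os.waitpid(pid, 0)
        return data


if __name__ == "__main__":
    print(fifo_roundtrip("hello\n"), end="")
```

Unlike the anonymous pipe, the FIFO has a path, so the writer could just as well be an unrelated program started from another terminal.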

So how do you create a pipe? What’s the principle behind it?

Anonymous pipes are created with the following system call:

int pipe(int fd[2])

This call creates an anonymous pipe and returns two descriptors: fd[0] for the read end of the pipe and fd[1] for the write end. Note that an anonymous pipe is a special file that exists only in memory, not in the file system.

In fact, a pipe is simply a buffer in the kernel. Data written into one end of the pipe is cached in the kernel, and reading from the other end fetches it from the kernel. Moreover, the data in a pipe is an unformatted byte stream, and the buffer has a limited size.
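To see that the kernel buffer really is limited, a sketch like the following fills an empty anonymous pipe in non-blocking mode until the kernel refuses more data. The function name is illustrative, and the capacity depends on the platform (on Linux the default is typically 64 KiB), so treat the printed number as an observation rather than a constant.

```python
import os


def pipe_capacity(chunk: int = 1024) -> int:
    """Return how many bytes an empty pipe accepts before a write would block."""
    r, w = os.pipe()
    os.set_blocking(w, False)     # make write() fail instead of blocking when full
    total = 0
    try:
        while True:
            # Writes of at most PIPE_BUF bytes are all-or-nothing on a pipe.
            total += os.write(w, b"x" * chunk)
    except BlockingIOError:
        pass                      # the kernel buffer is full
    finally:
        os.close(r)
        os.close(w)
    return total


if __name__ == "__main__":
    print("pipe buffer holds", pipe_capacity(), "bytes")
```

In a normal blocking pipe, the same situation makes the writer wait until the reader drains some data.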

At this point you might wonder: both descriptors belong to the same process, so there is no inter-process communication yet. How can a pipe span two processes?

We can use fork to create a child process. The child receives copies of the parent's file descriptors, so each process has its own fd[0] and fd[1], and the two processes can communicate across process boundaries by writing to and reading from the same pipe through their respective descriptors.

A pipe can only be written at one end and read at the other, yet in this setup both the parent and the child can read and write at the same time, which easily leads to confusion. To avoid this, the usual practice is:

  • The parent process closes the read end fd[0] and keeps only the write end fd[1];
  • The child process closes the write end fd[1] and keeps only the read end fd[0].

So if two-way communication is required, two pipes should be created.
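A minimal Python sketch of both conventions, assuming a Unix system with os.fork: one pipe carries data from parent to child, a second carries the reply back, and each process closes the ends it does not use. Closing matters for more than tidiness: the child's read only returns b"" (end of file) once every copy of the parent-to-child write end is closed, so a child that kept its own copy would wait forever. The function name is illustrative.

```python
import os


def uppercase_via_child(text: bytes) -> bytes:
    """Send `text` to a child over one pipe and get it back upper-cased
    over a second pipe (one pipe per direction)."""
    down_r, down_w = os.pipe()   # parent -> child
    up_r, up_w = os.pipe()       # child -> parent
    pid = os.fork()
    if pid == 0:
        try:
            # Child keeps only the read end of "down" and the write end of "up".
            os.close(down_w)
            os.close(up_r)
            data = b""
            while chunk := os.read(down_r, 1024):   # b"" signals EOF
                data += chunk
            os.write(up_w, data.upper())
            os.close(down_r)
            os.close(up_w)
        finally:
            os._exit(0)
    # Parent keeps only the write end of "down" and the read end of "up".
    os.close(down_r)
    os.close(up_w)
    os.write(down_w, text)
    os.close(down_w)             # last write end closed -> child's read sees EOF
    reply = b""
    while chunk := os.read(up_r, 1024):
        reply += chunk
    os.close(up_r)
    os.waitpid(pid, 0)
    return reply


if __name__ == "__main__":
    print(uppercase_via_child(b"hello pipe"))
```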

So far, we have only covered pipe communication between a parent and its child, but that is not what happens in the shell.

For the command A | B in a shell, processes A and B are both child processes created by the shell. There is no parent-child relationship between A and B; the shell is the parent of both.

Therefore, connecting several commands with the anonymous pipe “|” in the shell actually creates several child processes. So when writing shell scripts, if one pipe can do the job, do not add another; this reduces the overhead of creating child processes.
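What the shell does for A | B can be imitated with the subprocess module: the Python process plays the shell, creates both children, and connects A's standard output to B's standard input through an anonymous pipe, so A and B are siblings rather than parent and child. The sketch assumes the echo and grep commands are available; the function name is illustrative.

```python
import subprocess


def shell_like_pipeline() -> str:
    # Child A: its stdout is the write end of an anonymous pipe.
    a = subprocess.Popen(["echo", "hello mysql"], stdout=subprocess.PIPE)
    # Child B: its stdin is the read end of the same pipe.
    b = subprocess.Popen(["grep", "mysql"], stdin=a.stdout,
                         stdout=subprocess.PIPE, text=True)
    # The "shell" closes its own copy of the read end, so only A and B hold
    # the pipe (A then gets SIGPIPE if B exits early).
    a.stdout.close()
    out, _ = b.communicate()
    a.wait()
    return out


if __name__ == "__main__":
    print(shell_like_pipeline(), end="")
```

Each additional “|” stage would mean one more Popen call, that is, one more child process.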

So the communication scope of an anonymous pipe is limited to related processes, such as a parent and its children. Since the pipe has no entity in the file system (no pipe file), the only way for another process to obtain its file descriptors is to inherit copies of them from the parent through fork.

Named pipes, on the other hand, allow unrelated processes to communicate, because a named pipe first creates a special file of type pipe in the file system, and any process can communicate through that file.

Whether anonymous or named, data written by one process is cached in the kernel, and another process naturally reads it from the kernel. The data follows the first-in, first-out principle, and file positioning operations such as lseek are not supported.
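Both properties are easy to check on an anonymous pipe: os.lseek on a pipe descriptor fails with ESPIPE ("Illegal seek"), and bytes come out in the order they went in. A small sketch; the helper name is illustrative.

```python
import errno
import os


def pipe_demo():
    """Return (seekable, data) for a pipe that received two writes."""
    r, w = os.pipe()
    try:
        os.write(w, b"first-in ")
        os.write(w, b"first-out")
        try:
            os.lseek(r, 0, os.SEEK_SET)       # try to jump back to the start
            seekable = True
        except OSError as exc:
            if exc.errno != errno.ESPIPE:     # anything else is unexpected
                raise
            seekable = False
        data = os.read(r, 1024)               # FIFO order: first write comes out first
        return seekable, data
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(pipe_demo())
```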

1.1 Python implementation

1.1.1 Creating a new process with fork from the os module

Processes created by a Python program

When we run a Python program, the system creates a Python process. If the program then calls os.fork(), the newly created process is a child of the original process, which becomes its parent. If fork fails, an OSError exception is raised:

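```python
# -*- coding: utf-8 -*-
import os
import time

try:
    pid = os.fork()   # from here on, two processes run this code
except OSError as e:  # raised if the child process cannot be created
    pass
time.sleep(20)        # keep both processes alive long enough to inspect them
```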

Run the code and list the processes in another terminal (for example with ps -ef | grep python): the second Python process is a child of the first, as its parent PID (PPID) shows.

Program flow after fork

fork creates a child process that copies the parent's data, and the program then continues in two separate processes, which is where the name “fork” comes from. In the child, os.fork() returns 0; in the parent, it returns the PID of the child. The return value can therefore be used to tell the two processes apart:

```python
# -*- coding: utf-8 -*-
import os
import time

# a variable declared before the child process is created
number = 7
try:
    pid = os.fork()
    if pid == 0:
        print("this is child process")
        number = number - 1   # changes only the child's copy
        time.sleep(5)
        print(number)
    else:
        print("this is parent process")
except OSError as e:
    pass
```

The number variable is declared before the child process is created; the child then decrements its own copy by 1 and prints it. To make the difference between the parent and the child more visible, the child sleeps for 5 seconds before printing.

Example 2:

First look at the code:

```python
#!/usr/bin/python
import os
import time


def child(wpipe):
    print('hello from child', os.getpid())
    while True:
        msg = 'how are you\n'.encode()
        os.write(wpipe, msg)          # write raw bytes to the write end
        time.sleep(1)


def parent():
    rpipe, wpipe = os.pipe()          # os.pipe() returns 2 file descriptors (r, w)
    pid = os.fork()
    if pid == 0:
        os.close(rpipe)               # child only writes: close the read end
        child(wpipe)
        assert False, 'fork child process error!'
    else:
        os.close(wpipe)               # parent only reads: close the write end
        print('hello from parent', os.getpid(), pid)
        fobj = os.fdopen(rpipe, 'r')
        while True:
            recv = fobj.readline()[:-1]   # drop the trailing '\n'
            print(recv)


parent()
```

Output:

```
hello from parent 5108 5109
hello from child 5109
how are you
how are you
how are you
```

A pipe is a one-way channel, somewhat like a buffer shared by two processes. It has two ends, an input and an output, and once the unused ends are closed each process sees only one of them: here the child holds the input (write end) and the parent holds the output (read end).

On the parent's side, os.fdopen() wraps the underlying file descriptor (the read end of the pipe) in a file object, so the data can be read with that object's readline() method. Note that readline() returns each line including its trailing newline character '\n', which is why the code removes the last character with [:-1]. The code also closes the unused end of the pipe in both the parent and the child.

For two-way communication with a child process, one pipe is not enough; two pipes are needed. In the following example, os.dup2() redirects standard input and output onto the pipes. The spawn function works somewhat like subprocess.Popen(): it starts a program in a child process, sends it messages, and reads data back from it.

```python
#!/usr/bin/python
# coding=utf-8
import os
import sys


def spawn(prog, *args):
    stdinFd = sys.stdin.fileno()
    stdoutFd = sys.stdout.fileno()

    parentStdin, childStdout = os.pipe()    # child -> parent
    childStdin, parentStdout = os.pipe()    # parent -> child

    pid = os.fork()
    if pid:
        # parent: close the child's ends
        os.close(childStdin)
        os.close(childStdout)
        os.dup2(parentStdin, stdinFd)     # stdin now reads what the child writes
        os.dup2(parentStdout, stdoutFd)   # stdout is bound to the pipe, sent to the child's childStdin
    else:
        # child: close the parent's ends
        os.close(parentStdin)
        os.close(parentStdout)
        os.dup2(childStdin, stdinFd)      # bind the input stream to the pipe
        os.dup2(childStdout, stdoutFd)    # output goes back to the parent
        args = (prog,) + args
        os.execvp(prog, args)             # replace the child with the new program
        assert False, 'execvp failed!'


if __name__ == '__main__':
    mypid = os.getpid()
    spawn('python', 'pipetest.py', 'spam')   # pipetest.py is the child script

    print('Hello 1 from parent', mypid)     # goes to the child's stdin
    sys.stdout.flush()                      # flush so the child actually receives it
    reply = input()                         # read the child's reply from our stdin
    sys.stderr.write('Parent got: "%s"\n' % reply)   # stderr is still the terminal

    print('Hello 2 from parent', mypid)
    sys.stdout.flush()
    reply = sys.stdin.readline()            # another way to get data from the child
    sys.stderr.write('Parent got: "%s"\n' % reply[:-1])
```

Besides the os module, the multiprocessing module can also be used to work with pipes:

```python
from multiprocessing import Process, Pipe

# multiprocessing.Pipe([duplex]) returns two Connection objects (conn1, conn2)
# representing the two ends of a pipe. If duplex=False, conn1 can only receive
# messages and conn2 can only send them. Unlike os.pipe(), which returns two raw
# file descriptors (r, w), Connection objects send and receive picklable Python objects.


def send(pipe):
    pipe.send(['spam'] + [42, 'egg'])
    pipe.close()


def talk(pipe):
    pipe.send(dict(name='Bob', spam=42))
    reply = pipe.recv()
    print('talker got:', reply)


if __name__ == '__main__':
    (con1, con2) = Pipe()
    sender = Process(target=send, name='send', args=(con1,))
    sender.start()
    print("con2 got:", con2.recv())   # the list sent by the child
    con2.close()
    sender.join()

    (parentEnd, childEnd) = Pipe()
    child = Process(target=talk, name='talk', args=(childEnd,))
    child.start()
    print('parent got:', parentEnd.recv())
    parentEnd.send({x * 2 for x in 'spam'})
    child.join()
    print('parent exit')
```

2. Message queues

As mentioned earlier, the communication mode of pipes is inefficient, so pipes are not suitable for frequent exchange of data between processes.

Message queues solve this problem. For example, if process A wants to send a message to process B, A puts the data into the corresponding message queue and returns immediately; process B reads the data whenever it needs it. The same applies when process B wants to send a message to process A.

A message queue is essentially a linked list of messages stored in the kernel. Data is sent in independent units called message bodies (blocks). The message body is a user-defined data type whose format the sender and receiver must agree on, so each message body is a storage block of fixed size, unlike a pipe, which carries an unformatted byte stream. Once a process reads a message body from the queue, the kernel deletes it.
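Python's standard library does not wrap System V message queues, so the sketch below calls msgget, msgsnd, msgrcv and msgctl from the C library through ctypes. It assumes Linux (the values IPC_PRIVATE = 0, IPC_CREAT = 0o1000 and IPC_RMID = 0 are the Linux ones) and a fixed 64-byte message body; the MsgBuf layout follows the C convention of a long message type followed by the data, and the helper names are illustrative. The explicit IPC_RMID at the end matters because a queue lives as long as the kernel.

```python
import ctypes
import os

# Linux values from <sys/ipc.h>; assumptions on other systems.
IPC_PRIVATE = 0
IPC_CREAT = 0o1000
IPC_RMID = 0
MSG_SIZE = 64                      # fixed size of the message data (mtext)

libc = ctypes.CDLL(None, use_errno=True)   # C library symbols of this process
libc.msgget.argtypes = [ctypes.c_int, ctypes.c_int]
libc.msgsnd.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.msgrcv.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t,
                        ctypes.c_long, ctypes.c_int]
libc.msgrcv.restype = ctypes.c_ssize_t
libc.msgctl.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_void_p]


class MsgBuf(ctypes.Structure):
    # The message body agreed on by sender and receiver: a type plus fixed-size data.
    _fields_ = [("mtype", ctypes.c_long), ("mtext", ctypes.c_char * MSG_SIZE)]


def check(ret):
    """Turn a -1 return value into an OSError carrying errno."""
    if ret == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def child_sends_parent_receives(text: bytes) -> bytes:
    qid = check(libc.msgget(IPC_PRIVATE, IPC_CREAT | 0o600))  # queue lives in the kernel
    try:
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                msg = MsgBuf(1, text)                       # a message of type 1
                check(libc.msgsnd(qid, ctypes.byref(msg), MSG_SIZE, 0))
                status = 0
            finally:
                os._exit(status)
        buf = MsgBuf()
        # Blocks until a message of type 1 arrives; the kernel then removes it.
        check(libc.msgrcv(qid, ctypes.byref(buf), MSG_SIZE, 1, 0))
        os.waitpid(pid, 0)
        return buf.mtext          # a c_char array field reads back as bytes up to the first NUL
    finally:
        libc.msgctl(qid, IPC_RMID, None)   # remove the queue, or it outlives the program


if __name__ == "__main__":
    print(child_sends_parent_receives(b"hello queue"))
```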

A message queue lives as long as the kernel: unless it is explicitly removed or the operating system is shut down, it keeps existing. By contrast, the anonymous pipe described earlier is created together with the process and destroyed when the process ends.

This message model lets two processes communicate as if they were exchanging emails: you send one, I send one back, and the exchange can be frequent.

However, email has two shortcomings: delivery is not immediate, and attachments have a size limit. Message queues share both shortcomings.

Message queues are not suitable for transferring large amounts of data, because the kernel limits the maximum length of each message body as well as the total length of all message bodies in a queue. In the Linux kernel, the macros MSGMAX and MSGMNB define, in bytes, the maximum size of a single message and the maximum total size of a queue, respectively.
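The macros provide the defaults; on Linux the limits currently in effect can be read at run time from /proc/sys/kernel/msgmax and msgmnb, alongside msgmni, the maximum number of queues. A small sketch, assuming a Linux system where these files exist; the function name is illustrative.

```python
from pathlib import Path


def message_queue_limits() -> dict:
    """Read the System V message queue limits exposed by the Linux kernel."""
    base = Path("/proc/sys/kernel")
    return {name: int((base / name).read_text())
            for name in ("msgmax", "msgmnb", "msgmni")}


if __name__ == "__main__":
    limits = message_queue_limits()
    print(f"max bytes per message (MSGMAX): {limits['msgmax']}")
    print(f"max bytes per queue   (MSGMNB): {limits['msgmnb']}")
    print(f"max number of queues  (MSGMNI): {limits['msgmni']}")
```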

Message queue communication also incurs the overhead of copying data between user mode and kernel mode: when a process writes data into a message queue in the kernel, the data is copied from user mode to kernel mode, and likewise, when another process reads the message, the data is copied from kernel mode back to user mode.

3. Shared memory

Reading and writing a message queue involves copying messages between user mode and kernel mode. Shared memory solves this problem neatly.

Modern operating systems manage memory with virtual memory: each process has its own independent virtual address space, and the virtual memory of different processes is mapped to different physical memory. Therefore, even if processes A and B use the same virtual address, they actually access different physical memory, so modifications made by one do not affect the other.

Shared memory works by taking a region of virtual address space in each process and mapping it to the same physical memory. What one process writes there is then immediately visible to the other, with no copying back and forth, which greatly speeds up inter-process communication.
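A minimal sketch of the idea in Python, assuming a Unix system: mmap.mmap(-1, size) with MAP_SHARED creates an anonymous shared mapping, and after os.fork() the parent and child map the same physical pages, so the child's write is visible to the parent without any read() or write() system call. The waitpid call only orders the two processes here; coordinating concurrent access is the job of semaphores. The function name is illustrative.

```python
import mmap
import os


def share_between_parent_and_child(message: bytes) -> bytes:
    # fileno=-1 means anonymous memory; MAP_SHARED keeps it shared across fork().
    shm = mmap.mmap(-1, 64, flags=mmap.MAP_SHARED)
    pid = os.fork()
    if pid == 0:
        try:
            shm[:len(message)] = message   # child writes straight into the shared pages
        finally:
            os._exit(0)
    os.waitpid(pid, 0)                     # make sure the child has finished writing
    data = shm[:len(message)]              # parent reads the same physical memory
    shm.close()
    return data


if __name__ == "__main__":
    print(share_between_parent_and_child(b"written by the child"))
```

For unrelated processes, multiprocessing.shared_memory.SharedMemory (Python 3.8+) provides named segments that another process can attach to by name.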

4. Semaphores

The new problem with shared memory is that if multiple processes modify the same shared memory at the same time, conflicts are likely. For example, if two processes write to the same address at the same time, the first process will find that its content has been overwritten by the other.

To prevent data confusion caused by multiple processes competing for shared resources, a protection mechanism is needed so that shared resources can only be accessed by one process at any time. As it happens, the semaphore implements this protection mechanism.

A semaphore is actually an integer counter that is used for mutual exclusion and synchronization between processes rather than for caching data communicated between processes.

Semaphores represent the number of resources and are controlled by two atomic operations:

  • One is the P operation, which subtracts 1 from the semaphore. If the result is less than 0, the resource is occupied and the process must block and wait; if the result is >= 0, resources are available and the process can continue normally.
  • The other is the V operation, which adds 1 to the semaphore. If the result is <= 0, some process is currently blocked, so one of them is woken up; if the result is > 0, no process is currently blocked.

The P operation is used before entering the shared resource, and the V operation is used after leaving the shared resource. These two operations must be paired.

For example, if we want two processes to access shared memory in a mutually exclusive way, we can initialize the semaphore to 1.

The specific process is as follows:

  • Process A performs a P operation before accessing the shared memory. The initial semaphore value is 1, so after the P operation it becomes 0, which means the shared resource is available and process A can access the shared memory.
  • If process B now also wants to access the shared memory and performs a P operation, the semaphore becomes -1, which means the critical resource is occupied, so process B blocks.
  • When process A finishes accessing the shared memory, it performs a V operation, which brings the semaphore back to 0 and wakes up the blocked process B, so B can now access the shared memory. Finally, when B finishes, it performs a V operation and the semaphore returns to its initial value of 1.

As you can see, a semaphore initialized to 1 acts as a mutual-exclusion semaphore (mutex): it ensures that only one process accesses the shared memory at any time, which protects the shared memory.
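A sketch of the mutex case in Python, assuming Linux and the fork start method: the counter lives in an anonymous shared mapping, and multiprocessing's Semaphore(1) plays the semaphore, with acquire() as P and release() as V. One difference from the textbook model: Python's semaphore value never goes below 0; a process calling acquire() on 0 simply blocks, which has the same effect. Function names are illustrative.

```python
import mmap
import multiprocessing as mp
import struct


def add_many(shm, sem, rounds):
    for _ in range(rounds):
        sem.acquire()                                   # P: enter the critical section
        (count,) = struct.unpack_from("i", shm, 0)      # read the shared counter
        struct.pack_into("i", shm, 0, count + 1)        # write it back incremented
        sem.release()                                   # V: leave, wake a waiting process


def locked_counter(processes=2, rounds=10_000):
    ctx = mp.get_context("fork")       # children inherit the mapping and the semaphore
    shm = mmap.mmap(-1, 4)             # shared 4-byte counter, zero-initialised
    sem = ctx.Semaphore(1)             # initial value 1 => mutual exclusion
    workers = [ctx.Process(target=add_many, args=(shm, sem, rounds))
               for _ in range(processes)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    (count,) = struct.unpack_from("i", shm, 0)
    shm.close()
    return count


if __name__ == "__main__":
    print(locked_counter())   # 20000 (= processes * rounds)
```

Without the acquire()/release() pair, the read-modify-write steps of the two processes could interleave and some increments would be lost.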

In addition, processes do not necessarily run in a fixed order; each advances at its own independent and unpredictable pace. Sometimes, however, we need several processes to cooperate closely on a common task.

For example, process A is responsible for producing data, while process B is responsible for reading data. The two processes cooperate and depend on each other. Process A must produce data before process B can read data.

In this case, we can use the semaphore to implement multi-process synchronization. We can initialize the semaphore to 0.

Specific process:

  • If process B runs before process A, it performs a P operation on the initial value 0, so the semaphore becomes -1, which means process A has not produced any data yet, and process B blocks and waits.
  • Then, when process A finishes producing data and performs a V operation, the semaphore becomes 0 and process B, which was blocked in its P operation, is woken up.
  • Finally, when process B wakes up, process A has already produced the data, so process B can read it normally.

As you can see, a semaphore initialized to 0 acts as a synchronization semaphore: it ensures that process A runs before process B.
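The synchronization case sketched the same way, under the same assumptions (Linux, fork start method, illustrative names): the parent acts as process B and waits on a semaphore initialized to 0, while a child acts as process A, produces data into shared memory and then performs V. The producer sleeps briefly on purpose so that B really does reach its P operation first.

```python
import mmap
import multiprocessing as mp
import time


def producer(shm, sem, payload):
    time.sleep(0.2)                  # A is deliberately slow, so B has to wait
    shm[:len(payload)] = payload     # produce the data in shared memory
    sem.release()                    # V: wakes B if it is blocked in P


def produce_then_consume(payload=b"data from A"):
    ctx = mp.get_context("fork")
    shm = mmap.mmap(-1, 64)          # shared buffer between A and B
    sem = ctx.Semaphore(0)           # initial value 0 => synchronisation
    a = ctx.Process(target=producer, args=(shm, sem, payload))
    a.start()
    sem.acquire()                    # P: B (this process) blocks until A has produced
    data = shm[:len(payload)]        # safe: A's V happened after its write
    a.join()
    shm.close()
    return data


if __name__ == "__main__":
    print(produce_then_consume())
```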

5. Signals

The IPC mechanisms above are used in normal operation. For abnormal situations, a process has to be notified by means of “signals”.

Although signals and semaphores have similar names, they serve completely different purposes, just like Java and JavaScript.

Linux provides dozens of signals for responding to different events, each with its own meaning. We can list all of them with the kill -l command:

```
$ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX
```

For a process running in a shell terminal, certain key combinations typed on the keyboard send signals to the process. For example:

  • Ctrl+C generates the SIGINT signal, which terminates the process;
  • Ctrl+Z generates the SIGTSTP signal, which stops (suspends) the process without terminating it.

If a process is running in the background, you can send it a signal with the kill command, provided you know its PID, for example:

  • kill -9 1050 sends the SIGKILL signal to the process with PID 1050, terminating it immediately.
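The same can be done from Python with os.kill(pid, sig), which wraps the kill system call. In this sketch a throwaway child process stands in for PID 1050, and subprocess reports a process killed by signal N with the return code -N. The function name is illustrative.

```python
import os
import signal
import subprocess
import sys


def kill_nine() -> int:
    # A child that would otherwise run for 60 seconds.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    os.kill(child.pid, signal.SIGKILL)   # same as `kill -9 <pid>`
    return child.wait()                  # -N means "terminated by signal N"


if __name__ == "__main__":
    print("return code:", kill_nine())
```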

Therefore, signals mainly come from hardware sources (such as Ctrl+C on the keyboard) and software sources (such as the kill command).

Signals are the only asynchronous mechanism among the inter-process communication mechanisms, because a signal can be sent to a process at any time. Once a signal has been generated, the user process can handle it in the following ways:

1. Perform the default action. Linux defines a default action for every signal; for example, the default action of SIGTERM in the list above is to terminate the process. For some signals, such as SIGQUIT and SIGSEGV, the default action also produces a core dump: the state of the process at the moment it terminates is saved to a file so that programmers can analyze the problem later.

2. Catch the signal. We can define a signal handler for a signal; when the signal occurs, the corresponding handler is executed.

3. Ignore the signal. When we do not want to handle a signal, we can ignore it and do nothing. Two signals cannot be caught or ignored by application processes, SIGKILL and SIGSTOP, which makes it possible to interrupt or terminate a process at any time.
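Python's signal module exposes these dispositions directly. The sketch below (function name illustrative) installs a handler for SIGUSR1, ignores SIGUSR2 with SIG_IGN, sends both signals to itself with os.kill, and shows that installing a handler for SIGKILL is refused with OSError. It must run in the main thread, because Python only allows the main thread to set signal handlers.

```python
import os
import signal


def signal_dispositions():
    caught = []

    def on_usr1(signum, frame):                 # 2. catch: run our own handler
        caught.append(signal.Signals(signum).name)

    old_usr1 = signal.signal(signal.SIGUSR1, on_usr1)
    old_usr2 = signal.signal(signal.SIGUSR2, signal.SIG_IGN)   # 3. ignore
    try:
        os.kill(os.getpid(), signal.SIGUSR1)    # handled by on_usr1
        os.kill(os.getpid(), signal.SIGUSR2)    # silently discarded
        try:
            signal.signal(signal.SIGKILL, on_usr1)
            sigkill_catchable = True
        except OSError:                         # the kernel rejects it (EINVAL)
            sigkill_catchable = False
    finally:
        signal.signal(signal.SIGUSR1, old_usr1) # restore the previous dispositions
        signal.signal(signal.SIGUSR2, old_usr2)
    return caught, sigkill_catchable


if __name__ == "__main__":
    print(signal_dispositions())
```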

6. Sockets

The pipes, message queues, shared memory, semaphores, and signals mentioned above all communicate between processes on the same host, so to communicate with processes on different hosts across the network, Socket communication is required.

In fact, sockets can be used not only between processes on different hosts across a network, but also between processes on the same host.

Let’s look at the system call that creates the socket:

int socket(int domain, int type, int protocol)

The three parameters represent:

  • The domain parameter specifies the protocol family, such as AF_INET for IPv4, AF_INET6 for IPv6, and AF_LOCAL/AF_UNIX for communication on the local machine.
  • The type parameter specifies the communication semantics, such as SOCK_STREAM for byte streams (TCP), SOCK_DGRAM for datagrams (UDP), and SOCK_RAW for raw sockets.
  • The protocol parameter was originally meant to specify the communication protocol but is now largely redundant, because the protocol is determined by the first two parameters; it is usually set to 0.

The mode of communication varies depending on the socket type created:

  • TCP byte stream communication: socket type is AF_INET and SOCK_STREAM;
  • UDP datagram communication: socket type is AF_INET and SOCK_DGRAM;
  • Local process communication: The local byte stream socket type is AF_LOCAL and SOCK_STREAM, and the local datagram socket type is AF_LOCAL and SOCK_DGRAM. In addition, AF_UNIX and AF_LOCAL are equivalent, so AF_UNIX also belongs to the local socket.

Next, let us briefly look at the three programming models.

Socket programming model for TCP protocol communication

  • The server and the client each create a socket and obtain a file descriptor.
  • The server calls bind to bind the socket to an IP address and port.
  • The server calls listen to start listening.
  • The server calls accept and waits for a client to connect.
  • The client calls connect to send a connection request to the server's address and port.
  • The server's accept returns a file descriptor for the socket used for the transmission.
  • The client calls write to send data; the server calls read to receive it.
  • When the client disconnects, it calls close. The server then reads EOF, and once it has finished processing the data it also calls close to close the connection.

When the server's accept succeeds, it returns a completed-connection socket, which is then used to transfer data.

Therefore, the listening socket and the socket actually used to transfer data are two different sockets: one is called the listening socket and the other the completed-connection socket.

After a successful connection is established, both parties begin to read and write data using the read and write functions, just like writing to a file stream.
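The TCP steps map directly onto Python's socket module. In this sketch the server runs in a thread of the same program purely for convenience, the address 127.0.0.1 with port 0 (the OS picks a free port) is an assumption, and the client uses shutdown(SHUT_WR) so that the server's read sees EOF while the client can still receive the reply. Note that accept() returns a new socket, separate from the listening one. The function name is illustrative.

```python
import socket
import threading


def tcp_round_trip(message: bytes) -> bytes:
    # Server: socket() -> bind() -> listen()
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))            # port 0: let the OS choose a free port
    listener.listen()
    address = listener.getsockname()

    def serve():
        conn, _ = listener.accept()            # new completed-connection socket
        with conn:
            data = b""
            while chunk := conn.recv(1024):    # b"" means the client sent EOF
                data += chunk
            conn.sendall(data.upper())         # reply, then close the connection

    server = threading.Thread(target=serve)
    server.start()

    # Client: socket() -> connect() -> write -> close
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.connect(address)
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)        # half-close: the server's read returns EOF
        reply = b""
        while chunk := client.recv(1024):
            reply += chunk
    server.join()
    listener.close()
    return reply


if __name__ == "__main__":
    print(tcp_round_trip(b"hello tcp"))
```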

Socket programming model for UDP protocol communication

UDP is connectionless, so there is no three-way handshake and no need for the listen and connect calls that TCP uses, but UDP communication still requires IP addresses and port numbers, and therefore bind.

Since UDP does not maintain connections, there is no fixed distinction between client and server: as long as each party has a socket, multiple machines can communicate with each other, which is why each UDP socket is normally bound with bind (a socket that sends before calling bind is bound to an ephemeral port automatically).

In addition, every sendto call is given the IP address and port of the destination host, and every recvfrom call returns the address and port of the sender.
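A UDP exchange in Python needs no listen, accept or connect: two bound datagram sockets simply address each other on every send. The loopback address, the 5-second timeout and the function name are assumptions for this sketch; the reply is sent back to whatever address recvfrom reported.

```python
import socket


def udp_exchange(message: bytes):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as a, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as b:
        for s in (a, b):
            s.bind(("127.0.0.1", 0))    # each peer binds its own address and port
            s.settimeout(5)             # avoid waiting forever if a datagram is lost
        a.sendto(message, b.getsockname())       # destination given on every send
        data, sender = b.recvfrom(1024)          # recvfrom reports who sent it
        b.sendto(data[::-1], sender)             # reply to that address
        reply, _ = a.recvfrom(1024)
        return data, sender == a.getsockname(), reply


if __name__ == "__main__":
    print(udp_exchange(b"ping"))
```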

Socket programming model for local interprocess communication

Local sockets are used for interprocess communication on the same host:

  • The local socket programming interface is the same as the IPv4 and IPv6 socket interfaces and supports both byte stream and datagram protocols;
  • Local sockets are implemented much more efficiently than IPv4 and IPv6 byte stream and datagram sockets.

For local byte stream sockets, the socket types are AF_LOCAL and SOCK_STREAM.

For local datagram sockets, the socket types are AF_LOCAL and SOCK_DGRAM.

Local byte stream sockets and local datagram sockets are bound to a local file, unlike TCP and UDP sockets, which are bound to an IP address and port. This is the biggest difference between them.
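A sketch of a local datagram socket in Python, using socket.AF_UNIX (equivalent to AF_LOCAL). The socket file name inside a temporary directory is an assumption for illustration. After bind, a file of type socket appears at that path, and the client addresses the server by the path instead of an IP address and port.

```python
import os
import socket
import stat
import tempfile


def unix_datagram_demo(message: bytes):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")           # hypothetical socket file name
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as server, \
             socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as client:
            server.settimeout(5)
            server.bind(path)                           # binds to a file, not IP:port
            is_socket_file = stat.S_ISSOCK(os.stat(path).st_mode)
            client.sendto(message, path)                # address the peer by its path
            data = server.recv(1024)
        return is_socket_file, data


if __name__ == "__main__":
    print(unix_datagram_demo(b"local hello"))
```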

Conclusion

Since the user space of each process is independent and cannot be accessed by other processes, inter-process communication has to go through the kernel space, for the simple reason that all processes share the kernel space.

The Linux kernel provides a number of ways to communicate between processes, the simplest of which is pipes, divided into “anonymous pipes” and “named pipes.”

An anonymous pipe, as the name suggests, has no name: it is a special file that exists only in memory, not in the file system. The vertical bar “|” in shell commands is an anonymous pipe. The data it carries is an unformatted stream of limited size, and communication is one-way: data flows in only one direction, so two-way communication requires two pipes. Anonymous pipes can only be used between related processes (a parent and its children, or children of the same parent as in the shell), and their life cycle begins with the creation of the process and ends when the process terminates.

Named pipes remove the restriction that anonymous pipes can only be used between related processes: using a named pipe first requires creating a special file of type p in the file system, through which unrelated processes can communicate. Whether anonymous or named, data written by one process is cached in the kernel, and another process naturally reads it from the kernel. The data follows the first-in, first-out principle, and file positioning operations such as lseek are not supported.

Message queues overcome the limitation that pipe data is a plain byte stream. A message queue is essentially a “message list” stored in the kernel, and the message body can be a user-defined data type. Data is split into independent message bodies when it is sent, and the receiver must use the same message body type as the sender to ensure that the data it reads is correct. Message queue communication is not the fastest, since every write and read involves copying data between user space and the kernel.

Shared memory eliminates the overhead of copying data between user mode and kernel mode that message queues incur. It allocates a shared region that every process can access directly, as conveniently as its own memory, without entering kernel mode or making system calls. This greatly improves the speed of communication and makes shared memory the fastest IPC mechanism. However, this convenience brings a new problem: multiple processes competing for the same shared resource can corrupt the data.

Semaphores are therefore needed to protect the shared resource so that only one process can access it at any time, which is mutually exclusive access. Semaphores can implement not only mutual exclusion but also synchronization between processes. A semaphore is actually a counter representing the number of resources, and its value is controlled by two atomic operations, the P operation and the V operation.

Similar in name to the semaphore is the signal, whose function is completely different. Signals are the asynchronous mechanism among the IPC mechanisms: they allow direct interaction between application processes and the kernel, and the kernel can use them to notify user-space processes of system events. Signals mainly come from hardware sources (such as Ctrl+C on the keyboard) and software sources (such as the kill command). Once a signal occurs, a process can respond in three ways: 1. perform the default action; 2. catch the signal; 3. ignore the signal. SIGKILL and SIGSTOP cannot be caught or ignored by application processes, so that we can always terminate or stop a process.

All of the mechanisms above work within a single host; to communicate with processes on a different host, sockets are needed. Sockets can be used not only between processes on different hosts but also between processes on the same host. Depending on the socket type, there are three common communication modes: TCP, UDP, and local inter-process communication.

This is the main mechanism for interprocess communication. You might ask, what about the way threads communicate with each other?

Threads in the same process share the process's resources, so they can communicate simply through shared variables such as global variables. The focus is therefore not on how threads communicate but on multiple threads competing for shared resources, and semaphores can also provide mutual exclusion and synchronization between threads:

Mutual exclusion ensures that only one thread can access a shared resource at any time.

Synchronization ensures that thread A executes before thread B.
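Both uses carry over to threads through threading.Semaphore, and because threads share variables no shared-memory setup is needed. In this sketch (names illustrative) a semaphore initialized to 1 protects a shared counter, and one initialized to 0 makes thread B wait until thread A has finished, even though B is started first.

```python
import threading


def threads_with_semaphores(rounds=10_000):
    counter = 0
    mutex = threading.Semaphore(1)      # initial value 1: mutual exclusion
    ready = threading.Semaphore(0)      # initial value 0: synchronisation
    result = {}

    def thread_a():
        nonlocal counter
        for _ in range(rounds):
            with mutex:                 # acquire = P, release = V
                counter += 1
        ready.release()                 # V: tell B that A has finished

    def thread_b():
        ready.acquire()                 # P: wait until A has finished
        with mutex:
            result["seen"] = counter

    tb = threading.Thread(target=thread_b)
    ta = threading.Thread(target=thread_a)
    tb.start()                          # B starts first but still waits for A
    ta.start()
    ta.join()
    tb.join()
    return result["seen"]


if __name__ == "__main__":
    print(threads_with_semaphores())
```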

