preface

I don’t know if you’ve ever been asked in an interview how processes communicate, but I’ve been asked many times ever since I was a fresh graduate. Back then all I knew was shared memory and pipes, so the question always scared me a little, hahaha.

An overview

What is process communication?

Each process is relatively independent, and a single user request may involve multiple processes running in the operating system. When multiple processes need to cooperate to complete a task, they have to exchange data with each other. This kind of communication is called inter-process communication (IPC).

Wikipedia definition

How does Wikipedia define IPC?

Inter-process communication (IPC) is used for exchanging data between multiple threads in one or more processes or programs. The processes may be running on a single computer or on multiple computers connected by a network.

By this definition, there are two main cases: local processes and remote processes. Local processes can exchange data directly on the same machine; remote processes need to set up a network connection and exchange data over it.

Why do processes need to communicate

Why does this matter? Whenever processes collaborate on a task, they must communicate, because without communication it is almost impossible to get the work done.

I don’t know if you browse Bilibili, but if you do, you may have seen these two memes.

There are likely to be comments like “When will the guy move out so I can move in?”, and most videos will have “next time for sure” bullet comments.

These two examples are not rigorous, but they help illustrate two typical scenarios of process collaboration: one is “mutual exclusion” and the other is “synchronization”.

First, “mutual exclusion”: a process/thread monopolizes a resource. If the first example doesn’t land, think of the house as the resource, hahaha: only after the current tenant moves out can another process move in. At any moment only one party can hold the resource; this is called mutual exclusion.

The second example is “next time for sure”: I promise to like the video and drop a coin on my next visit, that is, I arrange what happens on the next run. This is called “synchronization”: coordinating the order in which processes/threads execute.

Process Communication (IPC)

There are many IPC methods available on Unix-like systems. From the perspective of their mechanism, they fall into three categories:

  • Communication-based IPC, which further splits into data-transfer methods (pipes, named pipes, message queues, sockets, and RPC) and shared-memory methods (shared memory itself, the fastest IPC method)
  • Signal-based IPC, usually just called the operating system’s signal mechanism, which is the only asynchronous IPC method
  • Synchronization-based IPC, of which the most important is the semaphore

Pipes

A pipe is a half-duplex (one-directional) communication mechanism that can only be used between a parent process and its children, or between children sharing a common ancestor. A common example is piping command output into grep to filter for the lines we want to see.

In essence, a pipe is a kernel buffer. Processes access its data first-in, first-out: the process at the write end appends data to the buffer in order, and the process at the read end consumes it in order. The buffer can be viewed as a circular queue whose read and write positions advance automatically; each piece of data can be read only once, and once read it no longer exists in the buffer. A pipe can also be thought of as a file, but it is not a regular file and does not belong to any file system: the pipe constitutes its own file system and exists only in memory.

Although the pipe model is simple (one end writes data, the other reads it), it has certain limitations:

  • Half-duplex (unidirectional): data flows in only one direction
  • A pipe can only be used by two processes that share a common ancestor
  • When both parties need to send data to each other, two pipes must be established

Because the pipe is half-duplex, process A can transmit data to process B, but for B to transmit data back to A, another pipe must be created. Ordinary pipes are also anonymous: when no process is using a pipe anymore, it is removed from memory.
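The parent-writes, child-reads flow described above can be sketched in Python. This is a minimal Unix-only sketch (it relies on `os.fork`, which is not available on Windows); the message text is just an illustration:

```python
import os

# Unix-only sketch: the parent writes into the pipe, a forked child reads.
r, w = os.pipe()                     # r = read end, w = write end
pid = os.fork()
if pid == 0:
    os.close(w)                      # child: close the unused write end
    data = os.read(r, 1024)          # blocks until the parent writes
    os.close(r)
    # exit 0 if the expected bytes arrived
    os._exit(0 if data == b"hello from parent" else 1)
else:
    os.close(r)                      # parent: close the unused read end
    os.write(w, b"hello from parent")
    os.close(w)
    _, status = os.waitpid(pid, 0)
    child_ok = os.WEXITSTATUS(status) == 0
```

Closing the unused ends matters: if the child kept the write end open, its read would never see end-of-file.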

Named Pipes (FIFO)

Named pipes remove the restriction that only two related processes can use an ordinary pipe. Unlike a pipe, a FIFO has a filesystem path associated with it and exists in the system as a FIFO file. Any process with permission to access that path can communicate through the FIFO, so even unrelated processes can exchange data this way.

If no process is using a named pipe, its contents are removed from memory, but the FIFO itself is not gone: unlike an ordinary pipe, a named pipe persists as a file in the disk file system.

Like an ordinary pipe, a named pipe is a half-duplex communication mechanism, so two FIFOs must be established if both sides want to transmit data to each other.
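A FIFO is created with `os.mkfifo` and then opened like a file. The sketch below is Unix-only, and it uses two threads of one process to stand in for two unrelated processes, just to keep it self-contained; the path name is arbitrary:

```python
import os
import tempfile
import threading

# Create a FIFO at a filesystem path; any process that can access the
# path could open it, which is what makes FIFOs work between strangers.
path = os.path.join(tempfile.mkdtemp(), "demo_fifo")
os.mkfifo(path)

received = []

def reader():
    # Opening for read blocks until some writer opens the FIFO.
    with open(path, "rb") as f:
        received.append(f.read())    # read until the writer closes

t = threading.Thread(target=reader)
t.start()

# Opening for write blocks until some reader opens the FIFO.
with open(path, "wb") as f:
    f.write(b"via fifo")

t.join()
os.unlink(path)                      # remove the FIFO file when done
```

Note how the two `open` calls rendezvous with each other: each side blocks until the other end shows up.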

Message queues

A message queue is a simple and effective way to transfer data between two processes. Each message block has a specific type, and the receiver can selectively receive data according to that type. It works somewhat like a mailbox: messages are stored in the queue when they arrive, and a process fetches the messages of the type it needs.

Advantages of message queues

  • A sender can post a message and move on, avoiding the synchronization and blocking problems of named pipes
  • Urgent messages can be received ahead of older ones, by fetching a specific type first

Disadvantages of message queues

  • Like pipes, each chunk of data has a limit on its maximum length
  • There is also an upper limit to the total length of blocks contained in all queues in the system
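The classic typed-message interface is the System V API (`msgget`/`msgsnd`/`msgrcv` in C), which Python’s standard library does not wrap. As a stand-in, the sketch below uses `multiprocessing.Queue` and tags each message with a `(type, payload)` tuple to mimic the message-type field; the type numbers are arbitrary:

```python
from multiprocessing import Process, Queue

def producer(q):
    # Tag each message with a type, like the mtype field of a System V
    # message; here 1 = ordinary, 2 = urgent (arbitrary convention).
    q.put((1, "log line"))
    q.put((2, "urgent alert"))

q = Queue()
p = Process(target=producer, args=(q,))
p.start()

messages = [q.get() for _ in range(2)]   # drain both messages
p.join()

# The consumer can then filter by type, e.g. pick out urgent messages.
urgent = [payload for mtype, payload in messages if mtype == 2]
```

With a real System V queue the kernel does this filtering for you: `msgrcv` can ask for “the next message of type 2” directly.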

Sockets

A socket is also an IPC method, but unlike other IPC mechanisms, a socket communication mechanism does not require two processes to be on the same computer system. It allows multiple processes to establish communication and pass data to each other in the form of a network connection.

Sockets involve a large amount of TCP/IP protocol stack knowledge, and this article only covers process communication; if you are interested in sockets, I may write a separate article about them.

Shared memory

Shared memory is the most efficient IPC mechanism, because it involves no data transmission between processes at all: the processes share one physical memory region, which each of them maps into its own address space and then reads and writes directly (through a pointer or similar). The price of this efficiency is that access to the shared memory must be synchronized by other means, or race conditions will result, so shared memory is usually used together with other IPC methods such as semaphores.

Race condition: when two processes compete for the same resource and the outcome is sensitive to the order in which the resource is accessed, a race condition exists.

Although shared memory is the most efficient IPC mechanism, it has a limitation: the processes using shared memory must be in the same computer system and have physical memory to share.

The difference between shared memory and message queue, pipe communication:

With message queues and pipes, transferring data generally takes four steps:

  • Read from the input file into the sending process
  • Copy from the sending process into the kernel (through the message queue or pipe)
  • Copy from the kernel into the receiving process
  • Write from the receiving process to the output file

Shared memory data is generally transmitted as follows:

  • Input data from a file into shared memory
  • Output from shared memory to a file

Therefore, shared memory avoids the per-transfer copies through the kernel and completes the transfer in only two steps, which is why it is the most efficient IPC mechanism.
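The “both sides map the same block” idea can be sketched with Python’s `multiprocessing.shared_memory` (Python 3.8+). Here a single process attaches twice by name to stand in for two separate processes; the payload is arbitrary:

```python
from multiprocessing import shared_memory

# "Writer" side: create a named block of shared memory and write into it.
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# "Reader" side: attach to the same block by name and read directly.
# In a real program a second process would do this with the same name.
peer = shared_memory.SharedMemory(name=shm.name)
data = bytes(peer.buf[:5])

peer.close()
shm.close()
shm.unlink()                # remove the block once no one needs it
```

No copy through the kernel happens between the write and the read: both `buf` views point at the same physical pages.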

Signals

The operating system signal is the only asynchronous communication method among the IPC mechanisms. In essence it uses software to simulate the hardware interrupt mechanism: a signal notifies a process that some event has occurred. For example, pressing Ctrl + C in the terminal stops the running program, and the common kill command also works through signals, e.g. kill -9 pid (the meaning of -9 is covered below).

Unlike other IPC methods, signals are an inexact form of communication: a signal only tells a process that something happened, it cannot carry detailed information about the event. Then again, IPC is not only about exchanging data; sometimes a process only needs to be told that an event occurred so it can react accordingly.

In Linux, each signal has a name with a SIG prefix, such as SIGINT or SIGKILL, but inside the operating system these signals are represented by positive integers, called signal numbers. For example, to force a process that fails to exit properly: kill -9 pid. Here 9 is the signal number, and the corresponding signal is SIGKILL.
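A process can install a handler for a signal and then be notified asynchronously when it arrives. The sketch below (Unix-only, using SIGUSR1, a signal reserved for user-defined purposes) sends a signal to the current process with `os.kill`, the same system call the kill command uses:

```python
import os
import signal

caught = []

def handler(signum, frame):
    # The handler only learns *which* signal arrived, not any payload;
    # that is the "inexact" nature of signals.
    caught.append(signum)

# Install the handler, then deliver SIGUSR1 to ourselves.
signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)
```

In CPython the Python-level handler runs on the main thread shortly after delivery, so by the next statement `caught` already records the signal number.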

Semaphores

When multiple processes access a resource on the system at the same time, such as writing the same record in a database or modifying the same file, synchronization is needed to ensure that only one process has exclusive access to the resource at any one time. Typically, a program’s access to a shared resource is a short piece of code, but it is exactly this piece of code that causes race conditions between processes. This piece of code is called a critical section. Process synchronization means ensuring that only one process can be inside the critical section at any one time.

Semaphores are completely different from signals. A semaphore is a special variable that can only take natural-number values and supports only two operations: wait and signal. On Unix-like systems, the words wait and signal already carry other meanings, so these two semaphore operations are more commonly referred to as the P and V operations. The letters come from the Dutch words “passeren” (to pass, enter the critical section) and “vrijgeven” (to release, exit the critical section). For a semaphore SV, the P and V operations are as follows:

  • P(SV): if SV is greater than 0, decrease it by 1; if SV is 0, suspend the process
  • V(SV): if other processes are suspended waiting for SV, wake one of them up; otherwise, increase SV by 1

A semaphore can take any natural number as its value, but the most common and simplest is the binary semaphore, which can only take the values 0 and 1. A typical example uses a binary semaphore to synchronize two processes and ensure exclusive access to a critical section:

When the critical section is available, the binary semaphore SV has a value of 1, and either process A or process B may enter. Suppose process A performs the P(SV) operation first, reducing SV to 0; process B will then be suspended when it performs its own P(SV), until process A leaves the critical section and performs V(SV) to raise SV back to 1, making the critical section available again. Since process B was suspended waiting on SV, it is woken up and enters the critical section.

A semaphore is, in that sense, similar to the concept of a lock.
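The P/V protection of a critical section can be sketched with a binary semaphore: `acquire()` plays the role of P, `release()` of V. The sketch uses threads of one process to stand in for processes, to keep it short and self-contained; `multiprocessing.Semaphore` offers the same API across real processes:

```python
import threading

# Binary semaphore: initial value 1, so at most one worker is inside
# the critical section at a time.
sem = threading.Semaphore(1)
counter = 0                      # shared state touched in the critical section

def worker():
    global counter
    for _ in range(10000):
        sem.acquire()            # P: enter the critical section (or block)
        counter += 1             # critical section
        sem.release()            # V: leave, waking one waiter if any

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the semaphore, the read-modify-write on `counter` could interleave and lose updates; with it, every increment happens in mutual exclusion.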

conclusion

This article mainly introduced why IPC is needed and several ways of doing it. The material is fairly theoretical, but it is not actually hard to understand, hahaha.

If we understand the underlying principles of processes (and the general concept of a process), it becomes possible, at least ideally, to write solid concurrent code in this era of highly concurrent programming. Someone once told me that a programmer who does not understand concurrency is not a “qualified programmer”; and processes are exactly the foundation that supports concurrent programming.

So in general, learning the underlying principles of processes will greatly help us understand concurrent programming.
