In this article we explore concurrency design models.

Concurrent systems can be implemented using different concurrency models, which describe how threads in the system collaborate to accomplish concurrent tasks. Different concurrency models split tasks in different ways, and threads can communicate and collaborate in different ways.

Concurrency models are similar to distributed systems

A concurrency model is actually very similar to a distributed system: in a concurrent system, threads communicate with each other, while in a distributed system, processes do. Processes and threads are similar in essence, which is why the two kinds of models resemble each other.

Distributed systems typically face more challenges than concurrent systems, such as communication between processes over a network, network failures, or remote machine crashes. However, a concurrent system running on a single machine can face similar problems if a CPU, a network card, or a hard disk fails.

Because the concurrency model is similar to the distributed model, the two can borrow ideas from each other. For example, the model for distributing work among threads resembles the load balancing model in a distributed system.

To put it bluntly, the ideas behind the distributed model are derived from the concurrency model.

Recognizing the two kinds of state

An important aspect of a concurrency model is whether its threads share state or keep their state independent. Shared state means that some state is shared between different threads.

State is simply data, such as one or more objects. When threads share data, problems such as race conditions or deadlocks can occur. Of course, these problems are only potential; whether they actually occur depends on whether shared objects are used and accessed safely.
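To make the risk concrete, here is a minimal sketch of a race condition on shared state; the Counter class and its field are illustrative names, not from any particular library:

```java
// A minimal sketch of a race condition on shared state.
public class Counter {
    private int count = 0;

    // Not atomic: read, increment, and write are separate steps,
    // so two threads can interleave and lose updates.
    public void increment() {
        count++;
    }

    public int get() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        Counter counter = new Counter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter.increment();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Usually prints less than 200000 because of lost updates.
        System.out.println(counter.get());
    }
}
```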

Independent state means that state is not shared between threads. If threads need to communicate, they can exchange immutable objects, which is one of the most effective ways to avoid concurrency problems, as shown in the figure below.

Using independent state makes the design much simpler, because only one thread can access each object, and even when objects are exchanged, they are immutable.
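A minimal sketch of the independent-state idea, assuming a hypothetical Message class: threads exchange immutable values, so no thread can modify what another thread sees.

```java
// A minimal sketch of exchanging immutable objects between threads.
// The Message class is an illustrative example, not from the article.
public final class Message {
    private final String text;

    public Message(String text) {
        this.text = text;
    }

    public String getText() {
        return text;
    }

    // "Modification" returns a new object; the original is untouched,
    // so a reference held by another thread never changes under it.
    public Message withText(String newText) {
        return new Message(newText);
    }
}
```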

Concurrency model

Parallel Worker

The first concurrency model is the parallel worker model: a client delegates tasks to a delegator, which then assigns the work to different workers, as shown in the figure below.

The core of the parallel worker model consists of two roles: the delegator and the worker. The delegator receives tasks from the client and delivers each task to a specific worker for processing; the worker returns its result to the delegator when it is done. After the delegator has received the workers' results, it aggregates them and hands them back to the client.

The parallel worker model is very common in Java. Many of the concurrency utilities in the java.util.concurrent package use this model.
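For example, a thread pool can play the delegator role: it receives tasks, hands them to worker threads, and lets the caller collect the results. A minimal sketch using ExecutorService (the task logic is illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelWorkerDemo {
    public static void main(String[] args) throws Exception {
        // The delegator: receives tasks and hands them to workers.
        ExecutorService delegator = Executors.newFixedThreadPool(4);

        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            final int n = i;
            // Each worker processes one task and returns a result.
            Callable<Integer> worker = () -> n * n;
            results.add(delegator.submit(worker));
        }

        // The delegator aggregates results and hands them to the client.
        int sum = 0;
        for (Future<Integer> f : results) {
            sum += f.get(); // blocks until that worker is done
        }
        System.out.println("Aggregated result: " + sum);

        delegator.shutdown();
    }
}
```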

Advantages of parallel workers

An obvious strength of the parallel worker model is that it is easy to understand: to increase the parallelism of the system, you simply add more workers.

Another advantage of the parallel worker model is that it splits a task into multiple smaller tasks and executes them concurrently. The delegator returns to the client after receiving the workers' results, so the entire worker -> delegator -> client flow is asynchronous.

Disadvantages of parallel workers

That said, the parallel worker model also has some hidden disadvantages.

The shared state gets complicated

A real parallel worker system is more complex than the figure suggests, mainly because the workers usually access shared data, either in memory or in a shared database.

This shared state may include work queues holding business data, data caches, database connection pools, and so on. When threads communicate, they have to make sure their changes to shared state become visible to the other threads, rather than just sitting in one CPU's cache. Programmers need to consider these issues at design time: threads must avoid race conditions, deadlocks, and the many other concurrency problems that shared state causes.
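As a small illustration of the visibility problem, the sketch below uses Java's volatile keyword, which forces a write to be published to other threads instead of lingering in a CPU cache; without it, the reader thread might spin forever:

```java
public class VisibilityDemo {
    // volatile guarantees that a write by one thread becomes visible
    // to other threads; without it, the update could stay cached.
    private static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (running) {
                // spin until another thread's write becomes visible
            }
            System.out.println("Stopped.");
        });
        reader.start();

        Thread.sleep(100);
        running = false; // the volatile write is published to all threads
        reader.join();
    }
}
```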

Some concurrency is lost when multiple threads access shared data, because access to the data must be serialized so that only one thread touches it at a time. This leads to contention for the shared data, and threads that lose the contention block while they wait.

Modern non-blocking concurrent algorithms can reduce contention and improve performance, but non-blocking algorithms are difficult to implement.
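The classes in java.util.concurrent.atomic are a readily available example: they rely on compare-and-swap, so a contending thread retries instead of blocking. A minimal sketch:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class NonBlockingCounter {
    private final AtomicInteger count = new AtomicInteger(0);

    // Lock-free increment: retry with compare-and-swap instead of blocking.
    // (AtomicInteger.incrementAndGet() does the same thing internally.)
    public int increment() {
        int current;
        do {
            current = count.get();
        } while (!count.compareAndSet(current, current + 1));
        return current + 1;
    }
}
```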

Persistent data structures are another option. A persistent data structure always preserves its previous version when it is modified. So if multiple threads modify a persistent data structure at the same time, each modifying thread gets a reference to a new version of the structure.

Although persistent data structures are an elegant solution, they have problems of their own. For example, a persistent list adds new elements to the head of the list and returns a reference to the newly added element, but the other threads still hold references to the previous first element of the list, so they cannot see the newly added element.
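The sketch below shows a minimal persistent list of this kind (the class is illustrative): prepending returns a new list head, while a thread holding the old head keeps seeing the old list.

```java
// A minimal sketch of a persistent (immutable) singly linked list.
// Prepending creates a new head; old references still see the old list.
public final class PersistentList<T> {
    private final T head;                  // front element, null for empty list
    private final PersistentList<T> tail;  // rest of the list

    private PersistentList(T head, PersistentList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    public static <T> PersistentList<T> empty() {
        return new PersistentList<>(null, null);
    }

    // Returns a NEW list; this list is left unchanged.
    public PersistentList<T> prepend(T element) {
        return new PersistentList<>(element, this);
    }

    public static void main(String[] args) {
        PersistentList<String> oldList = PersistentList.<String>empty().prepend("a");
        PersistentList<String> newList = oldList.prepend("b");
        // A thread holding oldList still sees only "a";
        // "b" is visible only through the newList reference.
    }
}
```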

Persistent data structures such as linked lists also do not perform well on modern hardware. Each element in the list is a separate object that can be scattered throughout memory. Modern CPUs are much faster at sequential access, so sequential data structures such as arrays achieve higher performance: the CPU cache can load a large contiguous block, and the CPU can then access that data directly in the cache. With linked lists, whose elements are scattered across RAM, this is virtually impossible.

Stateless worker

Because shared state can be modified by other threads, a worker must re-read the state every time it operates on it to make sure it is working on the latest copy. Workers that hold no state internally are called stateless workers.

The order of task execution is uncertain

Another disadvantage of the parallel worker model is that the order of execution is nondeterministic: there is no guarantee of which task is executed first or last. Task A may be handed to a worker before task B, yet task B may finish before task A.

Assembly line

The second concurrency model is the pipeline model, which resembles the assembly lines we often see on a production floor. The following is a flow chart of the pipeline design model.

This organization is just like workers on a factory assembly line: each worker completes only part of the total job, and once that part is done, the worker forwards the work to the next worker.

Each worker runs in its own thread and shares no state with the others, which is why this is also known as the shared-nothing concurrency model.

The pipeline concurrency model is typically designed around non-blocking I/O: when no work is currently assigned to a worker, the worker does other work. Non-blocking I/O means that when a worker starts an I/O operation, such as reading a file from the network, it does not wait for the call to complete. I/O operations are slow, so waiting for them wastes time; while the I/O is in flight, the CPU can do other things, and when the operation completes its result is passed to the next worker. The following is a flowchart for non-blocking I/O.
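One way to sketch this style in Java is with CompletableFuture: each stage acts as a worker, the result of one stage flows to the next, and no thread blocks in between (the stage functions are illustrative):

```java
import java.util.concurrent.CompletableFuture;

public class PipelineDemo {
    public static void main(String[] args) {
        // Each stage is a worker; results flow to the next worker
        // without any thread blocking between stages.
        CompletableFuture<String> pipeline =
            CompletableFuture.supplyAsync(() -> "raw data")   // stage 1: produce input
                .thenApply(data -> data.toUpperCase())        // stage 2: transform
                .thenApply(data -> "[" + data + "]");         // stage 3: format

        // join() here is only for the demo; in a real reactive system the
        // final stage would itself trigger the response.
        System.out.println(pipeline.join());
    }
}
```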

In practice, tasks do not usually flow along a single assembly line. Since most programs need to do many things, tasks move between different workers depending on the work to be done, as shown in the figure below.

A task may also require the participation of multiple workers.

Reactive – event-driven systems

Systems that use the pipeline model are sometimes called reactive systems or event-driven systems, because they respond to external events, such as an incoming HTTP request or a file finishing loading into memory.

The Actor model

In the Actor model, each actor is essentially a worker, and each actor can process tasks.

Simply put, the Actor model is a concurrency model that defines a set of general rules for how the components of a system should behave and interact. The best-known programming language built on these rules is Erlang. An actor reacts to a message it receives and can then create more actors or send more messages while preparing to receive the next message.
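Java has no built-in actors, but the idea can be sketched with one thread draining a mailbox queue. The class below is illustrative only; libraries such as Akka provide real actor implementations.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A minimal actor sketch: one thread, one mailbox, messages processed
// strictly one at a time. Illustrative only, not a production actor library.
public class SimpleActor {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();

    public SimpleActor() {
        Thread loop = new Thread(() -> {
            try {
                while (true) {
                    String message = mailbox.take(); // wait for the next message
                    // React to the message; a real actor could also create
                    // new actors or send messages to other actors here.
                    System.out.println("Received: " + message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shut down the actor
            }
        });
        loop.setDaemon(true);
        loop.start();
    }

    // Sending never blocks the sender on the actor's work.
    public void send(String message) {
        mailbox.offer(message);
    }
}
```

A caller simply constructs the actor and calls send; messages are processed one at a time, in arrival order.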

Channels model

In the Channel model, workers usually do not communicate directly. Instead, they publish events to channels, and other workers pick up the messages from those channels. The following is the Channel model diagram.

Sometimes workers do not need to know who the next worker is; they just write the message to the channel. Workers listening on the channel can subscribe and unsubscribe, which reduces the coupling between workers.
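In Java, a BlockingQueue can serve as a simple channel: the producer and the consumer only know the queue, not each other. A minimal sketch:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ChannelDemo {
    public static void main(String[] args) {
        // The channel: producer and consumer only know the queue, not each other.
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(16);

        Thread producer = new Thread(() -> {
            try {
                channel.put("event-1");
                channel.put("event-2");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                System.out.println(channel.take());
                System.out.println(channel.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }
}
```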

Advantages of assembly line design

Compared with the parallel worker model, the pipeline model has the following advantages.

There is no shared state

Because a worker in a pipeline design hands its work to the next worker as soon as its own processing is complete, workers never need to share state, so there is no need to worry about the concurrency problems that shared state causes. You can even implement each worker as if it were single-threaded.

Stateful workers

Because workers know that no other thread modifies their data, workers in a pipeline design can be stateful. Stateful means they can keep the data they operate on in memory. Stateful workers are usually faster than stateless ones.

Better hardware integration

Single-threaded code can often be written to match the way the underlying hardware works, and since a pipeline worker behaves like single-threaded code, it benefits from this. Because stateful workers typically keep their data cached, often in the CPU cache, they can access it faster.

Task ordering becomes possible

Tasks in the pipeline concurrency model can be ordered, which is typically used for writing logs and for recovery: a log of incoming tasks can be replayed to rebuild the system's state after a failure.

Disadvantages of assembly line design

The disadvantage of the pipeline concurrency model is that handling a single task is often spread across multiple workers, and therefore across multiple classes in a project's code base. It can thus be difficult to see exactly which task a given worker is executing. Pipeline code can also be hard to write; code built from many nested callback handlers is often referred to as callback hell, and callback hell is hard to trace and debug.

Functional parallelism

Functional parallelism is a concurrency model that has been proposed more recently. Its basic idea is to build a program out of function calls, where passing a message is equivalent to making a function call. Arguments passed to a function are copied, so nothing outside the function can manipulate the data inside it. This makes each function call effectively an atomic operation, and each call can execute independently of every other call.

Since each function call executes independently, each call can run on its own CPU. In other words, functional parallelism amounts to each CPU executing its own tasks independently.

The ForkJoinPool class added in Java 7 implements functional parallelism. Java 8 introduced streams, and a parallel stream can iterate over a large collection in parallel.
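A minimal sketch of both approaches, summing the numbers from 1 to 1,000,000 (the task and its split threshold are illustrative):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

public class FunctionalParallelismDemo {
    // A fork/join task: split the range, solve the halves independently, combine.
    static class SumTask extends RecursiveTask<Long> {
        private final long from, to;

        SumTask(long from, long to) {
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= 1_000) { // small enough: compute directly
                long sum = 0;
                for (long i = from; i <= to; i++) sum += i;
                return sum;
            }
            long mid = (from + to) / 2;
            SumTask left = new SumTask(from, mid);
            SumTask right = new SumTask(mid + 1, to);
            left.fork();                          // run the left half asynchronously
            return right.compute() + left.join(); // combine the two halves
        }
    }

    public static void main(String[] args) {
        long viaForkJoin = ForkJoinPool.commonPool().invoke(new SumTask(1, 1_000_000));

        // The same computation with a Java 8 parallel stream.
        long viaStream = LongStream.rangeClosed(1, 1_000_000).parallel().sum();

        System.out.println(viaForkJoin + " == " + viaStream);
    }
}
```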

The hard parts of functional parallelism are knowing the flow of function calls and which CPUs execute which functions, as well as the extra overhead of coordinating calls across CPUs.
