
We started by asking two questions: do we need unbounded queues in multi-threaded environments, and do we need locks in multi-threaded environments? The answer to both is no.

With those two questions in mind, today we look at the powerful Disruptor framework: a single-threaded system that supports 6 million orders per second, which gained industry attention after its 2010 QCon presentation. In 2011, enterprise application expert Martin Fowler wrote a lengthy introduction to it, and in the same year it won the Oracle Duke's Choice Award. Its most recent update was in August, and the project is still very active. LMAX, the company behind it, was founded to build an extremely high-performance financial exchange. The project's own summary claims it operates "very close to the theoretical limits of modern processors for exchanging data between cores."

All right. Let's first work through the two questions above; you may know of these problems without knowing their details. Start with concurrency itself: concurrency doesn't just mean that multiple tasks happen at the same time, it also means that they compete for access to resources, such as files, database table data, and values in memory. If you are interested in how modifications become visible under such contention, look at how JUC's thread-safe ArrayList variant (CopyOnWriteArrayList) handles it with its transient and volatile fields.

The easy, brute-force way to deal with the problems above is locking, but once a lock is contended, acquiring it involves a kernel context switch, and context switches are very time-consuming. This is why multi-core processors are now a selling point for cloud server vendors, and why some projects compute their concurrent thread count dynamically from the number of cores, as sketched below.
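For example, a minimal sketch of sizing a thread pool from the machine's core count (the fixed-pool choice and the class name are illustrative, not part of the original):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CoreSizedPool {
    public static void main(String[] args) {
        // Derive the pool size from the machine's core count; the right
        // multiplier depends on how CPU-bound or IO-bound the workload is.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        pool.execute(() -> System.out.println("pool sized to " + cores + " cores"));
        pool.shutdown();
    }
}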

Speaking of this, we may immediately think of CAS (compare-and-swap). Although CAS is very efficient at making individual operations atomic, it still has three major problems: the ABA problem, the cost of spinning in a CAS loop, and the fact that it can only guarantee an atomic operation on one shared variable.

  1. ABA problems are resolved with a version number; the JDK's atomic package provides the AtomicStampedReference class to address them (see the sketch after this list).
  2. If the JVM can use the pause instruction provided by the processor, the efficiency of spinning can be improved. The pause instruction delays de-pipelining so that the CPU does not consume too many execution resources; the delay depends on the implementation version, and on some processors it is zero. It also improves CPU execution efficiency by avoiding the pipeline flush caused by a memory order violation when exiting the loop.
  3. Atomicity is only guaranteed for a single variable, but since Java 1.5 the JDK has provided the AtomicReference class to guarantee atomicity of reference objects: you can place multiple variables inside one object and perform the CAS operation on that object.
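Here is a minimal sketch of how AtomicStampedReference defeats ABA (the class name, values, and stamps are illustrative):

import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        // Start at value "A" with stamp (version) 0.
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);

        int stamp = ref.getStamp();        // 0
        String value = ref.getReference(); // "A"

        // Another thread performs A -> B -> A; each write bumps the stamp.
        ref.compareAndSet("A", "B", stamp, stamp + 1);
        ref.compareAndSet("B", "A", stamp + 1, stamp + 2);

        // A plain CAS on the value alone would succeed here ("A" equals "A"),
        // but the stamped CAS fails: the version has moved from 0 to 2.
        boolean swapped = ref.compareAndSet(value, "C", stamp, stamp + 1);
        System.out.println("swapped = " + swapped); // prints: swapped = false
    }
}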

The latest version of any value, once written, can sit at any stage: in a register, a store buffer, one of the multiple layers of cache, or main memory. If threads are to share that value, its changes must be made visible in an orderly way; this is coordinated by the message exchange of the cache coherency protocol, and the timing of those messages can be controlled with memory barriers.

A read memory barrier orders the load instructions on the CPU that executes it, by marking a point in the invalidate queue for changes coming into its cache. This gives it a consistent view of the world for write operations ordered before the read barrier.

A write memory barrier orders the store instructions on the CPU that executes it, by marking a point in the store buffer and flushing writes out through its cache. This barrier gives a consistent view of the world for store operations that occurred before the write barrier.

A full memory barrier orders both loads and stores, but only on the CPU that executes it. This is what volatile gives you in Java.
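A minimal sketch of that barrier pair in Java (class and field names are illustrative): the volatile store orders the plain store of payload before it, and the volatile load orders the plain load after it, so a reader that sees ready == true also sees the payload.

public class VolatileHandoff {
    private int payload;            // plain field
    private volatile boolean ready; // volatile field: barrier on write and read

    public void produce(int value) {
        payload = value; // ordinary store, ordered before the volatile store
        ready = true;    // volatile store: acts as a write barrier
    }

    public Integer consume() {
        if (ready) {        // volatile load: acts as a read barrier
            return payload; // guaranteed to see the value stored before ready = true
        }
        return null;        // not published yet
    }
}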

Queues typically use linked lists or arrays as the underlying storage for elements. If an in-memory queue is allowed to be unbounded, it can grow unchecked until it exhausts memory and crashes the system; this happens whenever producers outpace consumers. Unbounded queues can be used in systems where producers are guaranteed not to outpace consumers and memory is a precious resource, but there is always a risk if this assumption does not hold and the queue grows without limit. So when we create thread pools, we usually define the pool size and the queue size; in Java we use ThreadPoolExecutor:

public ThreadPoolExecutor(int corePoolSize,
                          int maximumPoolSize,
                          long keepAliveTime,
                          TimeUnit unit,
                          BlockingQueue<Runnable> workQueue)
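For instance, a minimal sketch of a bounded pool (the sizes, queue choice, and rejection policy are illustrative):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPool {
    public static void main(String[] args) {
        // Bounded on both axes: at most 8 threads and 1024 queued tasks.
        // CallerRunsPolicy pushes back on submitters when the queue fills up,
        // instead of letting it grow until memory runs out.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 8,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1024),
                new ThreadPoolExecutor.CallerRunsPolicy());

        pool.execute(() -> System.out.println("ran on " + Thread.currentThread().getName()));
        pool.shutdown();
    }
}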

Queue implementations tend to have write contention on the head, tail, and size variables. In use, a queue is usually always close to full or close to empty because of the speed difference between consumers and producers. This tendency to be always full or always empty leads to serious contention, and even if head and tail are separated using different concurrency primitives (such as locks or CAS variables), they usually occupy the same cache line (hardware caches operate on fixed-size cache lines, commonly 64 bytes). The cache line is the granularity at which the cache coherency protocol operates, which means that if two variables sit in the same cache line and are written by different threads, they suffer the same write contention as a single variable. This concept is called false sharing. To achieve high performance and minimize contention, it is important to make sure that independent but concurrently written variables do not share a cache line. (Tip: for trees and linked-list structures, the CPU's memory prefetching is of limited help because their nodes are scattered across memory.)
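A minimal sketch of manual cache-line padding, in the spirit of the Disruptor's padded Sequence (the class name and the p1..p14 field names are illustrative; real implementations also guard the padding against JVM field elimination, e.g. via inheritance tricks or the JDK-internal @Contended annotation):

public class PaddedCounter {
    // The surrounding long fields keep `value` from sharing its
    // 64-byte cache line with other hot variables.
    protected long p1, p2, p3, p4, p5, p6, p7;        // padding before
    protected volatile long value = 0L;               // the hot, contended field
    protected long p8, p9, p10, p11, p12, p13, p14;   // padding after

    public long get() { return value; }
    public void set(long v) { value = v; }
}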

Producers claiming the head of the queue, consumers claiming the tail, and the storage of intermediate nodes make concurrent queue design very complicated to manage beyond a single large-grained lock over the whole queue: a coarse lock is easy to implement but performance plummets, while fine-grained locking is very troublesome to implement correctly.
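A minimal sketch of that single large-grained-lock approach (class name illustrative): every operation, producer or consumer, serializes on one monitor, so it is trivial to get right but head and tail can never move in parallel.

import java.util.ArrayDeque;
import java.util.Queue;

public class CoarseLockedQueue<E> {
    private final Queue<E> items = new ArrayDeque<>();
    private final int capacity;

    public CoarseLockedQueue(int capacity) { this.capacity = capacity; }

    public synchronized boolean offer(E e) {
        if (items.size() == capacity) return false; // bounded: reject when full
        items.add(e);
        return true;
    }

    public synchronized E poll() {
        return items.poll(); // null when empty
    }
}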

The use of queues is also a significant source of garbage in Java.

First, objects must be allocated and placed in the queue.

Second, if linked lists are used, objects representing the list nodes must be allocated.

All of these objects, allocated to support the queue implementation, must be reclaimed once they are no longer referenced.

Disruptor