Categories: Concurrent programming
Date: 2019-04-23 17:11:23

Preface: Problems existing in concurrent programming

1. Context switch problem

Even a single-core CPU supports multi-threaded execution. The CPU allocates time slices to each thread, giving the impression that multiple threads execute simultaneously. A time slice is generally a few tens of milliseconds.

The CPU cycles through tasks using a time-slice allocation algorithm. When the current task's time slice is used up, the CPU switches to another task. Before switching, it saves the state of the current task so the state can be restored when the task is switched back in. This save-and-restore process is a context switch.

Context switching slows things down. It is like reading a Sanskrit text: whenever you meet an unfamiliar word, you must remember the page and line, switch to a dictionary, look the word up, and then switch back. The switching hurts your reading speed, just as context switching hurts the speed of multithreaded programs.

Is multithreading necessarily faster?

public class ContextSwitchProblem {
    private static final long count = 100000L;

    public static void main(String[] args) throws InterruptedException {
        concurrent();
        serialize();
    }

    /** Concurrent execution */
    private static void concurrent() throws InterruptedException {
        long start = System.currentTimeMillis();
        Thread thread = new Thread(() -> functionOne());
        thread.start();
        long b = functionOne();
        thread.join(); // wait for the worker thread so the timing covers both halves
        long time = System.currentTimeMillis() - start;
        System.out.println("[concurrent] time = " + time + " ms; b = " + b);
    }

    /** Serial execution */
    private static void serialize() {
        long start = System.currentTimeMillis();
        functionOne();
        long b = functionOne();
        long time = System.currentTimeMillis() - start;
        System.out.println("[serialize] time = " + time + " ms; b = " + b);
    }

    private static long functionOne() {
        long a = 0;
        for (long i = 0; i < count; i++) {
            a++;
        }
        return a;
    }
}

As the program above shows, the relative performance of concurrent and serial execution changes with the value of count: multithreading does not necessarily improve performance.

| Iterations   | Concurrent time | Serial time | Concurrent vs. serial |
|--------------|-----------------|-------------|-----------------------|
| 100,000      | 201 ms          | 1 ms        | ~200x slower          |
| 10 million   | 237 ms          | 22 ms       | ~10x slower           |
| 100 million  | 255 ms          | 83 ms       | ~3x slower            |
| 1 billion    | 796 ms          | 1003 ms     | slightly faster       |
| 10 billion   | 4857 ms         | 6792 ms     | slightly faster       |
| 100 billion  | 38675 ms        | 66053 ms    | ~1x faster            |

When the iteration count is below roughly 1 billion, concurrent execution is slower because of the overhead of thread creation and context switching.

Use multithreading wisely

2. Deadlock issues

Deadlock generation

A deadlock occurs when threads each hold a lock and wait for one another to release the lock they need before proceeding. They wait on each other in a cycle until the system hangs or a timeout error occurs.

The classic deadlock scenario: thread A holds lock 1 and thread B holds lock 2; A waits for B to release lock 2 while B waits for A to release lock 1, so each waits forever for a lock the other holds.
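A minimal Java sketch of that scenario (class and lock names are illustrative, not from the original post); both threads typically hang forever, and jstack will report "Found one Java-level deadlock":

public class DeadlockDemo {
    private static final Object lock1 = new Object();
    private static final Object lock2 = new Object();

    public static void main(String[] args) {
        new Thread(() -> {
            synchronized (lock1) {
                sleep(100); // give the other thread time to grab lock2
                synchronized (lock2) { System.out.println("A done"); } // never reached
            }
        }).start();
        new Thread(() -> {
            synchronized (lock2) {
                sleep(100);
                synchronized (lock1) { System.out.println("B done"); } // never reached
            }
        }).start();
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}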

How do I avoid deadlocks?

  • Avoid deadlocks at the root
    • Avoid holding multiple locks in one piece of business logic. Holding only one lock at a time makes this kind of deadlock impossible.
  • Reduce the risk of deadlocks in code
    • Always acquire locks in the same global order.
    • Use a timed lock that gives up after a period of time, such as tryLock with a timeout (whether this fits depends on the business scenario); see the sketch after this list.
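A sketch of the timed-lock approach built on java.util.concurrent.locks.ReentrantLock's tryLock (class, method, and timeout values are illustrative):

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class TimedLockDemo {
    private final ReentrantLock lock1 = new ReentrantLock();
    private final ReentrantLock lock2 = new ReentrantLock();

    /** Returns true if both locks were acquired and the work ran; backs off instead of deadlocking. */
    public boolean transfer() throws InterruptedException {
        if (lock1.tryLock(50, TimeUnit.MILLISECONDS)) {
            try {
                if (lock2.tryLock(50, TimeUnit.MILLISECONDS)) {
                    try {
                        // ... critical section guarded by both locks ...
                        return true;
                    } finally { lock2.unlock(); }
                }
            } finally { lock1.unlock(); }
        }
        return false; // caller may retry; livelock is possible but deadlock is not
    }
}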

3. Resources are limited by hardware and software

What are resource limits?

Resource limits mean that in concurrent programming, program performance is bounded by the computer's hardware or software resources. Hardware limits include network bandwidth, CPU processing speed, memory capacity, and disk read/write speed; software limits include the number of available socket connections and database connections.

Problems caused by resource constraints

Under resource limits, concurrent code can actually run slower because of the extra context switching. For example, if single-threaded processing already pushes the CPU to 90%, running two threads concurrently saturates the CPU, adds context-switch overhead, and ends up slower.

How to solve the problem caused by resource constraints?

Addressing hardware resource limits

For hardware resource limits, clustering is the usual approach: if a single machine cannot handle the load, spread the work across multiple machines.

Different machines then process different data according to some routing algorithm. For example, compute "data ID % number of machines" to get a machine number, and send each record to the machine with that number, as in the sketch below.
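A tiny sketch of the modulo routing just described (class and method names are illustrative):

public class ShardRouter {
    /** Machine index that should process the given record. */
    static int route(long dataId, int machineCount) {
        return (int) (dataId % machineCount);
    }

    public static void main(String[] args) {
        System.out.println(route(1001L, 4)); // record 1001 -> machine 1
    }
}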

Addressing software resource limits

For software resource limits, the usual approach is resource reuse, for example pooling and multiplexing socket connections instead of creating one per request.

I. Underlying implementation principles of the Java concurrency mechanism

volatile

volatile addresses the visibility of shared variables across threads: if a global variable is declared volatile, every thread that reads that field sees the most recently written value.
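A minimal sketch of the visibility problem volatile solves (class and field names are illustrative). Without volatile on stop, the worker may spin forever on a stale cached value:

public class VisibilityDemo {
    // Remove 'volatile' and the loop below may never terminate on some JVMs/CPUs.
    private static volatile boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy-wait; reads 'stop' on every iteration
            }
            System.out.println("worker observed stop = true");
        });
        worker.start();
        Thread.sleep(100);
        stop = true; // the volatile write becomes visible to the worker
        worker.join();
    }
}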

What does volatile do?

When a volatile write is compiled to assembly, it is preceded by a lock-prefixed instruction that forces the processor to write the variable's cache line back to system memory.

To improve performance, modern processors do not communicate with memory directly: they first read data from system memory into an internal cache (L1, L2, etc.) and operate on it there. When the processed data will be written back to system memory is therefore unknown, which creates visibility problems.

A write to a volatile variable is flushed back to system memory immediately. That alone is not enough, though: another processor may still hold the old value in its own cache line and keep working with stale data.

Therefore, multi-processor systems implement a cache consistency protocol to keep the caches of all processors consistent.

Each processor sniffs the data propagating on the bus to check whether the memory backing its cached data has changed; if so, it marks the corresponding cache line invalid. The next time the processor touches that data, it re-reads it from system memory.

What are shared variables?

A shared variable is one that can be accessed by multiple threads: static fields, instance fields, and array elements all qualify. Local variables and method parameters do not, because they are thread-private.

synchronized

The basis for synchronized: every object in Java can be used as a lock. This takes three concrete forms (see the sketch below):

  • For an ordinary synchronized instance method, the lock is the current instance.
  • For a static synchronized method, the lock is the Class object of the class.
  • For a synchronized block, the lock is the object named inside the synchronized parentheses.
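A short sketch of the three forms (class and field names are illustrative):

public class SyncForms {
    private static final Object mutex = new Object();

    public synchronized void instanceMethod() {
        // lock: this (the current instance)
    }

    public static synchronized void staticMethod() {
        // lock: SyncForms.class
    }

    public void block() {
        synchronized (mutex) {
            // lock: the object in the parentheses (here, mutex)
        }
    }
}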

Atomic operation

II. The Java memory model

1 Multi-processor architecture

1.1 Multi-processor architecture

(Figure: a multiprocessor architecture. The yellow areas are processes; T1, T2, and T3 are threads. Each CPU executes one thread at a time.)

1.2 Cache Consistency Protocol

In a multiprocessor architecture, the CPU computes far faster than memory can be read or written, so a cache was introduced on the CPU as a buffer between the CPU and memory. On a multi-core machine this produces one main memory and multiple caches, which introduces a new problem: cache consistency. Once the same data is modified in different caches, the copies quickly diverge, so protocols such as MSI, MESI, MOSI, and the Dragon protocol were defined to solve this.

1.3 Reordering

To improve the utilization of its execution units, the CPU reorders program instructions according to certain rules and recombines the results afterwards. This improves performance and makes full use of the CPU's execution units without changing the result of a single thread.

// Initially: a = 0, b = 0, x = 0, y = 0.
// Thread one executes:   a = 1; x = b;
// Thread other executes: b = 1; y = a;
//
// one then other:  one(a=1) -> one(x=b) -> other(b=1) -> other(y=a)   // x = 0; y = 1;
// other then one:  other(b=1) -> other(y=a) -> one(a=1) -> one(x=b)   // x = 1; y = 0;
// Interleaved:     other(b=1) -> one(x=b) -> one(a=1) -> other(y=a)   // x = 1; y = 1;
//
// x = 0; y = 0 can happen in two ways:
// 1. Out-of-order execution: one(x=b) -> other(y=a) -> one(a=1) -> other(b=1)
// 2. Stale working memory: one(a=1) -> one(x=b=0) changes only thread one's working
//    memory (CPU-local cache); thread other has not synchronized yet, so a still reads 0.
//    Likewise other(b=1) -> other(y=a=0) happens before syncing to main memory, so b reads 0.

2 JMM (Java Memory Model)

2.1 Java Memory model

What is it?

  1. Main memory
    • All shared variables (instance fields, static fields, and the elements of array objects) are stored in main memory. Local variables and method parameters are excluded because they are thread-private.
  2. Working memory
    • Thread-private. Holds copies of the main-memory data the thread needs.
  3. Threads
    • A thread reads and writes the copies in its working memory; it does not operate directly on variables in main memory.

Why do we need a Java memory model?

Physical machines with different hardware architectures use different cache consistency protocols (MSI, MESI, MOSI, the Dragon protocol, and so on). To mask these differences, the JVM defines the Java memory model.

Purpose

The purpose of the Java memory model is to define memory access rules for variables in a program.

2.2 Similarities and differences between the Java Memory model and the JVM runtime data area

The Java memory model and the JVM runtime data area are completely different things.

The JVM runtime data area describes how the Java virtual machine manages physical memory; it corresponds to the physical machine. It comprises the heap (for storing objects), the VM stacks (whose stack frames hold local variable tables, operand stacks, dynamic links, and return addresses), the method area, the program counter (which stores the address of the instruction the current thread is executing), and the native method stacks.

Because physical machines with different architectures have different memory models, the Java virtual machine defines its own: the Java memory model (JMM). The JMM masks the memory-access differences of the various kinds of hardware and operating systems, achieving platform consistency and making "write once, run anywhere" possible. In short, the Java memory model defines the rules for reading and writing data.

3 Thread safety

There are three main thread-safety concerns: atomicity, visibility, and ordering.

3.1 Atomicity problem

Errors caused by multiple threads operating on the same data at the same time: a compound action such as count++ (read, modify, write) can interleave across threads and lose updates, as in the sketch below.
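A minimal sketch (class and field names are illustrative): two threads increment a plain counter and lose updates, while java.util.concurrent.atomic.AtomicLong keeps the count exact:

import java.util.concurrent.atomic.AtomicLong;

public class AtomicityDemo {
    private static long plain = 0;                              // plain++ is NOT atomic
    private static final AtomicLong atomic = new AtomicLong();  // incrementAndGet IS atomic

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                plain++;                  // read-modify-write: updates can be lost
                atomic.incrementAndGet(); // CAS-based atomic increment
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("plain  = " + plain);        // usually less than 200000
        System.out.println("atomic = " + atomic.get()); // always 200000
    }
}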

3.2 Visibility issues

From the perspective of a multi-CPU architecture: data held in one CPU's local cache is not visible to threads running on other CPUs.

From the perspective of the Java memory model: the data in working memory is a copy of main memory. When multiple threads each copy and operate on the same data, the threads end up seeing inconsistent values. This is the visibility problem.

The solution to visibility problems is memory barriers (memory fences): a special set of instructions provided by the CPU or compiler that synchronize a thread's working-memory data with main memory so other threads can see it.

3.3 Ordering problems

Ordering problems have two main causes:

  1. The compiler may emit instructions in an order different from the source code.
  2. The CPU reorders instructions as an optimization.

The JMM addresses ordering with the happens-before rules. If operation A happens-before operation B, then the result of A is guaranteed to be visible to B; the compiler and CPU may still reorder instructions, but only in ways that preserve this guarantee.

3.4 How does Java address these thread safety issues?

Java provides volatile, synchronized, final, and the classes under the java.util.concurrent package to address these safety issues.

  1. Atomicity problems
    • synchronized. The JVM implements it with two bytecode instructions, monitorenter and monitorexit, which bracket the atomic critical section.
  2. Visibility problems
    • volatile, which makes a write visible to all subsequent reads.
  3. Ordering problems
    • Ordering problems caused by instruction reordering are constrained by the JMM's happens-before rules.
    • Ordering problems among concurrently executing threads are solved by mutual exclusion with synchronized.

volatile

volatile is a lightweight lock. It solves visibility problems and forbids instruction reordering around volatile accesses (thereby addressing ordering problems).

A volatile variable still has a copy in working memory, but because of the special ordering of its read and write operations, it behaves as if it were read from and written to main memory directly.

The JMM specifies that when multiple threads operate on a volatile variable concurrently, a write to the variable happens-before every subsequent read of it.

3.5 Optimization barriers and memory barriers

III. The foundations of concurrent programming

1 The six states of a thread

  1. NEW (newly created, not yet started)
  2. RUNNABLE (running, or ready and waiting for CPU time)
  3. BLOCKED (blocked, waiting to acquire a monitor lock)
  4. WAITING (waiting indefinitely for another thread's action)
  5. TIMED_WAITING (waiting with a specified timeout)
  6. TERMINATED (execution finished)

You can use the jps and jstack commands to inspect the threads of a running JVM: jps lists the Java processes on the machine, and jstack dumps the stack traces of all threads in a given process.

[root@localhost ~]# jps
18484 jar
14295 Bootstrap
32168 Bootstrap
30812 Jps
15918 Bootstrap
[root@localhost ~]# jstack <pid>
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.191-b12 mixed mode):

"Attach Listener" #289 daemon prio=9 os_prio=0 tid=0x00007fb064003000 nid=0x7883 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"lettuce-eventExecutorLoop-1-4" #250 daemon prio=5 os_prio=0 tid=0x00007fb0549ac800 nid=0x167c waiting on condition [0x00007fb005092000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x0000000086a05998> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at io.netty.util.concurrent.SingleThreadEventExecutor.takeTask(SingleThreadEventExecutor.java:238)
        at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:64)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)

"lettuce-epollEventLoop-4-4" #249 daemon prio=5 os_prio=0 tid=0x00007fb070008000 nid=0x167b runnable [0x00007fb006194000]
   java.lang.Thread.State: RUNNABLE
        at io.netty.channel.epoll.Native.epollWait0(Native Method)
        at io.netty.channel.epoll.Native.epollWait(Native.java:114)
        at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:241)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:258)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)

2 Starting and stopping threads

2.1 Starting a thread

Call start() on a Thread instance; the JVM creates the underlying thread and invokes its run() method, as in the sketch below.
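A minimal sketch: start() launches a new thread to run the Runnable, whereas calling run() directly would execute it on the current thread.

public class StartDemo {
    public static void main(String[] args) {
        Thread t = new Thread(() ->
                System.out.println("hello from " + Thread.currentThread().getName()));
        t.start(); // schedules a new thread; t.run() would execute on the main thread
    }
}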

2.2 Interrupting a thread

  1. Respond to interruption by catching InterruptedException thrown from blocking calls and ending the task.
  2. Terminate the thread via its interrupt flag (Thread.interrupt() checked with isInterrupted()).
  3. Terminate the thread with a custom interrupt flag.

Both flag-based approaches are sketched below.
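A sketch combining the approaches (class and field names are illustrative): the interrupt flag stops one worker and a custom volatile flag stops the other:

public class StopDemo {
    private static volatile boolean stopped = false; // custom flag (approach 3)

    public static void main(String[] args) throws InterruptedException {
        Thread byInterrupt = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // work; blocking calls would throw InterruptedException instead (approach 1)
            }
            System.out.println("stopped via interrupt flag");
        });
        Thread byFlag = new Thread(() -> {
            while (!stopped) {
                // work
            }
            System.out.println("stopped via custom flag");
        });
        byInterrupt.start();
        byFlag.start();
        Thread.sleep(100);
        byInterrupt.interrupt(); // approach 2: set the interrupt flag
        stopped = true;          // approach 3: set the custom flag
        byInterrupt.join();
        byFlag.join();
    }
}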

IV. Locks in Java


V. Atomic operation classes in Java


VI. Concurrent containers in Java


VII. Concurrency utility classes in Java


VIII. Java's concurrency library: Executor

What is it?

Executor is based on the producer-consumer model: submitting a task is the producer side, and the threads that execute tasks are the consumers.

The thread pool

When a task is submitted, the thread pool checks whether the number of threads has reached corePoolSize. If not, a new thread is created to run the task. Once the thread count reaches corePoolSize, the task is put into the work queue (a blocking queue) to wait for an idle thread. When the work queue is full, the pool creates additional threads up to maximumPoolSize to run new tasks. Once the thread count reaches maximumPoolSize and the queue is full, the pool rejects new tasks.

In the source code, a single 32-bit int stores both the pool's run state and its thread count: the high 3 bits hold the state and the low 29 bits hold the count, as in the sketch below.
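A sketch of that bit-packing, simplified from the JDK's ThreadPoolExecutor source:

// Simplified from java.util.concurrent.ThreadPoolExecutor: the 'ctl' int packs
// the run state (high 3 bits) and the worker count (low 29 bits).
class CtlSketch {
    static final int COUNT_BITS = Integer.SIZE - 3;      // 29
    static final int CAPACITY   = (1 << COUNT_BITS) - 1; // max worker count: 2^29 - 1

    static int runStateOf(int c)     { return c & ~CAPACITY; } // keep the high 3 bits
    static int workerCountOf(int c)  { return c & CAPACITY; }  // keep the low 29 bits
    static int ctlOf(int rs, int wc) { return rs | wc; }       // pack state and count
}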

graph LR
    submit[Task submitted] --> core{fewer threads than corePoolSize?}
    core -- yes --> runCore[create worker thread to run the task]
    core -- no --> queue{offer the task to the work queue}
    queue -- success --> wait[wait for an idle thread to run it]
    queue -- queue full --> max{fewer threads than maximumPoolSize?}
    max -- yes --> runExtra[create a new worker thread to run the task]
    max -- no --> reject[reject the task]

Several important parameters

  1. Work queue (workQueue): a blocking queue that holds the tasks waiting to be executed and hands them to worker threads.
  2. Worker thread: a thread in the pool that repeatedly takes tasks from the work queue and runs them.
  3. Task: the unit of work submitted to the pool, typically a Runnable or a Callable.
  4. Core pool size (corePoolSize): the core number of threads in the pool. Until the work queue is full, at most this many threads are created to execute tasks.
  5. Maximum pool size (maximumPoolSize): the maximum number of threads. Once the pool reaches this size and the queue is full, new tasks are rejected.
  6. Keep-alive time (keepAliveTime): how long an idle non-core thread is kept alive before being reclaimed. The constructor sketch below shows how these parameters fit together.
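A sketch showing how these parameters map onto the java.util.concurrent.ThreadPoolExecutor constructor; the concrete values are illustrative:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolConfigDemo {
    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4,                                   // corePoolSize
                8,                                   // maximumPoolSize
                60L, TimeUnit.SECONDS,               // keepAliveTime for idle non-core threads
                new ArrayBlockingQueue<>(100),       // workQueue: bounded blocking queue
                new ThreadPoolExecutor.AbortPolicy() // rejection policy once saturated
        );
        pool.execute(() -> System.out.println("task running"));
        pool.shutdown();
    }
}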

Executing tasks in a thread pool has several advantages over allocating one thread per task:

  1. Reusing existing threads instead of creating new ones amortizes the large cost of thread creation and destruction over many requests.
  2. A worker thread usually already exists when a request arrives, so execution is not delayed waiting for thread creation, which improves responsiveness.
  3. By sizing the thread pool appropriately, you can create enough threads to keep the processors busy while preventing the application from exhausting memory or failing as too many threads compete for resources.

The class library provides useful default configurations. You can create a thread pool by calling one of the static factory methods in Executors:

public class Executors {
    // Fixed-size pool: a new thread is created per submitted task until the pool reaches nThreads.
    public static ExecutorService newFixedThreadPool(int nThreads);
    // Single-threaded executor: one worker thread executes the queued tasks sequentially.
    public static ExecutorService newSingleThreadExecutor();
    // Cacheable pool: idle threads are reclaimed when the pool exceeds demand, and new threads
    // are added as demand grows. The pool size is unbounded.
    public static ExecutorService newCachedThreadPool();
    // Fixed-size pool that can execute tasks after a delay or periodically.
    public static ScheduledExecutorService newScheduledThreadPool(int corePoolSize);
    // ... other methods
}

The Executor lifecycle

To manage the lifecycle of an execution service, the ExecutorService interface extends Executor, adding lifecycle-management methods (as well as convenience methods for task submission).

public interface ExecutorService extends Executor {
    // Graceful shutdown: accept no new tasks, but let already-submitted tasks finish --
    // including those that have not yet started.
    void shutdown();
    // Abrupt shutdown: attempt to cancel running tasks, skip queued tasks that have not
    // started, and return the list of tasks that never ran.
    List<Runnable> shutdownNow();
    boolean isShutdown();
    boolean isTerminated();
    boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException;
    // ... other convenience methods for task submission
}
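A minimal usage sketch tying the factory methods and the lifecycle together (task bodies are illustrative):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LifecycleDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 10; i++) {
            final int id = i;
            pool.execute(() -> System.out.println("task " + id + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();                                   // graceful: no new tasks accepted
        if (!pool.awaitTermination(5, TimeUnit.SECONDS)) { // wait for submitted tasks to finish
            pool.shutdownNow();                            // abrupt: cancel whatever is left
        }
    }
}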

Callable and Future interfaces

The difference between Runnable, Callable, and Future

A Runnable returns no result, while a Callable does; both describe abstract computational tasks. A task has four lifecycle phases: created, submitted, started, and completed.

A Future represents the lifecycle of a task. Its get method returns the task's result, blocking if the underlying Callable is still running.

Callable and Future facilitate collaboration between tasks.
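A short sketch (names are illustrative): submit a Callable and read its result through the returned Future:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<Long> sum = () -> {
            long s = 0;
            for (long i = 1; i <= 1_000_000; i++) s += i;
            return s;
        };
        Future<Long> future = pool.submit(sum);      // the task starts asynchronously
        System.out.println("sum = " + future.get()); // blocks until the Callable completes
        pool.shutdown();
    }
}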

IX. References

The Art of Java Concurrency Programming