1. Introduction to OpenMP
OpenMP is a multi-threaded programming model for shared-memory parallel systems. The supported programming languages are C, C++, and Fortran. OpenMP provides a high-level, abstract way to describe parallel algorithms and is especially suitable for parallel programming on multi-core CPUs. The compiler parallelizes the program according to the pragma directives added to the source, which reduces the difficulty and complexity of parallel programming. When the compiler does not support OpenMP, the pragmas are ignored and the program degrades into a normal (serial) program. As long as the program makes no unguarded calls to the OpenMP runtime library, the OpenMP directives do not affect its normal compilation and execution.
OpenMP uses the fork-join execution model. At first there is only one main thread. When a parallel computation is needed, several branch threads are forked to perform the parallel tasks. When the parallel code finishes, the branch threads join and control returns to the single main thread.
The schematic diagram of a typical fork-join execution model is as follows:
2. Set up the test environment
To study ARM multi-core processors, I tested the OpenMP model on an 8-core CPU. Since I had no real development board, I used QEMU to emulate an 8-core Cortex-A72 CPU with 512 MB of memory. How to use QEMU for emulation will be explained in the next article.
Preparation:
- Ubuntu 18.04 operating system (not a virtual machine)
- A working QEMU environment
- The aarch64-linux-gnu cross-compiler, installed with apt-get
3. Write the OpenMP test program
OpenMP directives take the form #pragma omp directive [clause [, clause]…]. Common directives are as follows:
- parallel: used before a block to indicate that the block will be executed in parallel by multiple threads;
- for: used before a for loop to distribute the loop iterations among multiple threads for parallel execution (work sharing). The programmer must ensure that there are no data dependences between iterations;
- parallel for: the combination of parallel and for, also used before a for loop. It both creates the parallel region and shares the loop iterations among the threads;
- sections: used before a group of code blocks that can run in parallel, to share multiple blocks among threads. Each block is marked with a section directive (note the distinction between sections and section);
- parallel sections: the combination of parallel and sections, analogous to parallel for;
- single: used inside a parallel region to mark code that is executed by only one thread;
- critical: used before a critical section of code to ensure that only one OpenMP thread enters it at a time;
- flush: ensures that OpenMP threads see a consistent view of shared data in memory;
- barrier: synchronizes the threads in a parallel region. Each thread stops at the barrier and waits until all threads have reached it before proceeding;
- atomic: specifies that a data operation must be performed atomically;
- master: specifies that a block of code is executed by the main thread only;
- threadprivate: specifies that one or more variables are private to each thread. The difference between threadprivate and private is explained later.
Some OpenMP API functions are also used frequently.
1. Write the test code OpenMPTest.c
#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Distribute the 8 iterations over a team of 8 threads */
    #pragma omp parallel for num_threads(8)
    for (int i = 0; i < 8; i++) {
        printf("OpenMP Test, thread id: %d\n", omp_get_thread_num());
    }
    return 0;
}
2. Cross-compile OpenMPTest.c (the -fopenmp option enables OpenMP in GCC)
aarch64-linux-gnu-gcc -fopenmp OpenMPTest.c -o OpenMPTest
3. Copy the OpenMPTest executable into the Linux root file system image used by QEMU
mount -o loop /home/xt/rootfs.ext3 /home/xt/tmpfs/
cp OpenMPTest /home/xt/tmpfs/
umount /home/xt/tmpfs/
4. Run the QEMU startup command
qemu-system-aarch64 \
    -machine virt \
    -cpu cortex-a72 \
    -nographic \
    -m 512 \
    -smp 8 \
    -kernel /home/xt/linux-5.12.9/arch/arm64/boot/Image \
    -initrd /home/xt/rootfs.ext3 \
    -append "root=/dev/ram0 rdinit=/linuxrc console=ttyAMA0"
The execution result is shown in the figure below:
5. Run the copied executable, ./OpenMPTest, in the emulated ARM environment
Done!