In general, other platforms are not Turing-complete and must work with the CPU to complete processing tasks. In this book, we adopt the following conventions:
If a task is executed by the CPU, we call it software execution. If a task (or part of a task) is executed by a coprocessor, GPU, FPGA, or ASIC, we collectively call it hardware-accelerated execution. If a task is divided into at least two parts, one executed in software on the CPU and the other in hardware on a coprocessor, GPU, FPGA, or ASIC, with the two parts communicating and cooperating with each other, then we say the task is completed through hardware/software collaboration.
Take a heterogeneous computing architecture based on CPU+GPU as an example. CUDA is a parallel computing platform and application programming interface (API) model created by NVIDIA. CUDA allows software developers to use CUDA-enabled graphics processing units (GPUs) for general-purpose processing. The CUDA platform is a software layer that provides direct access to the GPU's virtual instruction set and parallel computing elements in order to execute compute kernels.
As shown in Figure 2.6, the basic CUDA processing flow, viewed from the CPU's perspective, is as follows:
The CPU executes tasks in sequence and stores data in CPU memory. The data to be processed is copied from CPU memory to GPU memory (step ① in the figure). The CPU instructs the GPU to work by configuring and launching the CUDA kernel (step ② in the figure). Multiple CUDA cores execute in parallel to process the prepared data (step ③ in the figure). After processing completes, the results are copied back to CPU memory (step ④ in the figure). The CPU then takes the GPU's results for subsequent processing and continues its work.
Figure 2.6 CUDA processing flow
Note: While the GPU is working, the CPU is idle and can be used for other tasks.
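The four steps above can be sketched as a minimal CUDA program. This is an illustrative example, not from the text: the vector-add kernel `vec_add` and all variable names are hypothetical, chosen only to make each numbered step visible in code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cassert>
#include <cuda_runtime.h>

// Kernel executed in parallel by many CUDA cores (step ③).
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    size_t bytes = n * sizeof(float);

    // The CPU prepares data in CPU (host) memory.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    // Step ①: copy input data from CPU memory to GPU memory.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Step ②: configure and launch the CUDA kernel.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n);
    // While the kernel runs (step ③), the CPU is free to do other work.

    // Step ④: copy the results back to CPU memory.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    // The CPU continues working with the results.
    assert(h_c[10] == 30.0f);
    printf("c[10] = %f\n", h_c[10]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

The kernel launch is asynchronous with respect to the CPU, which is exactly the point made in the note above: between launching the kernel and copying the results back, the CPU is free to do unrelated work.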
As shown in Figure 2.7, we classify hardware/software collaboration into two categories, according to the relationship between the part of the task running in software and the part running in hardware:
Parallel hardware/software collaboration: as with CUDA, the software and hardware parts interact essentially as equals, like communicating threads or a client and server, even though there may be a Master/Slave distinction between them. Vertical hardware/software collaboration: as in a layered network protocol stack, or the service invocation between the upper and lower layers of many large hierarchical systems, the lower layer encapsulates the implementation details and provides interfaces for the upper layer to invoke; in this way the lower layer provides services to the upper layer.
Figure 2.7 Parallel and vertical hardware/software collaboration
Parallel mode and vertical mode are essentially the same. In both, each party completes its own work, and the two parties exchange data and information through interaction, together realizing the cooperation between the hardware and software platforms. The difference lies mainly in the logical invocation relationship: in vertical mode, each layer completes its own functions based on the services provided by the layer below, and in turn provides services to the layer above; in parallel mode, the two parties call and cooperate with each other.
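The vertical mode described above can be sketched in CUDA terms. In this hypothetical example, a lower-layer helper `gpu_scale` (an illustrative name, not from the text) encapsulates all GPU details, namely device memory management and the kernel launch, behind a plain function interface; the upper layer invokes the service without knowing how it is implemented.

```cuda
#include <cstdio>
#include <cassert>
#include <cuda_runtime.h>

// Implementation detail hidden inside the lower layer.
__global__ void scale_kernel(float *x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

// Lower layer: encapsulates GPU memory management and the kernel
// launch, exposing only a simple CPU-side interface to callers.
void gpu_scale(float *host_data, float s, int n) {
    float *d;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, host_data, bytes, cudaMemcpyHostToDevice);
    int threads = 256, blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(d, s, n);
    cudaMemcpy(host_data, d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
}

// Upper layer: consumes the lower layer's service; no GPU concepts
// (device memory, kernels, launch configuration) appear here.
int main() {
    float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    gpu_scale(v, 10.0f, 4);
    assert(v[3] == 40.0f);
    printf("v[3] = %f\n", v[3]);
    return 0;
}
```

The design choice is the one the text describes: the invocation relationship is one-directional, with the upper layer depending only on the interface, so the lower layer's implementation (here, a GPU kernel) can change without affecting its callers.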
New technologies emerge and iterate rapidly, posing great challenges to cloud computing hardware architecture. How can this challenge be fundamentally addressed? This book, Fusion of Hardware and Software, presents the solutions.