Recently it occurred to me to ask: why does everyone use GPUs instead of CPUs for CV research? I don’t work in CV myself, but I was curious about this question, so I searched for some material online and wrote up this article as a record. Along the way, I also reviewed the basics and the development history of the CPU and the GPU.

First, CPUs and GPUs are different components that perform different tasks; they are not mutually exclusive replacements for each other. An older computer can get by without a discrete GPU, but no computer can run without a CPU.

This article mainly draws on:

What is the difference between CPU and GPU? – Zhihu (zhihu.com)

Huang Da-fen (Ji) Dou (Tang)

28 years in the making, and he finally beat Intel by one notch! – Zhihu (zhihu.com)

CPU

In the standard von Neumann architecture, a computer consists of five components: an input device, a memory, an arithmetic unit, a controller, and an output device. The arithmetic unit and the controller are the CPU’s responsibility. Because of differences in physical structure, in the physical distance between devices, and in operating mechanisms, the CPU’s processing speed and the memory’s access speed are severely mismatched (see “Why are registers faster than memory?” on Ruan Yifeng’s weblog, ruanyifeng.com). To mitigate the slowdown caused by frequently accessing the “slow” memory, the CPU does two things:

  1. It has registers, an arithmetic unit (ALU) and a controller (CU), which together form the core components of the CPU. Registers can be classified, according to what they store, into the MAR (memory address register), MDR (memory data register), IR (instruction register) and PC (program counter).
  2. It allocates a large portion of on-chip space to the Cache and follows the principle: if a value can be taken from a register, don’t go to the Cache; if it can be taken from the Cache, never go to main memory.

The CPU clock is actually divided into the main frequency and the external frequency; the external frequency exists specifically to “accommodate” the slower components. When we talk about the CPU frequency, we usually mean the main frequency.

Main frequency = external frequency × clock multiplier. Since the main frequency is much higher than the external frequency, the multiplier acts as a “reconciling factor” here. We can use official software, or enter the BIOS (Basic Input/Output System, the bootstrap program that loads the OS kernel at power-on; it is also responsible for checking the hardware and adjusting machine parameters before the computer starts, and these settings are recorded directly in the CMOS chip), to raise the CPU frequency by adjusting the multiplier. If the main frequency exceeds its rated value, the CPU is “overclocked”, which comes at the cost of a shorter service life and a higher risk of damage.
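
As a purely illustrative example (the numbers are made up rather than taken from any particular chip): a 100 MHz external frequency combined with a 36× multiplier gives a 3.6 GHz main frequency, and nudging the multiplier up to 40× would push the main frequency to 4.0 GHz, which is exactly what overclocking does.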

The CPU’s working process can be roughly divided into three steps: instruction fetch, decode, and execute. For example, to compute a “1 + 1” problem, the CPU’s controller (CU) learns by decoding the instruction where the operands are (these values should already have been moved across the bus into the CPU’s registers; if the CPU needs to read data from memory, it has to go through the MMU and a series of steps such as virtual-to-physical address translation), what operation to perform, and where the result should be stored. The CU then hands the actual calculation over to the ALU.

Any program we write in a high-level language ultimately has to be translated all the way down into machine language: binary CPU instructions a few bytes long. The CPU generally executes them sequentially (modern CPUs reorder certain instructions for optimization, so execution is not strictly sequential in practice). Put simply, if a single-core CPU takes N units of time to compute one “1 + 1” problem, it will take roughly 1000N units of time to compute 1000 of them.
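
As a rough sketch of that point (plain C-style code with made-up names, not taken from any of the referenced articles): a single core works through the problems one after another, so the total time grows in proportion to the number of problems.

    #include <stdio.h>

    int main(void) {
        /* 1000 independent "1 + 1" problems, but one core still has to
           fetch, decode and execute them one after another. */
        int results[1000];
        for (int i = 0; i < 1000; i++) {
            results[i] = 1 + 1;        /* each iteration costs roughly N time units */
        }
        printf("%d\n", results[999]);  /* total cost is roughly 1000 * N */
        return 0;
    }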

As the head of the household, the CPU takes on more chores than you might expect and has a complex internal structure. Nowadays a physical CPU chip can contain multiple processing cores; a chip with four cores is called a “quad-core” chip for short. At the same time, thanks to hyper-threading technology, one core can logically handle two (or more) threads within a time slice. For example, your computer may have a “4-core, 8-thread” CPU (you can check this in Task Manager on Windows), which people sometimes loosely call an “8-core” chip. However, at near-full CPU utilization, a “4-core 8-thread” CPU is noticeably less efficient than an “8-core 8-thread” one (the latter does not rely on hyper-threading).

CPUs devote most of their effort to control (such as instruction jumps) and to managing the Cache (of which the CPU has three levels), while computation is a relatively small part of the work. Inside a CPU, the ratio of CU to Cache to ALU is roughly 1:2:1; that is, computation accounts for only about 25% of the chip. This is not to say, of course, that CPUs are weak. On the contrary, most CPUs can carry out complex floating-point calculations in very few clock cycles.

The CPU’s strong computing power and even stronger control capability make it well suited to programs with complex instruction streams. On the other hand, if the CPU spends all its time on simple, repetitive computation tasks, especially enormous numbers of them, its talent is rather wasted. This foreshadows the later development of graphics cards and the birth of the GPU.

GPU precursors — graphics cards

Computer graphics was pioneered by Ivan Edward Sutherland at the Massachusetts Institute of Technology in 1962 and continued to evolve over the following two decades. At that time, however, there were no specialized graphics processing components, the concept of a “GPU” had not yet been proposed, and the computer’s graphics output was still produced by the CPU.

To a computer, an image is a matrix, each element of which represents the color value at the corresponding pixel position. Image processing then amounts to performing the same calculation on every pixel of the matrix and merging the results into the overall output. Clearly, the processing of each pixel is independent (computing the current pixel does not affect the result of any other pixel), which means image computation is very well suited to parallel computing. By the same principle, cryptography, coin mining and other such “industries” are also well suited to parallel computing.
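
A minimal sketch of this idea (a hypothetical “invert every pixel” operation on a grayscale image stored as a flattened matrix; the function and names are my own, purely for illustration): each output pixel depends only on the corresponding input pixel, so the iterations could all run at the same time without changing the result.

    /* Invert an 8-bit grayscale image: out[idx] depends only on in[idx],
       so every pixel could be handled by a different worker in parallel. */
    void invert_image(const unsigned char *in, unsigned char *out,
                      int height, int width) {
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                int idx = row * width + col;  /* position in the flattened matrix */
                out[idx] = 255 - in[idx];     /* uses no other pixel's value */
            }
        }
    }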

But with the technology of the time, parallel computing was hard, and there were no graphics acceleration techniques. Back then, the image-processing flow of a computer was much like that of a CRT display: the CPU “scanned” the image sequentially, computing each pixel and sending it out. The predecessor of the graphics card was merely a graphics adapter, whose only task was to translate the graphics computed by the CPU into a signal the display device could recognize and project onto the screen; it had no computing power of its own (that is what the GPU would do later).

In 1988, ATI’s VGA Wonder, which supported 256 colors, became the first true first-generation graphics card. The subsequent development of graphics cards was mainly reflected in improved pixel precision and an expanding range of colors. After all, a VGA card was only responsible for outputting the image; the image computation itself was still entirely the CPU’s job. Especially after Microsoft’s graphical Windows operating system took off in the 1990s, the CPU alone gradually became “overwhelmed”.

In 1991, ATI released the Mach 8, the first second-generation graphics card, which supported acceleration of Microsoft Windows operations. It had a dedicated chip inside for graphics acceleration (a prototype of the GPU, though still a nobody at the time), freeing up the CPU. The birth of this kind of graphics card accelerated the popularization of Windows and also quickened the pace of PC graphical-interface development.

In the multimedia era, third-generation graphics cards supporting video acceleration and fourth-generation cards supporting 3D acceleration came out one after another. After a decade of brutal competition, some 80 manufacturers disappeared and the whole industry entered the duopoly era of the N card (NVIDIA) and the A card (ATI, later acquired by AMD). In 1999, NVIDIA formally proposed the concept of the “GPU” and launched the first GPU, the GeForce 256.

What is a GPU

A GPU, Graphics Processing Unit in full, is a chip mounted on a graphics card. In the beginning, the GPU’s task was to take graphics rendering, graphics acceleration and similar work off the CPU’s hands, and such work requires a great deal of parallel computation. The GPU’s architecture is therefore tailored to exactly that.

Compared with a CPU, nearly 90% of a GPU is made up of ALUs, while the Cache and CU account for only a small share of the rest. The GPU is therefore very well suited to computing scenarios in which successive calculation steps are independent of one another. Comparing a single GPU computing unit against a single CPU ALU, the GPU cannot compete (here I make a rough comparison by frequency: GPU frequencies are quoted in MHz while CPU frequencies are in GHz, units that differ by a factor of 10³). But GPUs win by human-wave tactics: a CPU may have only a handful of ALUs, while a GPU may have thousands, with each small cluster of ALUs forming a computational pipeline within one unit.

There is a very popular analogy on the Internet: the CPU is an old professor, while the GPU is a group of 1000 primary-school pupils. If there are 1000 elementary-school arithmetic problems to solve, the GPU is obviously faster; but to solve a complex mathematical problem, no number of pupils will do, and you need the professor’s strong control and calculation skills. For example, hand 1000 “1 + 1” problems to the CPU and the GPU respectively: the CPU may take 1000N units of time, while the GPU may need only m units. But ask each of them to solve the same advanced-mathematics problem, and the situation is completely reversed.

According to a friend of mine who works on CV, the same training job takes about six days on a CPU alone, but only two days with a GPU.

A young person’s next graphics card: why play games?

The graphics cards we talk about nowadays usually mean the discrete cards with two or three large cooling fans that get added to desktop computers. The even more surreal sequel is that graphics cards were later used for coin speculation and mining, which sent their prices skyrocketing and left CV researchers and game enthusiasts alike complaining endlessly.

The story from here on belongs to that tough Chinese guy, the founder of NVIDIA: Huang Renxun (Jensen Huang). At first, NVIDIA just wanted to build the best 3D graphics chips for gamers around the world. But gradually, Huang noticed that scientists were also using GPUs to speed up their calculations.

In 2012, Andrew Ng stunned the industry by leading Google Brain to recognize cats from 10 million images. But the bill crushed him: one million dollars, 1,000 computers and 16,000 CPUs. Then one day Ng took the lead in building the deep learning model with NVIDIA graphics processing chips (GPUs) instead of Intel CPUs. This time, he accomplished the same thing with just 16 computers and 64 GPUs. Behind this result, which thrilled Ng, lies a profound computing revolution.

CPUs have dominated computing ever since Intel pioneered the x86 architecture in 1978, but these logic-oriented processors struggle with very large volumes of data. So, with the data explosion and the rise of artificial intelligence, the GPU, with its powerful parallel computing capability, turned out to be a late bloomer, and its audience expanded from gaming enthusiasts to research departments (amazing as it sounds). One thing, however, has never changed: the CPU is the host and the GPU is the device. No matter how fast the GPU develops in the future, it will only share the CPU’s work, never replace it.

In the early days, however, programming the GPU was very painful, and much general-purpose computation had to be expressed, in a roundabout way, through graphics APIs. To crack that market, Huang made a crucial bet: he appointed David Kirk as chief scientist and quietly launched a project called CUDA to build a general-purpose parallel computing architecture that would make the N card more than just a graphics chip. CUDA, the Compute Unified Device Architecture, is what truly gave N cards the ability to be programmed for parallel computing beyond gaming.

Today, CUDA has in effect become one of the standard, almost indispensable tools for research in CV, deep learning and related directions. It lowers the barrier for ordinary researchers, freeing them from complex hardware tuning and low-level parameter optimization so they can devote themselves to the algorithms, and the GPU’s potential in artificial intelligence (and in coin mining) has gradually been recognized and exploited.
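
To give a concrete feel for what this looks like, here is a minimal CUDA sketch (the kernel, variable names and launch configuration are my own invention, used purely for illustration): the CPU acts as the host that allocates device memory, launches a kernel, and copies the results back, while the GPU runs roughly one lightweight thread per “1 + 1” problem, echoing the professor-and-pupils analogy above.

    #include <cstdio>
    #include <cuda_runtime.h>

    /* Each GPU thread handles exactly one element: the "1000 pupils" at work. */
    __global__ void add_one_plus_one(int *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = 1 + 1;
    }

    int main() {
        const int n = 1000;
        int host_out[n];
        int *dev_out = nullptr;

        cudaMalloc((void **)&dev_out, n * sizeof(int));          /* host asks for device memory   */
        add_one_plus_one<<<(n + 255) / 256, 256>>>(dev_out, n);  /* launch ~1000 parallel threads */
        cudaMemcpy(host_out, dev_out, n * sizeof(int),
                   cudaMemcpyDeviceToHost);                      /* copy results back to the host */
        cudaFree(dev_out);

        printf("first result: %d\n", host_out[0]);
        return 0;
    }

Compiled with nvcc, this is the kind of program CUDA makes straightforward to write without going through a graphics API at all.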

Other references

CPU structure and function (I) – CSDN blog

[Note] CPU structure and function (II) – CSDN blog

Summary of some basic knowledge about the CPU – Ma Jinlong – cnblogs (cnblogs.com)

How does the CPU access memory? – Zhihu (zhihu.com)

OpenCL and CUDA, CPU and GPU – CSDN blog

What’s the difference between a CPU and a GPU? – Zhihu (zhihu.com)

What’s the difference between CUDA and OpenCL? – Zhihu (zhihu.com)

CPU or GPU, who is the pupil? – Zhihu (zhihu.com)

What is a GPU? What’s the difference with a CPU? Finally someone has made it clear – Zhihu (zhihu.com)