Reporter | Pigeon



SenseTime recently completed a $410 million Series B funding round, the largest ever in AI.

According to industry insiders, SenseTime's algorithms hold a clear leading position across the industry, and its HPC department is very strong, having made many breakthroughs in recent years.

Because of these breakthroughs, investors believe SenseTime is fully capable of supporting its various business lines on the road to commercialization and making steady progress.

HPC, short for High Performance Computing, is a niche computing field that has come into public view with the rise of deep learning over the past year or two. Deep learning has an enormous demand for computing power that general-purpose computing cannot fully meet; only high-performance computing can provide a lasting and stable guarantee.

Although SenseTime's HPC unit is well known in the industry, it rarely speaks publicly and is considered secretive.

From publicly available information, we know that SenseTime's HPC department provides the overall infrastructure, from training platform to deployment platform, for each business line, and is responsible for comprehensively improving the computing speed of every SenseTime product.

On September 28, 2017, Professor Lin Dahua personally demonstrated the SenseTime Parrots training platform at NVIDIA GTC China, and according to industry sources, SenseTime's deployment platform PPL is more than a match for its Google equivalent.

To find out what SenseTime HPC is really up to, and to verify the industry rumors, AI Technology Camp conducted an exclusive interview with its head, Liu Wenzhi (Fengchen), hoping to learn more of the inside story through this conversation.

                                                   

                                                                       
The picture shows Fengchen, head of SenseTime HPC.

The following is a transcript of the interview, slightly abridged and edited without altering the original intent.


AI Technology Camp: Does SenseTime HPC focus more on software or on hardware?

Fengchen: SenseTime focuses on the combination of software and hardware.

In HPC, the hardware means the chip. As a startup, SenseTime is not in a position to build chips itself, but for our business we do need to consider how to bring out the full capability of the chips we use. For this, we have an HPC team dedicated to on-chip performance optimization.

AI Technology Camp: How much does performance improve after optimization?

Fengchen: Compared with the leading companies in this field, such as Facebook, Google, Intel, and Qualcomm, we can improve performance by two or three times. Compared with ordinary companies, the optimization can reach an estimated ten times, or even more.

For example, Google's TensorFlow supports mobile phones, and so does Facebook's Caffe2. We were already doing this much faster than them a year ago, but we don't talk about it much. Besides, it is genuinely hard to say how much faster you are than anyone else unless you publish the code.

Of course, the comparison may be unfair to them. Why? Google, for example, will certainly optimize its own business on top of its open-source framework TensorFlow, but the open-source framework itself is not optimized for anyone else's business. Likewise, SenseTime only optimizes for SenseTime's business. So comparing our optimized framework against their open-source ones may not be entirely fair.

AI Technology Camp: In that case, will SenseTime open-source its framework?

Fengchen: We don't have the capacity to open-source right now, because once you open-source, the manpower and resources required grow dramatically. For example, if someone reports a bug, should we fix it or not? If we fix it, we need people for that; if we don't, people think we're irresponsible. So at this stage we choose not to open-source. In the future, when SenseTime reaches a certain scale, we will consider it.

AI Technology Camp: On software optimization, is it mainly low-level, or application-specific?

Fengchen: We do both.

  • On the one hand, much of SenseTime's business is built on deep learning, so those workloads have a lot in common. Here we do combined software-hardware optimization at the platform level. This is low-level optimization that every business can share.

  • On the other hand, different businesses also require us to do special, application-level processing.

In general, in the short term we will focus on optimizing the underlying platform, because doing that once helps many businesses at the same time. In the long run we will also get involved in the upper-layer optimization of each business, working with each product line to push product performance to the limit.

AI Technology Camp: What is SenseTime's overall strategy for HPC? What specific areas will you focus on next? Can you share the latest progress?

Fengchen: The strategy is twofold.

At present, the performance of PPL on ARM CPUs in SenseTime's applications has far exceeded the known open-source solutions from the big players; its performance on Qualcomm GPUs also beats the hardware vendor's own, and it is already used in the products of major Chinese phone makers.

As for the latest progress, I can only say that we are working on a deep learning solution on FPGAs, and the related product is expected to be released next year.

AI Technology Camp: Do you work on MPI or on compilers?

Fengchen: MPI and similar tools are mostly used in distributed or clustered environments. A few people build their own communication components, but that doesn't add much value, because MPI on its own is already good enough.

Building your own specialized compiler for a specific business requirement can have advantages, but in most cases doing it yourself is not necessarily a win.

There are two things to consider:

  1. How much impact does the compiler actually have on the business?

  2. Can you make the compiler general enough? If it is not general enough, I can simply write a dedicated program to solve the problem; if it is general enough, the compiler is what gives you the flexibility.

Generally speaking, the answer depends on the specific business; different businesses lead to different choices.

Across the industry today, using MPI directly is the norm; few companies reinvent the wheel, since it is too much trouble. Besides, very few people reach that level. Even if someone can write the code, can they guarantee it will be problem-free, and that it will genuinely help the business? Probably not.
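To make that concrete, here is a minimal, generic sketch of "using MPI directly" (my own illustration, not SenseTime code): each process contributes a partial result and a single collective call combines them, which is exactly the kind of work a home-grown communication component would otherwise have to do.

```cpp
// Minimal MPI sketch: each rank holds a partial result and one MPI_Allreduce
// collective combines them across all ranks.
// Typical build/run:  mpicxx allreduce.cpp -o allreduce && mpirun -np 4 ./allreduce
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = static_cast<double>(rank);  // stand-in for a per-node partial result
    double total = 0.0;
    MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        std::cout << "sum over " << size << " ranks = " << total << "\n";

    MPI_Finalize();
    return 0;
}
```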

AI Technology Camp: Indeed, as you said, industry looks for maximum utility, generality, and flexibility in its tools. There is a big gap between industry and academia: academia values innovation, industry values stability. Even when academics publish the latest research, the newest algorithms are not necessarily better for industry in terms of efficiency, performance, scalability, or flexibility. In your daily work, do you often find that an older algorithm is a better fit than a newer one?

Fengchen: It happens all the time. Research and industry really do think differently. Research tends to focus on breakthroughs, on the best result achievable under given conditions. Industry, on the other hand, usually cares more about the worst case, or at least the average case, in real-world applications.

This difference in evaluation criteria leads to different views of the best and latest algorithms. The academically best algorithm may, in practice, deliver average-case or worst-case results no better than those of earlier algorithms.

AI Technology Camp: We know your specialty is heterogeneous parallel computing and that you have a lot of experience applying it to concrete industrial needs. You have written four books on the subject: "Parallel Algorithm Design and Performance Optimization", "Parallel Programming Methods and Optimization Practice", "Scientific Computing and Enterprise Applied Parallel Optimization", and "OpenCL Heterogeneous Parallel Computing". What does each book cover, and which one are you most satisfied with?

Fengchen: I am most satisfied with my first book “Parallel Algorithm Design and Performance Optimization”.

This book is, in a sense, the foundation for the other books; it teaches the fundamentals. Once you master the Tao, the underlying principles, you pick up the concrete techniques very quickly.

This book can genuinely deepen one's understanding of the field, while the other three are more about solving practical problems in specific areas.

For example, the second book, "Parallel Programming Methods and Optimization Practice", surveys the available parallel programming languages and tools from the language perspective and shows how to use each of them. The third book, "Scientific Computing and Enterprise Applied Parallel Optimization", presents concrete hands-on experience from the perspective of application domains. The fourth book, "OpenCL Heterogeneous Parallel Computing", co-authored with Chen Yi and Wu Changjiang, is a theoretical introduction to and practical training in OpenCL, the open language for heterogeneous computing.

For many people, the hands-on books are probably more popular, because they pay off quickly.

For example, if the maximum height you can reach is 100 and your current ability is 60, the last three books will quickly raise your problem-solving ability from 60 to 70 or 80. But that kind of study only brings you closer to your current ceiling; it does not raise the ceiling itself.

If you want to break through the ceiling of 100, raise it to 200 or 300, and give yourself more room to grow, you have to settle down and study the core ideas at the level of the "Tao".

Generally speaking, most Chinese engineers today prefer the hands-on approach.

AI Technology Camp: Are there relatively few books about the "Tao" on the market right now?

Fengchen: Indeed. Many people have written about technique, but few have written about the Tao.

Of course, some professors do write about the Tao, but they tend to write about the Tao of computer theory. That is great for research, but not for "how to squeeze the most performance out of a program," because academics care more about the overall structure and system of knowledge, which sits somewhat apart from the practical ideas industry can really use.

AI Technology Camp: What kind of books do you like to read?

Fengchen: Most of what I read now focuses on hardware architecture, algorithms, and design, along with some old, classic theory books. They are the essence of technology, proven valuable by history.

I pay attention to the guiding ideas and research methods behind the authors' work, and think about how to translate those ideas into industrial practice.

Let’s take a simple example.

I will look at the theoretical basis of quicksort and the details of the algorithm, and think about what characteristics the hardware needs when the algorithm runs on specific hardware, what basic rules the processor design has to follow, and which of those rules make it faster.
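As a hedged illustration of that way of reading (my own sketch, not an example given in the interview), here is a plain quicksort with comments marking the hardware questions such an analysis would raise: how predictable the branches are, how the partition scans touch memory, and where independent work could be spread across cores.

```cpp
#include <iostream>
#include <utility>
#include <vector>

// Plain in-place quicksort (hi is inclusive), annotated with hardware
// considerations: branch behaviour, memory access patterns, parallelism.
void quicksort(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[lo + (hi - lo) / 2];   // pivot choice shapes branch behaviour
    int i = lo, j = hi;
    while (i <= j) {
        // The partition scans walk memory sequentially, so they benefit from
        // hardware prefetching and wide cache lines.
        while (a[i] < pivot) ++i;
        while (a[j] > pivot) --j;
        if (i <= j) {
            std::swap(a[i], a[j]);       // data-dependent branch: hard to predict
            ++i;
            --j;
        }
    }
    // The two halves are independent, so on a multi-core processor they could
    // be sorted on separate cores; recursion depth sets the stack pressure.
    quicksort(a, lo, j);
    quicksort(a, i, hi);
}

int main() {
    std::vector<int> v{5, 3, 8, 1, 9, 2, 7};
    quicksort(v, 0, static_cast<int>(v.size()) - 1);
    for (int x : v) std::cout << x << ' ';
    std::cout << '\n';
}
```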

If you want to strengthen your thinking, take a look at "The Art of Computer Programming"; the book is very demanding, so even just browsing it is worthwhile. On the hardware side, "Parallel Computer Design" is recommended; it introduces the factors considered in the design of mainstream computer hardware. There is also a set of books on parallel algorithm design by Academician Chen Guoliang in China.

On heterogeneous parallel computing

AI Technology Camp: How can we understand heterogeneous parallel computing in simple terms?

Fengchen: Let me give you an intuitive example.

People run multiple threads at the same time. For example, you can watch TV while eating with your hands and mouth; and when you trip over a rock, your body rebalances before you are even aware of it. That is because we have the cerebrum for conscious computation, and the cerebellum, the spine, and so on for unconscious computation.

Humans classify tasks, and different kinds of tasks are controlled by different parts of the body, which is how we adapt to a complex environment.

In contrast to humans, computer programming has always been serial, with most programs consisting of only one process or thread.

However, in the past few years, CPU frequency has not increased much.

Meanwhile, the demand for computing performance keeps growing. How do you meet that demand? Two approaches emerged: parallel computing and heterogeneous computing.

Parallel computing means moving from the original single core to computing on four or eight cores. Heterogeneous computing is a mix-and-match: tasks are divided so that work suited to massive parallelism goes to the parallel hardware and work suited to a single thread stays on a single thread; in other words, the hardware's computing resources are allocated on demand, just like the cerebrum, cerebellum, and spine.

For example, in the security field, the computing part of the neural network is carried out on the GPU, while the logical processing is carried out on the CPU.

In essence, both heterogeneous computing and multi-core parallelism aim to maximize computing power, which is collectively referred to as heterogeneous parallel computing.
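To make the division of labour concrete, here is a minimal C++ sketch (my own illustration, not SenseTime code): the heavy, data-parallel numeric stage is spread across all CPU cores, while the branch-heavy decision logic stays on a single thread. On a truly heterogeneous system the parallel stage is the part that would be offloaded to a GPU, as in the security example above.

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::vector<float> scores(1 << 20, 0.5f);

    // Data-parallel stage: spread the heavy numeric work over all cores
    // (on a heterogeneous system this is the part you would offload to a GPU).
    unsigned n_threads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    size_t chunk = scores.size() / n_threads;
    for (unsigned t = 0; t < n_threads; ++t) {
        size_t begin = t * chunk;
        size_t end = (t + 1 == n_threads) ? scores.size() : begin + chunk;
        workers.emplace_back([&, begin, end] {
            for (size_t i = begin; i < end; ++i)
                scores[i] = std::tanh(scores[i] * 3.0f);   // stand-in for NN math
        });
    }
    for (auto& w : workers) w.join();

    // Sequential logic stage: irregular, branch-heavy work stays on one thread
    // (the "CPU side" of the CPU/GPU split described above).
    size_t alarms = 0;
    for (float s : scores)
        if (s > 0.9f) ++alarms;
    std::cout << "alarms: " << alarms << "\n";
}
```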

On the HPC talent shortage

AI Technology Camp: Compared with traditional single-threaded programming, what extra demands does heterogeneous parallel computing place on programmers?

Fengchen: It is definitely more demanding. Heterogeneous parallelism is very difficult.

Programmers need a fundamental shift in mindset: from single-threaded thinking to multi-threaded thinking. Allocating tasks well across processors and making good use of the hardware resources on each one is hard.

AI Technology Camp: Is talent cultivation in this area keeping up?

Fengchen: The concept of heterogeneity has only recently been taken seriously, and schools have not had time to cultivate this kind of talent.

Some colleges and universities do offer related electives, but the teachers have spent their careers in school, have little hands-on experience themselves, and have never been through the grind of industry, so on the whole the training effect is not obvious.

Meanwhile, with the rise of deep learning in recent years, industry demand for this talent keeps growing; the schools not only fail to keep up, the gap keeps widening. Almost every company complains there is not enough talent in this field. SenseTime has a reputation in HPC and even we find it hard to recruit the right people, let alone other companies. Anyone who is even slightly better than average is hard to find and fiercely fought over.

If we want to solve the talent problem, the best way is to first cultivate a group of very good teachers, and then let as many students as possible get access to those teachers.

AI Technology Camp: Is there anyone in the domestic HPC field you admire?

Fengchen: Almost none. I don't see many people in China who have really reached a very high level in this field. It shows in many respects: vision, determination, execution.

AI Technology Camp: Is there a big gap between China and other countries in HPC?

Fengchen: In hardware we already lead the world, because China has money and can buy the best hardware. Software, however, cannot be bought with money alone, especially the talent, which takes time to train.

How SenseTime recruits

AI Technology Camp: Do you prefer fresh graduates or people with several years of work experience?

Fengchen: Basically only fresh graduates, because people who have work experience and are also really good are far too rare. Among fresh graduates, we generally prefer those from Shanghai Jiao Tong University, the University of Chinese Academy of Sciences, Tsinghua University, Beihang University, and Beijing University of Posts and Telecommunications, which are among the top ten computer science schools in China.

AI Technology Camp: When these graduates enter the HPC field, how high a salary can they get?

Fengchen: It's hard to say, but generally higher than other engineers. Roughly speaking, those with a master's degree, or truly top undergraduates, can earn an average annual salary of around 300,000 yuan; PhDs earn more.

AI Technology Camp: What are your hiring criteria?

Fengchen: SenseTime's hiring bar should, on the whole, be the highest in the industry. In HPC, I look at the following aspects:

  1. Passion for the field

  2. Ability to solve general practical problems

  3. Overall understanding of computer architecture

  4. Data-handling ability

In addition, I attach great importance to whether the person has a good problem-solving mindset, because that determines how fast they can grow and how high they can potentially go.

AI Technology Camp: How do you test that thinking ability?

Fengchen: The simplest way is to give them a problem and watch how they think, how they tackle an unfamiliar problem, and how they write code.

AI Technology Camp: Have you met any students who impressed you in interviews so far?

Fengchen: Basically not yet, but some of this year's interviewees have come close, and I believe there will be soon.

This is really a consequence of the imperfect training system. Even I myself am constantly running into problems, thinking them through, and building up my own knowledge system. University professors are too far removed from real-world practice to build that kind of training system.

Career growth for HPC talent

AI Technology Camp: How long does it take a graduate to become a good HPC engineer?

Fengchen: Generally speaking, it takes five to six years for a bachelor's graduate, three years for a master's graduate, and one to two years for a PhD. But talented people grow faster and faster.

AI Technology Camp: For someone who joins the SenseTime HPC department, in what ways will you help them grow?

Fengchen: There are several aspects.

  1. Help them further strengthen their theoretical grounding in computer architecture: how a computer performs a calculation, which stages an instruction goes through from issue to final execution and write-back, and what happens at each stage;

  2. Teach them how to design good parallel algorithms;

  3. Equip them with the usual software engineering skills;

  4. Train them in how to program and how to write high-quality code;

  5. Develop their ability to solve all kinds of practical problems, whether in heterogeneous parallel algorithms or in hardware.

AI Technology Camp: What's your biggest headache right now?

Fengchen: People, people, people! Finding the right people!

AI Technology Camp: That's a headache for the whole industry. (laughs)

Fengchen: That's why everyone is fighting over them. (laughs)





Appendix: A brief introduction to Fengchen:

  • He holds a master's degree from the Graduate School of the Chinese Academy of Sciences. He now heads SenseTime's heterogeneous parallel computing department, responsible for code performance optimization, chip-related work, and the autonomous driving business.

  • He worked as a parallel computing engineer at NVIDIA from 2011 to 2014, then as a senior R&D engineer at Baidu's Institute of Deep Learning, where he ran the day-to-day work of the heterogeneous computing group. He has two published US patent applications and a number of domestic patents.

  • He has published four books: "Parallel Algorithm Design and Performance Optimization", "Parallel Programming Methods and Optimization Practice", "Scientific Computing and Enterprise Applied Parallel Optimization", and "OpenCL Heterogeneous Parallel Computing"; the first of these has had a print run of nearly 10,000 copies.