At present, computer vision is facing several major problems:

1. Training requires large amounts of computing power and data, which makes it expensive.

2. Application scenarios are narrow; it is currently used mainly in autonomous driving, security surveillance, and a few other fields.

3. For CV practitioners, competition ("involution") is fierce and the job market is close to saturation.

Many experts have written about the future direction of computer vision from a technical point of view, and I have benefited a great deal from their answers. Combined with another question I saw a few days ago, about why many people are not optimistic about SenseTime, I would like to share my view on the future trend of computer vision from the perspective of society as a whole.

The rapid development of science and technology is the driving force behind these changes.

My view of technology is that technology must serve people, be beneficial to society and promote social progress, and it is enterprises that are responsible for realizing this. Only by applying technology and turning it into products can enterprises serve human beings and society.

The reason I am not bullish on SenseTime is that I do not think it has done this.

As a technical person, I read fairly widely; I pay attention not only to the technology but also to the products. We often see SenseTime and Kuangshi (Megvii) publish papers, but we almost never see them release new products. My impression is that they are more like research institutes.

But the world already has so many university laboratories doing academic research; does it really need another SenseTime doing the same thing?

In my opinion, this is like Gree announcing that it would make a Gree phone. With Apple, Xiaomi, Huawei, OV (OPPO and vivo) and others already making so many good phones, is there really room for a Gree phone?

Therefore, if Gree wants to make phones, it would do better to build its own operating system or chips, so as to contribute something genuinely useful to society and the market. But Gree clearly has no such intention, and I have not been optimistic about it since it first announced the phone. Four or five years have passed, and so far that judgment has proven correct.

An enterprise should do what an enterprise is supposed to do: turn academic research into products.

There is a passage in “Will Huawei Be the Next to Fall?”:

“We must be businessmen. A scientist can spend his whole life studying nothing but a single feather on a spider’s leg, and for a scientist that is fine. But what about us? If we only study spider legs, who will feed us? So we must study not only spider legs but also customer needs. This was in 2002, when Lucent Technologies was on the verge of collapse and Motorola was gravely ill… Bell Labs, Lucent’s core asset, was good at studying "spider legs", "butterfly wings", "the function of a horse’s tail" and so on; it was both the booster of Lucent’s growth and the burden that weighed Lucent down. Motorola invested heavily in Iridium, and its cutting-edge technology became its downfall… Both companies, along with many other "lions", suffered from the eutrophication of capital and technology, and in the end they were weighed down and cursed by their superior resources.”

SenseTime is very similar to Lucent and Motorola in this respect. It is absorbed in academic research and does not focus on product development, ignoring that the ultimate purpose of technology is to serve people and society.

The relationship between technology and products is like the relationship between hardware and software. When hardware reaches a certain level and software cannot keep up, the hardware is useless; when software develops further, it is constrained by the hardware, and improving the software any further requires improving the hardware.

Looking at the current environment, academic research has reached a certain level, but there are few application areas and few products, which is why computer vision feels saturated.

Once new applications are explored and new products are built, that saturation will prove temporary, and the products and markets will in turn push academic research forward.

What are the untapped areas and products?

Following He’s line of thinking, our applications of computer vision are still limited to the present: because it requires training on huge datasets and expensive computing power, our imagination of its application scenarios and products is constrained.

Once the cost of computing power comes down and the problem of insufficient data is alleviated, we will find that computer vision can be applied to a great many scenarios.

From my point of view, it will certainly be combined with robots in the future. By robots I do not mean only humanoid robots, but mainly various kinds of intelligent equipment: scene monitoring, service robots, autonomous driving, medical devices, embedded devices, and so on.

Think of all the science-fiction movies in which a robot analyzes its environment in various ways and then acts accordingly. Of course, giving robots those movie-level abilities would be frightening for humans, and for now it is unrealistic.

However, there is a smaller range of capabilities that we can already give them.

There are already applications that analyze surveillance footage and automatically alert the authorities when they detect unusual events such as car accidents, fires, shootings, or elderly people falling in nursing homes; there is also vision for autonomous vehicles and for patrol robots.
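As a rough illustration, such a monitoring application comes down to a loop that reads frames from a camera stream, runs some event-recognition model on each frame, and raises an alert when a dangerous event is detected with enough confidence. In the sketch below, the classifier and the notification function are hypothetical placeholders; only the OpenCV capture calls are real.

```python
# Minimal sketch of an alert loop for video monitoring. `classifier` and
# `notify` are hypothetical placeholders for an event-recognition model and
# an alerting channel; only the OpenCV capture calls are real APIs.
import cv2

ALERT_EVENTS = {"fire", "car_accident", "person_fallen"}
ALERT_THRESHOLD = 0.9

def monitor(stream_url, classifier, notify):
    """Read frames from a camera stream, classify each frame, and raise an
    alert when a dangerous event is detected with high confidence."""
    cap = cv2.VideoCapture(stream_url)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break  # stream ended or dropped
        label, confidence = classifier(frame)       # e.g. ("fire", 0.93)
        if label in ALERT_EVENTS and confidence >= ALERT_THRESHOLD:
            notify(label, frame)                    # forward frame to operators
    cap.release()
```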

JD.com (Jingdong) has launched a pig-face recognition project to assess the health of pigs, while Stanford researchers have analyzed human stool to judge people’s health.

In the future, there could be applications such as the following:

Perform a comprehensive scan of a second-hand car, identify the model, analyze how worn it is, and give a corresponding quotation.

Scan a face, analyze the skin condition, and suggest a suitable skin-care plan; scan hair and recommend the corresponding hair-care products and routine.

Monitor farmland in real time, keep farmers informed of its current state, such as insect pests and crop growth, analyze the region’s climate in previous years, and give an optimal management plan for the local farmland.

Have home service robots scan the coffee table and sofa, clean up automatically, and put things back where they belong.

For learning to dance, take the teacher’s dance video as a template, analyze the student’s dance video, score it and point out problems with each body part, much like the karaoke app WeSing (全民K歌) scores singing, so as to guide students in correcting their own mistakes (a toy sketch of this kind of pose comparison follows after this list).

… …
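To make the dance-scoring idea above a bit more concrete, here is a toy sketch of comparing a student’s pose to a teacher’s template, assuming 2D body keypoints have already been extracted from matching video frames by some pose-estimation model. The keypoint layout and the 0-100 scoring rule are illustrative assumptions, not anything from the post.

```python
# Toy sketch: score a student's pose against a teacher's template pose,
# given (N, 2) arrays of 2D keypoints for matching frames. The keypoints
# themselves would come from a separate pose-estimation model.
import numpy as np

def normalize(keypoints: np.ndarray) -> np.ndarray:
    """Center the keypoints and scale them to unit size so that position
    in the frame and body size do not affect the comparison."""
    centered = keypoints - keypoints.mean(axis=0)
    scale = np.linalg.norm(centered) or 1.0
    return centered / scale

def pose_score(teacher: np.ndarray, student: np.ndarray) -> float:
    """Return a 0-100 score from the mean distance between normalized poses."""
    diff = normalize(teacher) - normalize(student)
    mean_dist = np.linalg.norm(diff, axis=1).mean()
    return float(max(0.0, 100.0 * (1.0 - mean_dist)))

# Example with two slightly different 5-keypoint poses.
teacher = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 2]], dtype=float)
student = teacher + np.random.normal(0, 0.05, teacher.shape)
print(round(pose_score(teacher, student), 1))
```

A real system would do this per body part and per beat of the music, but the idea is the same: normalize, compare, and turn the distance into feedback.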

There are many, many possible applications; the ones above are fairly detailed and concrete ideas.

My personal experience is limited and my view is not comprehensive, but I believe there are many problems in society that can be solved with vision.

All of these applications rest on a common foundation: vision models that can run on mobile and embedded devices.

As I understand it, hardware will advance greatly in the future, and dedicated deep-learning processors will keep improving in performance. The future application scenarios of computer vision are therefore very broad, and for each specific scenario we need to design a small, dedicated model that can run on embedded devices: miniaturized, lightweight, and capable of real-time detection.
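As one concrete example of what “miniaturized and lightweight” usually means, many mobile vision networks (MobileNet-style architectures) replace a standard 3x3 convolution with a depthwise convolution followed by a 1x1 pointwise convolution, which cuts the multiply-add count sharply. The PyTorch sketch below is a generic illustration of that building block; the layer sizes are arbitrary and it is not tied to any particular company’s model.

```python
# A minimal sketch of the depthwise-separable convolution block used in
# lightweight mobile/embedded vision models. Channel counts are illustrative.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv (one filter per channel) followed by a 1x1
    pointwise conv -- far fewer multiply-adds than a standard 3x3 conv."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        x = self.act(self.bn2(self.pointwise(x)))
        return x

# Quick check: one 224x224 RGB frame through one block with stride 2.
if __name__ == "__main__":
    block = DepthwiseSeparableConv(3, 32, stride=2)
    out = block(torch.randn(1, 3, 224, 224))
    print(out.shape)  # torch.Size([1, 32, 112, 112])
```

Stacking blocks like this, then quantizing and exporting the model for a mobile or embedded runtime, is the usual route to the real-time, on-device detection described above.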

SenseTime and Kuangshi (Megvii), as two of the “four AI dragons” and as enterprises, are responsible for putting technology into application. In terms of scale, talent supply, and capital, they are fully capable of combining computer vision with robots, mobile terminals, and embedded devices to open up new fields and develop new products.

Among today’s Internet companies, the four AI dragons are also the best positioned to do this.