Planning editor | Natalie
Editor | Natalie
AI Front introduction: This morning, SenseTime announced that it has completed a $620 million Series C+ financing round. Participants include a number of domestic and foreign investment institutions and strategic partners: Hopu Investment, Silver Lake, Tiger Fund, Fidelity International and others; Shenzhen Innovation Investment Group, Bank of China Group Investment Co., Ltd., the Shanghai Free Trade Zone Fund, the All-star Investment Fund and others; and strategic investors including Qualcomm Venture Capital, Poly Capital and Shimao Group.







To date, SenseTime has raised more than $1.6 billion and is valued at more than $4.5 billion.

Just over a month ago, on April 9, SenseTime announced that it had completed a $600 million Series C round led by Alibaba Group, with Singapore's sovereign fund Temasek, Suning and other investment institutions and strategic partners following.

According to the company, SenseTime achieved full profitability in 2017 and has rapidly deployed its technology in smart cities, smartphones, internet entertainment, automotive, finance, retail and other industries. Last month, SenseTime released a series of new products, including the SenseMedia video content moderation platform, the SenseFoundry city-scale visual analysis platform and the SenseDrive driver monitoring system; among them, the SenseAR augmented reality platform is the only original AR platform developed in China. With the C+ round completed, SenseTime will continue to increase its investment in R&D and talent.

SenseTime has been busy recently. It signed a contract with Shanghai Shentong Metro, China's largest subway operator, entering the transportation scenario. It signed an agreement with Chengdu to set up its Belt and Road regional headquarters there, expanding into the western market. Together with Alibaba Group and the Hong Kong Science and Technology Park, it established the Hong Kong Artificial Intelligence Laboratory, the first project to support the construction of the Hong Kong International Science and Innovation Center in response to the central government's call. It signed a strategic cooperation agreement with the Massachusetts Institute of Technology (MIT) to jointly pursue breakthroughs in AI research. It also released the world's first artificial intelligence textbook for high school students together with East China Normal University and other schools, and signed agreements with 40 leading Chinese high schools, including the high schools affiliated with Tsinghua University and Shanghai Jiao Tong University, to offer AI courses, bringing AI into education.

With two consecutive record-setting rounds, SenseTime is moving further and further down the road to commercialization. Many people may associate SenseTime mainly with its dozens of papers sweeping CVPR and ICCV and with the technical confidence of its more than 100 PhDs. However, as Yang Fan, SenseTime's vice president, told AI Front in an earlier interview, “With only technical barriers, in the long run you are still just making wedding clothes for others,” that is, doing the work from which others will profit. Let's revisit Yang Fan's views on what it takes for SenseTime's AI to land in real products.

What kind of company is Sensetime?

When SenseTime comes up, most people first think of face recognition, but face recognition alone does not define SenseTime.

In Yang Fan's view, SenseTime is a platform service provider committed to original AI technology: it uses that technology to provide platform services to different industries and empower each of them, so that AI can truly transform every industry. “Right now, of course, our work is focused on computer vision within artificial intelligence, that is, image and video analysis. There is no doubt that the human face, as a very special and highly valuable kind of image, will make up a very large part of image and video analysis. But SenseTime often provides solutions for different industries that go far beyond face recognition.”

Development of and breakthroughs in computer vision
Deep learning enables CV to truly move from academia to industrial applications

Yang Fan has worked in computer vision for many years. At Microsoft, he mainly incubated new technologies in computer vision and computer graphics, including face recognition, image object recognition and 3D portrait reconstruction. SenseTime's core technologies today likewise center on face recognition, intelligent surveillance and image recognition. As the person in charge of bringing the technology to market, Yang Fan jokes that his job is to serve the company's researchers, but when he looks back on the development of computer vision, he has a lot to say.

In the late 1990s there was a wave of so-called artificial intelligence, or at least of face recognition. At the time, face recognition could already achieve fairly good results in the laboratory, but it was still far from practical use. From 2004, when Yang Fan joined Microsoft as an intern, until 2010 or 2011, computer vision made steady progress, but it was mainly a period of accumulation: the field as a whole advanced slowly, and there were few new applications or opportunities. Around 2011-2012, as hardware computing power improved and companies became able to collect huge amounts of data, deep learning became practical and changed the field dramatically, and computer vision has been in the fast lane ever since. It has spread from academia to industry and found ever wider application across all walks of life. These are the external causes.

As for the internal causes, this round of vision technology, built around deep learning, relies more on data; its core R&D capability has improved, and its results generalize better. “I used to do some face recognition work at Microsoft,” Yang recalled. “Before deep learning, you could build an algorithm that handled skin color very well, but it might not cope with lighting. If you wanted an algorithm that handled lighting well, it might not work well for skin color. Every breakthrough was a single-point breakthrough.”

Today, with huge amounts of data, many recognition techniques have become relatively general methodologies that can be transferred to different fields quickly and at low cost, which is of great value. AI development is still very difficult, but its unpredictability and risk have been greatly reduced. As a result, more and more companies are willing to invest in research and development of these technologies, which in turn creates greater value.

In the past, only the world's top companies, such as Bell Labs and Microsoft, set up research institutes to do core technology research. Today the picture is completely different. In the future, deploying these technologies across industries is expected to bring major changes to the ecosystem of the industry as a whole.

Neither basic research nor applied research should be neglected

Some in the industry criticize many companies and developers for not really understanding how deep learning works: they know how to apply it, but not why it works. Yang Fan has his own view on this.

Yang Fan said there are two schools of thought in academia. One holds that knowing that something works without knowing why is a deviation and a mistake. Yang Fan agrees with this, and in fact many teams, including SenseTime, have invested in more cutting-edge, basic research. “Such basic research can guide us further in the right direction in the future.” However, Yang Fan believes that neither basic nor applied research should be neglected. A complete scientific framework and continuous directional guidance are very important, but empirical science is also very important. In the end, companies have to let the results of deployed technology speak for them.

Recognition accuracy is meaningless without a scenario

In recent years, many companies have invested heavily in face recognition R&D and achieved impressive results, and recognition rate has been the focus of their publicity. This year, figures such as 99%, 99.4% and 99.8% appear frequently in all kinds of reports. How should we understand the differences behind these decimal points?

Technical metrics cannot be compared in isolation: every metric rests on many hidden assumptions.

Yang Fan gave several examples. In a financial scenario, 1:1 face verification is used for registration with internet finance services. In a family photo album, face recognition finds all the photos of the same person. In a security scenario, a specific person must be found in a vast database of fugitives using only a blurry photo. All of these are face recognition, and each may claim accuracy of about 99% or 99-point-something. Although companies make such claims, the actual differences behind the numbers are very large and depend on many factors, so accuracy is strongly tied to the industry context and the underlying assumptions. It is hard to compare recognition accuracy across scenarios.
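
One way to see why the same headline accuracy means different things in a 1:1 check and a large-database search is a simplified model (our own assumption, not SenseTime's evaluation method): if each face comparison independently produces a false match with probability `far`, a 1:1 check makes one comparison, while searching a gallery of N other people makes N, so the chance of at least one false match is 1 - (1 - far)^N. The function name and the rate used below are hypothetical.

```python
def false_match_probability(far: float, gallery_size: int) -> float:
    """Chance of at least one false match when one probe face is compared
    against `gallery_size` enrolled identities that are all different people.

    Assumes every comparison is independent and wrongly accepts with
    probability `far` (the per-comparison false-match rate).
    """
    return 1.0 - (1.0 - far) ** gallery_size


far = 1e-4  # hypothetical per-comparison false-match rate, for illustration only
for n in (1, 1_000, 100_000):  # 1:1 check, a photo album, a large watchlist
    print(f"gallery size {n:>7}: P(false match) = {false_match_probability(far, n):.4f}")
```

With the same per-comparison error, the 1:1 case stays at 0.0001, while a search over 100,000 identities almost certainly produces some false match. Real systems also tune thresholds per scenario, which this sketch ignores.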

What matters more is whether a company can achieve genuine breakthroughs with original technology in different scenarios, not an accuracy figure quoted without its premises. In the internet photo album scenario, SenseTime can be said to have been the first in the world to make computer face recognition surpass human performance, and many later smart photo album products and services grew out of this breakthrough. In Yang Fan's view, when a company faces a new industry scenario with new challenges, what matters most is whether it can be the first to make a quantitative breakthrough. When accumulated technology, accumulated data and an understanding of the business scenario come together, a company can achieve technological breakthroughs of real value and meaning.

Once the recognition rate reaches 99%, the main difficulty in face recognition becomes deepening the technology for different industry scenarios. Although 99% seems very high, different scenarios have different requirements, and 99% may be only the entry threshold for using the technology at all. For bank identity authentication, for example, SenseTime's face recognition error rate can now reach 10^-7, comparable to the odds of guessing a seven-digit bank password, yet that is only just enough for the technology to be used in this setting. In security scenarios, blurred photos, occlusion and poor angles pose even more practical challenges for face recognition.
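
The password comparison is simple arithmetic: one random guess at a k-digit numeric code succeeds with probability 10^-k, so an error rate of 10^-7 corresponds to a seven-digit code. The helper names below are ours, purely for illustration, and the comparison assumes the error rate is read as a single-attempt false-accept probability.

```python
import math


def pin_guess_probability(digits: int) -> float:
    """Chance that one random guess matches a numeric password with `digits` digits."""
    return 10.0 ** -digits


def equivalent_pin_digits(error_rate: float) -> float:
    """Number of decimal digits whose single-guess success rate equals `error_rate`."""
    return -math.log10(error_rate)


print(pin_guess_probability(7))     # single-guess odds for a seven-digit code
print(equivalent_pin_digits(1e-7))  # digits "equivalent" to a 10^-7 error rate
```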

“Face recognition looks highly homogeneous and simple, but its specific scenarios are very complicated, so there is not much point in talking about technology divorced from the scenario. As you can see today, in key industries such as security and mobile phones, truly deepening face recognition technology still involves many challenges that are worth tackling.”

Image and video analysis is more complex than you might think

In terms of functions and capabilities, image and video analysis is actually a complex technical system. Implementing or deepening a single technology may require several teams.

SenseTime's work in computer vision can be roughly divided into image enhancement, object detection and classification, algorithm models, training engines and so on.

Intelligent image enhancement is the first step in image and video analysis. Although today's cameras and video equipment are very good, capturing images and video still often runs into difficulties. For example, the depth maps acquired by infrared and structured-light cameras are very noisy, and security cameras that capture fast-moving objects produce motion blur. These images and videos therefore need intelligent enhancement and restoration before analysis, a field also known as low-level vision. At SenseTime this is a separate line of work, aimed at improving the quality of captured images and videos.
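
SenseTime's enhancement work is learning-based and its details are not described here. As a classical stand-in that shows what restoring an image before analysis means, the sketch below applies a 3x3 median filter, which removes isolated noisy ("salt-and-pepper") pixels while keeping edges relatively sharp. It assumes a grayscale image stored as a list of rows of integers; it is not SenseTime's method.

```python
from statistics import median


def median_filter_3x3(image: list[list[int]]) -> list[list[int]]:
    """Return a denoised copy of a grayscale image using a 3x3 median filter.

    At the borders the window is clipped to the pixels that exist,
    so the output has the same size as the input.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[yy][xx]
                for yy in range(max(0, y - 1), min(height, y + 2))
                for xx in range(max(0, x - 1), min(width, x + 2))
            ]
            out[y][x] = int(median(window))  # median ignores a lone outlier
    return out


noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],  # a single "salt" pixel from sensor noise
    [10, 10, 10, 10],
]
print(median_filter_3x3(noisy))
```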

Image and video recognition and analysis can be subdivided into many parts: object detection, which finds where an object is; key point localization, which finds the object's key contours and shape; object classification, which determines what the detected object is; and segmentation, which gives a precise description of the edges or outline of the whole object. In practice, a recognition system may need to be divided into several such sub-fields, and real industry applications usually combine several of them.
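
To make the combination of sub-tasks concrete, here is a hypothetical data structure that holds the outputs of detection, key point localization, classification and segmentation for one object. The class and field names are ours, not a SenseTime API; the consistency check illustrates how the sub-tasks' results can be cross-checked against one another.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectAnalysis:
    """Combined output of the recognition sub-tasks for a single object (hypothetical)."""
    box: tuple[int, int, int, int]  # detection: (x, y, width, height)
    label: str                      # classification: what the object is
    score: float                    # classifier confidence in [0, 1]
    keypoints: list[tuple[int, int]] = field(default_factory=list)  # key point localization
    mask: set[tuple[int, int]] = field(default_factory=set)         # segmentation: (x, y) pixels on the object

    def mask_inside_box(self) -> bool:
        """Consistency check: every segmented pixel should lie inside the detected box."""
        x, y, w, h = self.box
        return all(x <= px < x + w and y <= py < y + h for px, py in self.mask)


face = ObjectAnalysis(
    box=(40, 30, 64, 64),
    label="face",
    score=0.97,
    keypoints=[(60, 50), (84, 50), (72, 66)],  # e.g. two eyes and the nose tip
    mask={(70, 60), (71, 60), (72, 61)},
)
print(face.label, len(face.keypoints), face.mask_inside_box())
```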

SenseTime also has teams dedicated to basic research: how to miniaturize algorithms so they run on resource-constrained mobile devices; how to optimize algorithms to run faster; the continuous upgrading of its core AI training engine, or operating system; and weakly supervised and unsupervised learning, including cutting-edge techniques such as reinforcement learning and transfer learning.

Yang Fan emphasized that, from the computing engine to the data-flow architecture, what matters most is not the volume of data but forming a stable closed loop for the algorithms.

How computer vision technology lands in real products
Where SenseTime applies computer vision

SenseTime has always paid great attention to deploying computer vision technology, and in earlier talks and presentations Yang Fan repeatedly stressed the need to combine technological progress with industrial demand. According to Yang Fan, computer vision is mainly applied in the following scenarios in SenseTime's products and businesses:

  1. Security

    Security used to be understood mainly as public security. In fact, security in the broader sense also covers transportation, offline retail, residential communities, schools and more, a very large number of scenarios.

  2. Intelligent terminal

    At present, intelligent terminals mainly means mobile phones, but their form may keep evolving, and AI technology will certainly deliver great value on such devices.

  3. Internet video applications

    As internet applications deepen, they will shift more and more from text to images and video, which are richer forms of multimedia. The boom first in live streaming and then in short video in recent years is an example. Here SenseTime can offer video application makers complete, rich, high-value-added solutions.

  4. Portrait authentication

    Portrait-based identity authentication is also very valuable work; it is a special cross-industry solution that has now largely spread from online to offline. In China, real-name verification of citizens' identity is a very important need, and it can help address internet security and offline public security problems to a certain extent. From all kinds of online internet applications to offline industries such as airports, supermarkets and hotels, demand for personal identity verification will keep growing, and SenseTime also provides a complete solution in this area.

  5. Autonomous driving

    Autonomous driving will be a very important direction in the future, and AI technology will be a key link in it. SenseTime has made some investment and plans in this field.

The technology behind SenseTime's security scenarios

A qualified security product does not rely on face recognition alone; it is supported by a number of technologies.

Take security monitoring of a public square as an example. The technologies involved mainly include:

  1. Hardware, namely cameras. A single camera cannot fully cover a large square, so panoramic cameras and zoomable close-range cameras may be needed together to capture faces and other images.

  2. Capture algorithms. A crowd analysis algorithm is integrated into the cameras: using the collected data combined with manual rules, it learns where the square is crowded and where people linger for a long time, and then directs the cameras responsible for capturing and tracking to focus on those areas.

  3. Face recognition. Face recognition is then applied in those areas to check whether anyone is on a watchlist, such as a database of known pickpockets, which helps combat pickpocketing. That is why the cameras focus on areas where people gather and linger: those are where incidents are most likely.

  4. Body motion capture and recognition. While looking for a specific person, the system needs to track body posture and detect and recognize these people's key movements to judge whether a theft is taking place.

  5. Image enhancement. If the image captured by a camera is blurry, image enhancement makes it more suitable for subsequent analysis.
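
The crowd-analysis step can be sketched under our own simplifying assumptions, since the article does not describe SenseTime's actual algorithm: the panoramic camera yields the (x, y) positions of detected people in each frame, the square is divided into a grid, and each cell is scored by its average head count plus the fraction of frames in which it stays occupied (a rough proxy for lingering). The highest-scoring cells would be handed to the close-range cameras. The grid size, weight and function name are hypothetical.

```python
from collections import Counter


def prioritize_zones(frames, cell_size, top_k, dwell_weight=1.0):
    """Rank grid cells of a square for the close-range cameras (hypothetical sketch).

    frames: list of frames, each a list of (x, y) positions of detected people.
    Score = average people per frame + dwell_weight * fraction of frames occupied.
    Returns up to top_k cells as (column, row) tuples, highest score first.
    """
    people = Counter()    # total detections per cell across all frames
    occupied = Counter()  # number of frames in which each cell had anyone in it
    for positions in frames:
        cells = [(int(x // cell_size), int(y // cell_size)) for x, y in positions]
        people.update(cells)
        occupied.update(set(cells))
    n = len(frames)
    scores = {
        cell: people[cell] / n + dwell_weight * occupied[cell] / n
        for cell in people
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


frames = [
    [(5, 5), (6, 7), (55, 52)],
    [(4, 6), (7, 8)],
    [(5, 9), (8, 3), (95, 10)],
]
print(prioritize_zones(frames, cell_size=20, top_k=2))
```

Combining density with dwell time mirrors the article's point that crowded areas and places where people stay long are where incidents concentrate; a real system would learn these weights rather than fix them by hand.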

As Yang Fan put it, real industry deployments usually layer and combine different technologies. Face recognition and action recognition are the most critical technologies here, but doing a good job in a real scenario always requires a combination of technologies.

Cross-disciplinary talent is the key to AI deployment

Yang Fan said that turning innovative technology into real products is not easy; the biggest difficulties are choosing the right direction and timing, and finding the right talent.

Deploying AI technology requires combining it with an industry, and choosing which industry is the first hard problem. “If the technology hasn't reached the threshold of real success, as with video search in search engines, a big company may be able to keep accumulating in it, but a small startup will struggle to get returns and may die within two years,” Yang said. First, he said, you need to confirm that the chosen market is real and effective, with large-scale, essential demand. Second, you need to be able to obtain complete closed-loop data in that market to make sustained progress. Next, you need to consider whether the industry's current technical threshold is within a reasonable range; entering too late or too early both cause problems. Finally, during deployment you need to use the advantage of the technical lead window (usually one to one and a half years) to build industry barriers, because with only technical barriers and no industry barriers, in the long run you are still making wedding clothes for others.

At the same time, industry deployment requires integrating a range of key technologies. Industry requirements are often vague and technically ill-defined, so someone has to break them down one by one. In Yang Fan's view, finding or cultivating people who have both a technical background and a deep enough understanding of the industry is the most critical factor in deploying AI. He said, “Talent, team organization and development issues, and especially in the 2B industry, striking the balance between standardized and non-standard products: these are common problems for deploying any technical product. When deploying AI, none of them goes away; they only become more serious. Talent is an even bigger pitfall for AI. AI is more technical and in the past was less integrated with industry, so when you want to develop a product that meets real industry needs, you have to combine an understanding of the industry with an understanding of the technology, which in my opinion is the most challenging part. In the past there were probably almost no such people in the world; very few people understood the industry.”

In a period of market growth, SenseTime prefers cooperation to competition

In the wave of AI entrepreneurship, computer vision (CV) is a very hot direction in China, with companies springing up everywhere. Many compete in security, finance, robotics, healthcare, driverless cars and other business scenarios.

Security is a very important business for SenseTime, and also an important market for many domestic computer vision startups, such as Megvii, Yitu and CloudWalk (Yuncong), not to mention Hikvision, which has been deeply engaged in the field for many years.

Yang Fan believes the security market is now in a period of rapid growth, and that from 2018 to 2019 the entire market will take off, possibly faster than anyone imagines. SenseTime positions itself as a capability service platform built on original technology and an enabler across industries, which makes it more willing to cooperate than compete with upstream and downstream companies.

Security of face recognition technology

Face recognition is used mostly in security and finance, and applications related to banking and payments in particular have high security requirements. The introduction of Face ID at a recent Apple event also sparked debate about whether it is secure enough.

Yang Fan divides face recognition security problems into two kinds. One is how to make recognition more accurate so that it does not misidentify people; the other is how to defend against attacks, such as attempts to fool face recognition with photos or videos. As data grows and new algorithms iterate, recognition accuracy keeps improving, so the second problem is relatively the greater challenge. In the industry it is known as liveness detection.

To defend against attacks in financial scenarios, SenseTime's current approach is mainly to accumulate a large amount of attack data and identify the patterns of attack behavior through methods such as pattern analysis and spectral analysis, so as to resist these attacks. “Whether it's a video or a photo, there are actually many telltale signs,” Yang explained. “But these signs are not necessarily visible to humans. With a lot of data, machines can distinguish them better, for example reflections from a phone screen.”
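
As a toy illustration of the spectral-analysis idea (our own simplification; the article does not detail SenseTime's features, and real systems work on 2D images with learned models): images re-captured from a phone screen often carry fine moire or pixel-grid stripes, which show up as unusually strong high-frequency energy. The sketch computes the share of one pixel row's spectral energy above a cutoff frequency using a naive discrete Fourier transform. The cutoff is hypothetical, and a single hand-set threshold on this ratio would be far too crude for real liveness detection.

```python
import cmath
import math


def high_frequency_ratio(row, cutoff_fraction=0.25):
    """Share of a pixel row's spectral energy at or above a cutoff frequency.

    row: pixel intensities sampled along one image row.
    cutoff_fraction: cutoff as a fraction of the highest (Nyquist) frequency.
    The mean (DC term) is removed first so overall brightness does not count.
    Uses a naive O(n^2) discrete Fourier transform; fine for short rows.
    """
    n = len(row)
    mean = sum(row) / n
    centered = [v - mean for v in row]
    energies = []  # energies[k - 1] is the energy at frequency k
    for k in range(1, n // 2 + 1):
        coeff = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energies.append(abs(coeff) ** 2)
    total = sum(energies)
    if total == 0:
        return 0.0  # perfectly flat row: no texture at all
    cutoff = max(1, int(cutoff_fraction * (n // 2)))
    return sum(energies[cutoff - 1:]) / total


# Hypothetical 32-pixel rows: smooth shading vs. a fine stripe pattern
# like the moire a phone screen can leave in a re-captured image.
smooth = [128 + 50 * math.sin(2 * math.pi * t / 32) for t in range(32)]
stripes = [0, 255] * 16
print(f"smooth:  {high_frequency_ratio(smooth):.3f}")
print(f"stripes: {high_frequency_ratio(stripes):.3f}")
```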

The 3D face recognition used by Apple's Face ID differs mainly in its capture device. When the capture device becomes a 3D camera, it collects more information: besides color, there is also 3D data, and this depth information lets the algorithm analyze the face better, achieving better recognition and attack defense. Yang Fan believes that developing 3D capture devices is a fairly clear industry trend, and SenseTime will make some attempts in this direction in the future.
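
Why depth helps with attack defense can be shown with a deliberately simple check (our illustration; neither Apple's nor SenseTime's method is described in the article): a printed photo or a phone screen is flat, so depth samples across the face region fit a plane almost perfectly even when the photo is tilted, while a real face deviates from any plane. The sketch fits z = a*x + b*y + c by least squares and reports the RMS depth residual. The sample coordinates and depths, in millimetres, are hypothetical.

```python
import math


def _solve3(m, v):
    """Solve the 3x3 linear system m @ s = v with Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = v[r]
        solution.append(det(replaced) / d)
    return solution


def plane_fit_rms(points):
    """RMS depth residual of (x, y, z) samples from their least-squares plane z = a*x + b*y + c."""
    n = len(points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    # Normal equations of the least-squares plane fit.
    a, b, c = _solve3([[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]], [sxz, syz, sz])
    residuals = [z - (a * x + b * y + c) for x, y, z in points]
    return math.sqrt(sum(r * r for r in residuals) / n)


grid = [(x, y) for x in range(-50, 51, 10) for y in range(-50, 51, 10)]
photo = [(x, y, 400 + 0.3 * x + 0.1 * y) for x, y in grid]      # tilted flat print
face = [(x, y, 400 + 0.005 * (x * x + y * y)) for x, y in grid]  # curved surface, nose nearest
print(f"photo RMS: {plane_fit_rms(photo):.6f} mm")
print(f"face RMS:  {plane_fit_rms(face):.2f} mm")
```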

The future of computer vision technology

Yang Fan sees three main challenges currently facing computer vision. The first is reducing dependence on data, a general direction on which the industry has reached consensus: current image recognition depends too heavily on data, whereas human recognition does not need nearly as much. The second is overall performance optimization, that is, doing intelligent analysis at lower computational cost, which is very important for practical applications. The third is theoretical research: understanding why things work is more helpful for long-term development.

Yang Fan believes that video analysis and understanding is one of the more promising future research directions in computer vision. He said, “People have been talking up video analysis and understanding for many years; different people judge it differently and invest at different times. Personally, I think the internet is an established system chain with particularly large commercial value. In my view, video is used too little, not too much. The potential value of video, or visual signals, is very large, because visual information accounts for a very important share of communication between people and is very rich in content. Today the internet has formed a very complete ecosystem, with particularly good basic technical support for the five links of the information chain. In this situation, it makes sense to take the lead in exploring and mining the field of video. Many offline industries may have urgent needs, but there is still a lot of room for video and images on the internet; in video content analysis and understanding in particular, too little is being done today.”

Where does computer vision sit in the overall AI landscape?

Vision is the core, and its potential business value is the greatest.

Yang Fan believes that information is at the core of everything. Setting AI aside, the whole IT industry is about collecting, transmitting, storing, analyzing, computing on and feeding back information. With AI, machines are increasingly taking over people's role across this information loop, and may do it better than people. In daily human interaction, visual information is the more essential kind and carries more information. Computer vision therefore occupies a relatively high-order position among all forms of information and places higher technical demands on each link. Once the ability to process visual information is gradually acquired at each link, the value it unleashes may exceed the space that the IT and internet industries can influence today, and may even overturn how people interact with each other and with the world.

In Yang Fan's view, one very important point about computer vision is that human eyes can sense and analyze only a very narrow band of electromagnetic waves, while machines can perceive a wider range through infrared cameras, near-infrared cameras, structured-light depth cameras and so on. Yang raises an interesting question: “These cameras expand the range of wavelengths that can be seen and processed beyond what humans perceive. Can this keep expanding indefinitely? Seen this way, computer vision points to a future in which machines can replace humans, or serve as human assistants with more fundamental insight into the world.”

Yang Fan believes that today we still design and use infrared cameras in a human-centered way, relying on the help and guidance of human experience: the information collected by the infrared camera is converted into an image humans can understand, and only then is a machine used to understand that image. “The next step,” he says, “is likely to be infrared cameras that capture information in forms the machine itself can understand, and then the machine can take it further.”
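
The human-centered path described above, converting raw infrared readings into an image people can read, is often as simple as rescaling sensor values to 8-bit grayscale. The sketch assumes a hypothetical 2D list of raw sensor counts; it illustrates the conversion step only, not any SenseTime pipeline. Note that the rescaling throws away the absolute readings, which is the kind of information a machine-oriented pipeline might prefer to keep.

```python
def to_display_gray(raw):
    """Min-max rescale raw infrared sensor readings to 0-255 grayscale for human viewing."""
    flat = [v for row in raw for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform scene: avoid dividing by zero
        return [[0 for _ in row] for row in raw]
    scale = 255 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in raw]


raw = [[7200, 7350], [7600, 8000]]  # hypothetical raw sensor counts
print(to_display_gray(raw))
```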
