The rapid development of artificial intelligence has profoundly influenced every industry. AI is widely regarded as the path enterprises must take to improve operational efficiency and cope with market competition. For many enterprises, however, it is still a struggle to truly land AI, put it into use, and create value.

Recently, at the Shenzhen stop of the TechDay technology salon, technical experts from Huawei, TechDay, and SheIn held an in-depth on-site discussion of core AI technologies.


A Brief Introduction to the AI Tool Chain, by Chang Yuefeng

Senior Director of Big Data R&D




In the process of landing AI in a production environment, there are usually three challenges:

First, business scenarios are complex. A single algorithm may optimize only one link in the chain, while optimizing the whole business scenario may require many algorithms working together.

Second, data. Data is one of the most important underpinnings of AI, and many organizations lack access to high-quality, annotated data.

Third, technical issues. Four core technical problems arise in the process of landing AI: 1) Scheduling and managing CPU/GPU environments is complicated. 2) AI business developers need a low-barrier experimental platform that lets them run rapid exploration experiments. 3) Enterprises with large-scale data need industrial-grade, large-scale distributed training so that algorithms can be applied to the full data set. 4) Enterprises need to provide online services with low latency.

The core of artificial intelligence is data, which can be divided into two parts: real-time data and offline data. Getui uses a Hive-based solution to store offline data, with a focus on capacity and scalability. Online users are highly sensitive to latency, so Getui uses a high-performance KV store to ensure that online features can be accessed in time.
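A minimal sketch of this online/offline split is shown below. The KV store is mocked with an in-memory dict so the example stays self-contained, and the function names are purely illustrative, not Getui's actual API:

```python
import json
import time

# Stand-in for a high-performance KV store (e.g. Redis or HBase);
# a plain dict keeps the sketch self-contained and runnable.
online_kv = {}

def write_online_features(user_id: str, features: dict) -> None:
    """Real-time pipeline: push the latest user features into the KV store."""
    online_kv[f"feat:{user_id}"] = json.dumps({**features, "ts": time.time()})

def get_online_features(user_id: str) -> dict:
    """Online serving: one low-latency key lookup, no batch table scan."""
    raw = online_kv.get(f"feat:{user_id}")
    return json.loads(raw) if raw else {}

# Offline data would instead sit in Hive tables and be queried in batch,
# e.g. SELECT user_id, clicks_7d FROM dw.user_stats WHERE dt = '...'.

write_online_features("u42", {"recent_clicks": 3, "active_tag": "news"})
print(get_online_features("u42"))
```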

After solving the basic problems of data storage and use, Getui addresses the technical problems of the AI landing process with internal end-to-end services, so that practical exploration can be carried out quickly with a standardized process. Getui has also developed plug-ins and packaged products to simplify the steps and reduce complexity, helping even less experienced developers build systems in a short time. Finally, Getui provides deployment and publishing tools, so that training results can be exported online in a standardized way for service deployment and truly generate value online.

Kubeflow and other open-source technology stacks can be used when small and micro enterprises put AI into practice. First, Kubernetes can serve as the standard for managing and scheduling the distributed environment. Jupyter plus open-source data analysis kits plus an AI framework enables quick, low-threshold exploration experiments; Kubeflow with TensorFlow/PyTorch/MXNet can quickly set up large-scale distributed training; finally, Kubernetes provides the ability to deploy, go live, and scale quickly for highly available online services.
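As an illustration of that low-threshold exploration loop, here is the kind of quick Jupyter-style experiment the stack enables, assuming scikit-learn as the open-source kit (any framework would do):

```python
# Quick exploration experiment: synthetic data, a baseline model, one metric.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```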

In the actual implementation of AI, enterprises need to pay attention to the following three points:

First, be fast and efficient. Enterprises can use open-source tools to land the business quickly, while also taking care to consolidate their processes and build vertical depth.

Second, integrate with what exists. The Kubernetes solution is not the only option; enterprises need to weigh their own situation, connect with existing systems, and choose a suitable solution.

Third, team building. Technical departments need to cooperate efficiently, and enterprises can also guide R&D engineers to gradually move into the field of AI.


A Chat on Personalized Recommendation, by Ma Xingguo

Vice General Manager, SheIn Product R&D Center



For an enterprise to do well at AI-driven personalized products, algorithm engineers alone are not enough; it also needs support from engineering and data analysts, as well as help from product and operations staff.

When an enterprise runs many lines of business, it can also generalize the business processing, that is, build a recommendation platform at the system level. The recommendation platform requires the cooperation of data, algorithms, and systems. Accessing the platform brings three capabilities: first, when syncing materials, enterprises get unified formats and both incremental and full synchronization; second, the platform handles users' service requests in a standardized, high-performance, and intelligent way; third, the platform reports user behavior in a unified format, both in real time and offline.
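A sketch of what such a unified behavior-reporting format might look like; the schema and field names here are hypothetical, not the platform's actual ones:

```python
# One shared event shape feeds both the real-time and the offline channel.
import json
import time
from dataclasses import asdict, dataclass

@dataclass
class BehaviorEvent:
    user_id: str
    item_id: str
    action: str   # e.g. "expose", "click", "buy"
    scene: str    # which recommendation slot produced the item
    ts: float

def report(event: BehaviorEvent) -> str:
    """Serialize to the unified format; downstream consumers all parse this."""
    return json.dumps(asdict(event))

print(report(BehaviorEvent("u1", "item9", "click", "home_feed", time.time())))
```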

A simple machine learning process is: build the environment, collect data, analyze data, prepare data, train the algorithm, test the algorithm, and use the algorithm. Many hidden problems lurk in this process, such as how to handle cold start, how to deal with false exposure, how to clean abnormal data, how to select positive and negative samples, how to cope with data sparsity, and how to pick salient features out of billions of features.

In machine learning, data is the foundation, and the ideal state is a large amount of data with complete features. There are two ways to collect data, "push" and "pull": "pull" means crawling, and "push" means reporting. Analyzing data means analyzing the target distribution, feature distributions, target-feature relationships, feature-feature relationships, completeness, and so on; it can be done offline, in real time, or as a fusion of both. Analysis tools can be chosen from Excel, Shell (AWK), Python, MySQL, Hadoop, Spark, Matlab, and others. Cleaning data requires removing system dirty data, business dirty data, and external target data. Formatting data requires transformation, sampling, and sparse-data handling.
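A small pandas sketch of that analysis step, covering the target distribution, a feature distribution, a target-feature relationship, and a completeness check; the data is synthetic so the example stays self-contained:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 60, 1000),
    "clicks_7d": rng.poisson(3, 1000),
    "converted": rng.integers(0, 2, 1000),
})

print(df["converted"].value_counts(normalize=True))  # target distribution
print(df["clicks_7d"].describe())                    # feature distribution
print(df.groupby("converted")["clicks_7d"].mean())   # target-feature relation
print(df.isna().mean())                              # completeness check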

Machine learning also offers many algorithm models to choose from, such as popularity ("heat")-based ranking, Bayesian methods, association rules, LR, GBDT, AR, CF (ALS), and so on.
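As a sketch, here is the simplest model on that list, popularity ("heat")-based recommendation; LR, GBDT, or ALS would slot behind the same recommend() interface:

```python
from collections import Counter

# (user, item) interaction log; synthetic for illustration.
events = [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "a"), ("u3", "c")]

heat = Counter(item for _, item in events)  # global popularity of each item

def recommend(user_id: str, k: int = 2) -> list:
    """Return the k hottest items the user has not yet interacted with."""
    seen = {item for user, item in events if user == user_id}
    return [item for item, _ in heat.most_common() if item not in seen][:k]

print(recommend("u1"))  # -> ['b', 'c'] (ties keep insertion order)
```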

Feature engineering is also a very important part of the algorithm model. Feature objects include the material, the user, and the context. Feature types include static features, dynamic features, representation features, enumeration features, and real-valued features. Feature dimensions include first-order independent features, second-order cross features, and multi-order cross features. Feature selection also deserves attention: enterprises can choose among filter, wrapper, and embedded methods, and among forward, backward, and stepwise selection procedures.
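The three feature-selection families map directly onto scikit-learn primitives; a minimal sketch on synthetic data (the choice of estimators is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Filter: rank features by a statistical test, independent of any model.
filt = SelectKBest(f_classif, k=5).fit(X, y)
# Wrapper: recursively eliminate features using a model's performance.
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
# Embedded: L1 regularization zeroes out weak features during training.
embd = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y)

print("filter keeps:  ", filt.get_support().nonzero()[0])
print("wrapper keeps: ", wrap.get_support().nonzero()[0])
print("embedded keeps:", (embd.coef_[0] != 0).nonzero()[0])
```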

Finally, the algorithm's effect needs to be evaluated: multi-dimensional evaluation, real-time evaluation, and offline evaluation. Enterprises also need to be aware that there is no one-size-fits-all model, and that algorithms require constant attention and operation.
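A sketch of offline evaluation along two common dimensions, ranking quality (AUC) and top-k accuracy (precision@k); the specific metrics are an assumption here, since the talk does not prescribe any:

```python
from sklearn.metrics import roc_auc_score

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]              # held-out labels
y_score = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3]  # model scores

print("AUC:", roc_auc_score(y_true, y_score))

def precision_at_k(y_true, y_score, k):
    """Fraction of the k highest-scored items that are actually positive."""
    top = sorted(range(len(y_score)), key=lambda i: -y_score[i])[:k]
    return sum(y_true[i] for i in top) / k

print("precision@3:", precision_at_k(y_true, y_score, 3))
```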

Building a suitable environment is also one of the guarantees that the algorithm will run normally. The algorithm environment should be standardized, configurable, extensible, and high-performance, and should support comprehensive monitoring and effect improvement to ensure the user experience.


AI Recognition: From Images to Faces, by Nie Penghe

Huawei Algorithm Engineer



In computing, there were attempts as early as the 1990s to hand-code image features and recognition rules into the computer so that it could perform "image recognition". It was not until 2012 and 2013 that people found that small changes to the structure of traditional neural networks could greatly improve computer image recognition, and this improved network was called the convolutional neural network (CNN). The essence of CNN image processing is information extraction, also known as automatic feature engineering: key image features are extracted step by step through a huge neural network to achieve image recognition.
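A minimal CNN in PyTorch illustrates this step-by-step extraction idea; the architecture is a toy for exposition, not any production network:

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(           # "automatic feature engineering"
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)                     # extract key image features
        return self.classifier(x.flatten(1))     # map features to class scores

logits = TinyCNN()(torch.randn(1, 3, 32, 32))    # one 32x32 RGB image
print(logits.shape)                              # torch.Size([1, 10])
```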

Face recognition is a biometric identification technology based on facial feature information. Today it can effectively verify user identity and is widely used in payment, security check, attendance, and other scenarios. As face data systems are built out, face recognition will become an effective means of anti-fraud, risk control, and more, greatly shortening the time needed to confirm identity in audits.

The biggest advantage of face recognition is that it is contactless and can operate covertly, which makes it suitable for security, criminal surveillance, and capture applications. Contactless collection is also non-intrusive and easily accepted by the public. Face recognition is convenient and fast, offers strong post-hoc tracking ability, and matches how humans recognize each other. Its weakness is that different faces can be quite similar to one another, and recognition performance is strongly affected by external conditions.

The main steps of face recognition are face detection, face alignment and calibration, face feature extraction, face feature modeling, face feature matching, and output of the recognition result.
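Those steps can be read as a pipeline of stages. In the schematic below every stage is a stub, since the talk names the steps but no concrete detector or model:

```python
def detect_face(image):
    """Face detection: return the face rectangle (x, y, w, h), maybe pose."""
    raise NotImplementedError

def align_face(image, box):
    """Alignment/calibration: return an undistorted, normalized face patch."""
    raise NotImplementedError

def extract_features(face):
    """Feature extraction/modeling: return a fixed-length embedding vector."""
    raise NotImplementedError

def recognize(image, gallery, matcher):
    """Run the full chain; `matcher` compares the embedding to the gallery."""
    face = align_face(image, detect_face(image))
    return matcher(extract_features(face), gallery)
```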

Among these, the goal of face detection is to find the position of the face in the image; the algorithm outputs the rectangular coordinates of the face in the image, possibly along with pose information such as tilt angle.

The second step of face recognition is face alignment, which must be carried out on the premise that facial features and other elements are not distorted or changed. Only then can the distance between the output face and the model be compared effectively at a later stage.

The last step of face recognition is face matching. With a large enough network and rich enough samples, the accuracy of face matching can be very high.
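A minimal sketch of that matching step as cosine similarity between embeddings, with a hypothetical acceptance threshold; this function could serve as the `matcher` in the pipeline sketch above:

```python
import numpy as np

def match(embedding, gallery, threshold=0.6):
    """Return the enrolled identity closest to the query, or None if too far."""
    best_id, best_sim = None, -1.0
    q = embedding / np.linalg.norm(embedding)
    for person_id, ref in gallery.items():
        sim = float(q @ (ref / np.linalg.norm(ref)))  # cosine similarity
        if sim > best_sim:
            best_id, best_sim = person_id, sim
    return best_id if best_sim >= threshold else None

gallery = {"alice": np.array([0.9, 0.1]), "bob": np.array([0.1, 0.9])}
print(match(np.array([0.8, 0.2]), gallery))  # -> 'alice'
```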

In the field of face recognition, deep learning networks will keep getting better. Deep learning has clear advantages: it emphasizes abstracting data and learning features automatically, and the features it learns autonomously tend to be more reliable.