Abstract: To address the problems AI engineers encounter when developing AI applications, the NAIE platform provides an AutoML framework that helps them solve AI development problems faster and more efficiently.

Are you still frustrated by how few AI algorithms you know?

Are you still undecided about which approach to choose?

Are you still looking around for help with tuning?

Are you still fed up with hyperparameter optimization running at a snail's pace?

Are you still stuck trying to continuously refine a model?

From now on, with Huawei NAIE AutoML, none of that matters anymore!

To solve the problems that AI engineers encounter in developing AI application scenarios, NAIE has launched the AutoML framework to help solve AI development problems more efficiently and quickly.

Follow along to see how NAIE AutoML tackles these development challenges one by one!

1. Common problems and challenges in AI development

1.1 Which pipeline to choose?

Developing a complete machine learning application mainly involves key modules such as data preprocessing, feature engineering, model selection, and hyperparameter optimization. Each key module has many sub-modules, as shown in the figure below.

Each module offers many different methods. For a specific data set, a pipeline is formed by selecting a method for each sub-module and chaining the selections together in a defined order. Running the whole pipeline produces a model. However, the number of possible pipelines typically runs into the millions. How do we choose the best one?
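To make the combinatorial growth concrete, here is a minimal sketch that counts and enumerates pipelines over a small search space. The stage names and options in `STAGES` are hypothetical and chosen only for illustration; they are not the NAIE AutoML search space.

```python
from itertools import product
from math import prod

# Hypothetical method choices per stage; real search spaces are far larger.
STAGES = {
    "imputation": ["mean", "median", "most_frequent"],
    "scaling": ["none", "standard", "minmax"],
    "feature_selection": ["none", "variance", "mutual_info"],
    "model": ["logistic_regression", "random_forest", "gbdt"],
    "learning_rate": [0.01, 0.03, 0.1, 0.3],
    "n_estimators": [50, 100, 200, 400],
}


def count_pipelines(stages):
    """Number of distinct pipelines = product of the option counts per stage."""
    return prod(len(options) for options in stages.values())


def iter_pipelines(stages):
    """Yield every pipeline as a dict mapping stage name to the chosen option."""
    names = list(stages)
    for combo in product(*(stages[name] for name in names)):
        yield dict(zip(names, combo))


total = count_pipelines(STAGES)
first = next(iter_pipelines(STAGES))
print(total, first)
```

Even this toy space contains 1,296 pipelines. Each extra stage or finer hyperparameter grid multiplies the count, which is how realistic spaces reach the millions.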

1.2 How to quickly verify the feasibility of the AI algorithm?

During the pre-research phase of an AI scenario, algorithm engineers have many ideas, and each one needs quick validation. Do you keep visiting forums to learn how to call an algorithm because you are not familiar with it? Have you ever been in the “no sooner have you learned an algorithm than the other team has already shipped” situation? Have you ever experienced “1,000 hyperparameter iterations take 3 days before you see the result”? Nobody enjoys the high barrier to entry, the alchemy of tuning, or the long wait for iterations. For a new AI algorithm engineer, quickly verifying the feasibility of an AI algorithm is an urgent problem.

1.3 How to keep tuning and learning continuously?

In AI tuning, limited time and computing power mean we cannot try every possible tuning method. AI engineers usually set up a few pipelines based on experience, then adjust the method or hyperparameters of each module according to the results. Such manual adjustment is time-consuming and laborious, usually allowing only a dozen or so attempts a day. In a massive search space, that is like looking for a needle in a haystack. Given the limited time and energy of experts, continuous tuning for a particular task can be very challenging.

1.4 How to reproduce the results?

In AI application development, have you ever been pleased with an occasional good model, only to be embarrassed when the results could not be reproduced in front of your manager? To avoid this, the results of every experiment need to be repeatable. Normally we set a random seed to make a run deterministic. But is the result necessarily reproducible once a random seed is set? No. Some algorithms, such as LightGBM, are multi-threaded, and the number of threads also affects the result. In addition, the partitioning of cross-validation data and model-based hyperparameter selection are stochastic. Making results reproducible is therefore another big challenge for AI engineers.

So how does Huawei NAIE AutoML overcome these challenges one by one?

2. Introduction to NAIE platform AutoML

AutoML (Automated Machine Learning) is a system that automates machine learning analysis. It enables ordinary developers and business personnel to take part in machine learning modeling, and it frees data scientists from tedious, repetitive algorithm tuning, lowering the barrier to machine learning and improving work efficiency. AutoML achieves this by automating the experience-driven work in machine learning, such as data preprocessing, feature engineering, algorithm model selection, and ensemble learning.

The AutoML technology of the NAIE platform is introduced below.

2.1 NAIE AutoML architecture

NAIE platform AutoML adopts the classic AutoML framework used in the industry, which mainly includes five modules: data preprocessing, feature engineering, algorithm model, hyperparameter optimization, and ensemble learning. The hyperparameter optimization module optimizes the pipeline composed of data preprocessing, feature engineering, and the algorithm model. The overall architecture is shown below:

Huawei NAIE AutoML was designed with the following two aspects in mind:

1. Ordinary AI developers can call NAIE AutoML to handle most business scenarios;

2. Professional AI developers get a highly extensible interface through which they can customize the relevant modules to solve the problems of a specific business scenario.

In real-world AI applications, various strategies need to be tried constantly, such as increasing the number of optimization iterations or changing the evaluation metric. The NAIE platform AutoML framework offers pipeline-based hyperparameter optimization, continuous hyperparameter optimization, distributed acceleration, extensibility, and reproducibility. These features let users quickly experiment with hyperparameters and iteration counts, plug in custom algorithm modules, and reproduce existing exploration results for a business problem, which significantly improves development efficiency.

2.1.1 Powerful hyperparameter optimization engine

1. Support hyperparameter search over the whole pipeline

NAIE AutoML supports hyperparameter optimization of pipelines consisting of data preprocessing, feature engineering, and models. It also supports optimizing the model alone by turning off data preprocessing and feature engineering.
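The difference between searching the full pipeline and searching the model alone can be sketched as follows. This is an illustrative assumption, not NAIE AutoML code: `SPACE`, `sample_config`, and the two switches are hypothetical names.

```python
import random

# Hypothetical search space; stage and option names are illustrative only.
SPACE = {
    "preprocessing": {"imputer": ["mean", "median"], "scaler": ["standard", "minmax"]},
    "feature_engineering": {"selector": ["variance", "mutual_info"], "top_k": [10, 20, 50]},
    "model": {"max_depth": [3, 5, 7], "learning_rate": [0.01, 0.1, 0.3]},
}


def sample_config(rng, use_preprocessing=True, use_feature_engineering=True):
    """Draw one configuration; disabled stages are left out of the search."""
    enabled = {
        "preprocessing": use_preprocessing,
        "feature_engineering": use_feature_engineering,
        "model": True,  # the model stage is always searched
    }
    return {
        stage: {name: rng.choice(values) for name, values in params.items()}
        for stage, params in SPACE.items()
        if enabled[stage]
    }


rng = random.Random(0)
full_pipeline = sample_config(rng)
model_only = sample_config(rng, use_preprocessing=False, use_feature_engineering=False)
print(full_pipeline)
print(model_only)
```

Turning stages off shrinks the search space, which can be useful when the data has already been cleaned and only the model hyperparameters need tuning.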

2. Support distributed parallel acceleration

Because the parameter space is large, AutoML in industry use typically requires 2,000 or more iterations. Running 2,000 hyperparameter evaluations on a single node is very time-consuming. NAIE platform AutoML can use multi-node parallelism with a master-worker mechanism to greatly reduce the time.

Figure: Diagram of distributed implementation
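The master-worker idea can be sketched on a single machine. The example below is an assumption-laden stand-in: it uses a thread pool from the Python standard library instead of multiple nodes, and `evaluate` and `run_master` are hypothetical names, with `evaluate` replacing real pipeline training by a toy score.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(config):
    """Stand-in for training and scoring one pipeline configuration."""
    x, y = config["x"], config["y"]
    return -((x - 0.3) ** 2 + (y - 0.7) ** 2)  # higher is better


def run_master(configs, workers=4):
    """Master: hand configurations to workers and pick the best (score, config)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(evaluate, configs))  # results keep input order
    return max(zip(scores, configs), key=lambda pair: pair[0])


configs = [{"x": i / 10, "y": j / 10} for i in range(11) for j in range(11)]
best_score, best_config = run_master(configs, workers=4)
print(best_score, best_config)
```

In a real multi-node setup the workers would be separate processes or machines, and the master would also decide which configurations to try next based on the results received so far.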

3. Support continuous hyperparameter optimization

In a real scenario, we cannot know in advance whether the termination condition is appropriate, so we can only keep checking the results on validation data. If the validation score is still improving noticeably as the number of iterations grows, users will want to run more iterations. To save resources and time, NAIE AutoML implements incremental hyperparameter optimization and achieves 100+50=150: 100 iterations in the first run plus 50 incremental iterations based on that first task give results consistent with a single run of 150 iterations.
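The 100+50=150 property can be illustrated with a toy resumable search. This is a minimal sketch that assumes plain random search as the optimizer; `run_search`, `objective`, and the saved-state layout are hypothetical and do not describe how NAIE AutoML stores its tasks.

```python
import random


def objective(config):
    """Toy validation score; stands in for training and scoring a pipeline."""
    return -(config["lr"] - 0.1) ** 2 - (config["depth"] - 6) ** 2 / 100


def run_search(n_iter, state=None, seed=0):
    """Run n_iter random-search trials, optionally resuming from a saved state."""
    rng = random.Random(seed)
    history = []
    if state is not None:
        rng.setstate(state["rng_state"])  # continue the same random stream
        history = list(state["history"])  # keep the earlier trials
    for _ in range(n_iter):
        config = {"lr": rng.uniform(0.001, 0.5), "depth": rng.randint(2, 12)}
        history.append((objective(config), config))
    return {"rng_state": rng.getstate(), "history": history}


def best(state):
    """Best (score, config) trial seen so far."""
    return max(state["history"], key=lambda trial: trial[0])


first = run_search(100, seed=42)       # initial task: 100 iterations
resumed = run_search(50, state=first)  # incremental task: 50 more iterations
single = run_search(150, seed=42)      # one run of 150 iterations
print(best(resumed) == best(single))
```

The key design choice is to persist both the trial history and the optimizer's random state. With a model-based optimizer, the surrogate model's state would have to be restored in the same way for the incremental run to match a single long run.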

2.1.2 Ensemble learning

Different data mining algorithms have their own applicable conditions, and no single algorithm suits every scenario and data set. NAIE AutoML uses ensemble learning to combine multiple algorithms into the final model, making it more robust. The specific implementation process is as follows:

2.1.3 Extensibility

1. Customize the algorithm model

AutoML typically provides built-in algorithm models for different tasks, and users can specify which of the built-in algorithms take part in modeling. However, a handful of built-in algorithms cannot cover all application scenarios, so NAIE AutoML supports custom algorithm models. Users implement the interfaces required by the framework specification to plug in their own model.

2. Customize evaluation metrics

AutoML provides built-in evaluation metrics for different tasks, such as precision, recall, and F1 for classification problems. However, these metrics are not sufficient for many business scenarios. For example, device fault detection may require the Fault Detection Rate to be as high as possible while the False Alarm Rate stays at or below 0.1%. Such scenarios need evaluation metrics customized to the business problem, so NAIE AutoML provides a custom evaluation metric interface.

3. Customize cross-validation

Cross-validation is built into AutoML, but it does not cover every user's needs. Therefore, NAIE AutoML also provides a custom cross-validation interface.

2.1.4 Reproducibility

In AutoML, hyperparameter selection, surrogate model generation, and model training are all influenced by random seeds. NAIE AutoML controls every module that involves randomness through one unified random seed parameter. In addition, when the random seed is set, the number of threads, which can affect algorithm results, is automatically set to 1. As a result, NAIE AutoML experiments are repeatable: identical AutoML configurations run at different times produce the same results.
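One way to drive every random component from a single seed is sketched below. This is an illustration under stated assumptions, not the NAIE AutoML implementation: `derive_seeds`, `kfold_indices`, `make_run_config`, and the component names are all hypothetical.

```python
import random


def derive_seeds(master_seed, components):
    """Derive one reproducible child seed per component from a single master seed."""
    rng = random.Random(master_seed)
    return {name: rng.randrange(2**32) for name in components}


def kfold_indices(n_samples, n_splits, seed):
    """Shuffle sample indices with a fixed seed and cut them into folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_splits] for i in range(n_splits)]


def make_run_config(random_state):
    """Collect everything that must be fixed for a run to be repeatable."""
    components = ["cv_split", "hyperparameter_sampler", "surrogate_model", "model_training"]
    seeds = derive_seeds(random_state, components)
    return {
        "seeds": seeds,
        "folds": kfold_indices(20, 5, seeds["cv_split"]),
        # Multi-threaded training can change results, so pin threads when seeded.
        "n_threads": 1 if random_state is not None else None,
    }


run_a = make_run_config(random_state=2024)
run_b = make_run_config(random_state=2024)
print(run_a == run_b)
```

Deriving child seeds from one master seed keeps the user-facing interface to a single parameter while still giving each component its own independent random stream.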

The following uses the device fault detection scenario as an example to show NAIE AutoML in practice.

3. Application of NAIE AutoML in device fault detection scenarios

3.1 Business scenario

Network device faults occur frequently and are usually detected only after they have happened, which greatly affects O&M efficiency and cost. Traditionally, when a fault occurs, a large amount of manpower and material resources is required to locate the fault and restore services.

How can AI technology be used to predict when a fault will occur so that action can be taken in advance? For this problem, the business department proposed the following objective: FDR should be as high as possible under FAR <= 0.1%, where:

· FDR = Fault Detection Rate: the fault actually occurs and the model predicts that it occurs;

· FAR = False Alarm Rate: the fault does not actually occur, but the model predicts that it occurs.

In short, try not to miss faults while keeping the false positive rate low.

3.2 Transformation of business objectives

For device fault detection, the AI task is a binary classification problem. If fault cases are treated as positive samples and all other cases as negative samples, the business metrics FDR and FAR correspond to the True Positive Rate and False Positive Rate of the ROC curve in binary classification, as shown in the following figure.

Figure: Correspondence between the business targets FDR and FAR and the ROC curve

From the figure above, the business target can be expressed in code as follows:
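A minimal sketch of such a metric, written with the standard library only, is shown here. It follows the definition given above (highest TPR on the ROC curve whose FPR does not exceed the limit); the function name `fdr_score`, its signature, and the sample data are assumptions made for illustration.

```python
def fdr_score(y_true, y_score, max_far=0.001):
    """Highest fault detection rate (TPR) reachable while FAR (FPR) <= max_far.

    y_true: 1 for a fault (positive sample), 0 for normal (negative sample).
    y_score: predicted fault score; higher means more likely a fault.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Sweep thresholds from the highest score downward, as when tracing a ROC curve.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best_fdr = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with equal scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg > max_far:
            break  # FAR only grows as the threshold drops
        best_fdr = tp / n_pos
    return best_fdr


y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.3, 0.7, 0.2, 0.2, 0.1, 0.1, 0.05, 0.01]
print(fdr_score(y_true, y_score, max_far=0.0))
print(fdr_score(y_true, y_score, max_far=0.2))
```

With a 0.1% limit on FAR, the test set needs at least a thousand negative samples before even one false alarm is tolerated, so the size of the evaluation set matters when using this metric.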

3.3 Specific application based on NAIE AutoML

3.3.1 What is special about the device fault detection scenario

Unlike general classification metrics such as precision, recall, and F1, the business objective of the device fault detection scenario is derived from the business problem itself. To keep the goal of hyperparameter optimization aligned with the business goal, you can register a custom evaluation metric through the interface provided by NAIE AutoML.

3.3.2 Calling it with minimal code

Call steps:

Step 1: Initialize the NAIE AutoML class

Step 2: Register the custom evaluation metric FDR_score

Step 3: Run training

The code is as follows:

Brief description of parameters:

1. Optimization_method: hyper-parameter optimization method, which currently supports grid search, random search and SMAC optimization algorithms

2. Included_models: Defaults to None, which searches all built-in models. Set this parameter to restrict the search to a subset of models.

3. Metrics: Built-in or custom metrics.

4. Workers: indicates the number of parallel workers. This parameter is used to realize distributed acceleration.

5. Random_state: random seed. Specify this configuration to achieve repeatability of the AutoML process.

3.3.3 Results

With just a few lines of code, modeling for the device fault detection scenario can be completed. Experiments show that expert-level performance can be reached after 1,000 iterations.

To do a good job, one must first sharpen one's tools. AutoML is a must-have tool for AI beginners and professional developers alike. NAIE has made AutoML available free to new users for 3 months!

This article is shared from Huawei Cloud community “Guide! Help improve the efficiency of AI development, Huawei AutoML tools to break through development problems one by one!” , by IMaster-Naie.
