1. The significance of application intelligence

By 2021, AI had become a core competitive battleground for mobile Internet products. Content platforms distribute personalized content through recommendation algorithms to increase time spent; social platforms analyze relationship graphs to recommend acquaintances accurately and improve engagement; live-streaming platforms use image processing to provide beautification and special effects that enhance the broadcast. Behind the screen, thousands of machines perform endless calculations, and immeasurable volumes of data shuttle between applications and computing centers over electromagnetic waves and network cables, enriching each AI model.

The potential of artificial intelligence goes beyond that. In addition to these "big scenes," there are many "small scenes" in mobile apps where artificial intelligence can differentiate a product and deliver a better experience. Algorithm execution does not have to rely entirely on cloud computing resources; on-device computing frameworks can also power many interesting features. For example:

  1. Entry prediction: when entering a multi-tab page, predict the function the user is likely to need and display that page directly.
  2. Popup management: control the frequency of popups according to users' habits and show content better matched to their needs.
  3. Intelligent preloading: preload pages the user may enter, shortening page load time and improving the user experience.

This article takes decision tree learning as an example: it introduces the principles of decision trees, how they are trained, and how they can be deployed on mobile, offering a modest discussion of on-device machine learning. It is intended to spark discussion, and readers are welcome to contribute further ideas.

Keywords: on-device intelligence, machine learning, decision tree, scikit-learn, Python

2. Implementation approach

2.1 General process of on-device intelligence practice

Compared with typical business features, intelligent features have a longer pipeline. The entire pipeline starts with data: on one side, the cloud pulls offline data for model training; on the other, the client gathers the corresponding data on the device as input for on-device inference. In addition, to keep the iteration of inference capabilities decoupled from the App's release cycle and rollout speed, the device side needs the ability to update models dynamically; ideally this is part of the base infrastructure. After features are processed and inference runs on the device, the inference results drive the business behavior. Data related to model inference (inference results, actual outcomes, inference time, etc.) should be reported to the cloud through tracking events and evaluated just like business metrics.
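The reporting side of this pipeline can be pictured as a small record type. A minimal sketch, with hypothetical field names and a placeholder report() function (a real implementation would go through the App's tracking SDK):

```python
from dataclasses import dataclass, asdict

# Hypothetical telemetry record for on-device inference, mirroring the
# fields mentioned above; names and report() are illustrative only.
@dataclass
class InferenceEvent:
    model_version: str   # which dynamically delivered model produced the result
    prediction: int      # inference result on the device
    ground_truth: int    # actual outcome observed later, if available
    latency_ms: float    # inference time

def report(event: InferenceEvent) -> None:
    # In a real app this would be sent through the analytics/tracking SDK.
    print(asdict(event))

report(InferenceEvent("1.0.3", prediction=1, ground_truth=1, latency_ms=2.4))
```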

Every link in the machine learning pipeline involves a large body of scientific principles and engineering practice. Limited by space and the author's ability, this article focuses only on decision tree principles, training, and mobile deployment.

2.2 Model selection

There are many ways to implement artificial intelligence, including machine learning (linear models, decision trees, SVM, etc.), deep learning (CNN, RNN, etc.), and reinforcement learning. Different scenarios call for different approaches, and the difficulty of training and application varies widely. The cheat sheet provided by scikit-learn is a useful reference.

In practice, several candidate models (such as SVC or SGD) are screened for a given scenario (such as classification); the best-performing model after initial training is selected, and its parameters are then tuned further.

3. Decision trees

3.1 Definition of decision tree

This article focuses on a simple and powerful tool: the decision tree. Wikipedia describes decision tree learning as follows:

Decision tree learning is a predictive modeling method used in statistics, data mining, and machine learning. It uses a decision tree as a prediction model to infer a sample's predicted result (corresponding to the tree's leaves) from the sample's observed data (corresponding to the tree's branches). Decision tree learning falls into two categories according to the type of prediction: (1) classification trees, whose predictions are limited to a set of discrete values, where each branch of the tree corresponds to a conjunction of classification features and each leaf corresponds to the class label predicted by those features; and (2) regression trees, whose predictions are continuous values (such as real numbers).

In short, decision tree learning trains a tree-shaped classifier that, given a set of sample data containing feature values and corresponding labels, captures the logic mapping features to labels. Take the tried-and-true Iris classification dataset: features (or attributes) include petal length, petal width, sepal length, and sepal width, and the label is the iris species (Iris setosa, Iris versicolor, and Iris virginica). Below is a sample of the dataset (the left column is the sample number, not data).

The classification decision tree learned from this dataset might look like the following (for illustration only). In a decision tree, an inner node (non-leaf node) represents a feature used for classification (e.g., "petal length"), the edges below a node represent the decision conditions (e.g., "greater than 2.497"), and a leaf represents a predicted class. To classify an iris sample of unknown species, feed it into the decision tree: start from the root node, follow the branches according to each node's condition, and the leaf node finally reached gives the predicted class.
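To make the inference walk concrete, here is a toy stand-in for such a tree written as nested conditionals; the feature names follow the iris example, but the structure and thresholds are made up for illustration:

```python
# A hand-written toy "decision tree": each if-test is an inner node,
# each return is a leaf. Thresholds are illustrative, not learned.
def predict_iris(petal_length: float, petal_width: float) -> str:
    if petal_length <= 2.497:          # root node: test one feature
        return "Iris setosa"           # leaf: predicted class
    elif petal_width <= 1.75:          # inner node: test another feature
        return "Iris versicolor"
    else:
        return "Iris virginica"

print(predict_iris(petal_length=1.4, petal_width=0.2))  # -> Iris setosa
```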

3.2 Decision tree construction

The previous section briefly described the definition and purpose of decision trees; this section discusses how they are constructed. The figure below is the pseudo-code description of the decision tree construction process from Machine Learning by Zhou Zhihua.

The construction of a decision tree is a recursive process: given a dataset as input, if the dataset satisfies a stop condition, generate a leaf node and stop. Otherwise, select an optimal partition attribute to split the dataset, create an intermediate node and the corresponding edges, and repeat the construction process with each resulting subset as input.

As you can see, the construction process has two key points: the stop conditions and the optimal partition attribute. First, the stop conditions. Setting aside overfitting (defined later) and computational cost, the stop conditions can be summarized as:

  1. The dataset is empty, or
  2. The samples in the dataset all belong to the same class, or
  3. All samples in the dataset have identical values for all features

Under conditions 1 and 3, the dataset cannot be split any further; under condition 2, there is no point in splitting further. A minimal sketch of the recursive procedure follows.
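The sketch below is runnable but deliberately naive: for brevity the "optimal partition attribute" is simply the first feature whose values differ, whereas a real learner would choose it by information gain, as discussed in the rest of this section. Samples are (features_dict, label) pairs; the dict-based tree representation is illustrative.

```python
from collections import Counter

def build_tree(samples):
    if not samples:                                   # stop condition 1: empty dataset
        return {"leaf": None}
    labels = [label for _, label in samples]
    if len(set(labels)) == 1:                         # stop condition 2: single class
        return {"leaf": labels[0]}
    features = samples[0][0].keys()
    splittable = [f for f in features
                  if len({s[f] for s, _ in samples}) > 1]
    if not splittable:                                # stop condition 3: identical features
        return {"leaf": Counter(labels).most_common(1)[0][0]}

    f = splittable[0]                                 # naive attribute choice
    values = {s[f] for s, _ in samples}
    return {"feature": f,                             # inner node + one edge per value
            "children": {v: build_tree([(s, l) for s, l in samples if s[f] == v])
                         for v in values}}

toy = [({"color": "red", "size": "big"}, "A"),
       ({"color": "red", "size": "small"}, "B"),
       ({"color": "blue", "size": "big"}, "A")]
print(build_tree(toy))
```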

Next, the optimal partition attribute. The purpose of splitting is to make the samples within each subset belong to the same class as much as possible, i.e., to reach a stop condition. Let us focus first on the class column. The number of classes a dataset contains can be viewed as the amount of information it carries: the more mixed the classes, the more information it contains. In information theory, "information gain" measures this difference: information gain equals the difference between the amount of information before and after a dataset is split. The larger the information gain, the less information the subsets contain, that is, the fewer classes each subset contains. Selecting the optimal partition attribute is really a search for the split with the maximum information gain. Concretely: traverse the dataset's features, split the dataset into subsets using each feature, compute and record the information gain of that split, and finally select the feature with the maximum information gain as the split condition.

A little more detail on data partitioning. When feature values are discrete, partitioning is intuitive: for example, if the feature is color = {red, yellow, blue}, the dataset can be split into three subsets containing the red, yellow, and blue samples respectively. If the feature is continuous, such as petal width in the Iris dataset, it must first be divided into intervals according to the distribution of its values, after which the discrete method applies. Different tree algorithms split continuous values differently, which will not be discussed in depth here.

To recap: subsets are defined using features, while information gain is computed by looking at classes. Keep this in mind to avoid getting lost in the next section.

3.3 Information gain

This section describes how information gain is calculated and includes some mathematical formulas. Readers may decide whether to skip it based on their goals.

3.3.1 Information entropy

In 1948, Shannon introduced the concept of entropy from thermodynamics into information theory to measure the amount of information. Information entropy describes the uncertainty of a dataset D when observing its classes: the more mixed the samples and the more classes the dataset contains, the greater its entropy; conversely, the more uniform the samples and the fewer the classes, the smaller its entropy. After a dataset is split into subsets in some way, the difference between the entropy of the original dataset and the weighted total entropy of the subsets is the information gain.

Assuming a dataset D contains y classes, and the proportion of samples in class k is $p_k$, the information entropy of D is defined as:


$$Ent(D) = -\sum_{k=1}^{y} p_k \log_2 p_k$$

For example, if y = 1, then Ent(D) = 0. If D is split by a discrete feature f, and f takes V values, D is divided into V subsets, denoted $D^v$, each with entropy $Ent(D^v)$. Since each subset contains a different number of samples, each is given a weight proportional to its size. The information gain of the split is defined as:


$$Gain(D, f) = Ent(D) - \sum_{v=1}^{V}\frac{|D^v|}{|D|}Ent(D^v)$$

The decision tree algorithm that splits attributes using information entropy directly is ID3. But information gain is biased toward features with many values. To address this, the C4.5 algorithm does not use information gain directly but splits attributes by the gain ratio:

$$Gain\_ratio(D, f) = \frac{Gain(D, f)}{IV(f)}, \qquad IV(f) = -\sum_{v=1}^{V}\frac{|D^v|}{|D|}\log_2\frac{|D^v|}{|D|}$$

The more values feature f has, the larger $IV(f)$ tends to be. Dividing the information gain by $IV(f)$ therefore offsets, to some extent, information gain's preference for features with many values.
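As a numeric check of these formulas, here is a small sketch computing entropy, information gain, and gain ratio for a toy label column; the helper names and toy data are illustrative.

```python
from collections import Counter
from math import log2

# `labels` is the class column of a dataset D; `subsets` holds the class
# columns of the subsets D^v produced by a split.
def ent(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(labels, subsets):
    n = len(labels)
    return ent(labels) - sum(len(s) / n * ent(s) for s in subsets)

def gain_ratio(labels, subsets):
    n = len(labels)
    iv = -sum(len(s) / n * log2(len(s) / n) for s in subsets)
    return gain(labels, subsets) / iv

D = ["a", "a", "b", "b"]           # a 50/50 class split: Ent(D) = 1.0
split = [["a", "a"], ["b", "b"]]   # a perfect partition: pure subsets
print(ent(D))                      # -> 1.0
print(gain(D, split))              # -> 1.0 (each subset has Ent = 0)
print(gain_ratio(D, split))        # -> 1.0 (IV = 1.0 for two equal halves)
```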

3.3.2 Gini index

Similar to information entropy, the Gini index is another measure of information impurity. It can be understood as the probability that two samples drawn at random from dataset D belong to different classes: the smaller the Gini index, the more uniform the samples in the dataset. The Gini index is computed as follows:


$$Gini(D) = \sum_{k=1}^{y}\sum_{k'\neq k}p_k p_{k'} = 1-\sum_{k=1}^{y}p_k^2$$

After splitting D by feature f, the weighted sum of the subsets' Gini indices is computed with the following formula. As before, because the subsets differ in size, each subset's Gini index is multiplied by a weight.


$$Gini\_index(D, f) = \sum_{v=1}^{V}\frac{|D^v|}{|D|}Gini(D^v)$$

When splitting, the feature yielding the lowest weighted Gini index is selected (this is the approach taken by the CART algorithm). Note that no information gain is computed when using the Gini index. The author (somewhat irresponsibly) suggests this is because Gini indices can be compared directly, whereas directly comparing information entropy suffers from the feature-count preference mentioned above and requires computing the gain ratio instead.
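The same toy data can be used to check the Gini formulas; again, a small illustrative sketch using the conventions of the entropy example.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_index(labels, subsets):
    n = len(labels)
    return sum(len(s) / n * gini(s) for s in subsets)

D = ["a", "a", "b", "b"]
print(gini(D))                                   # -> 0.5
print(gini_index(D, [["a", "a"], ["b", "b"]]))   # -> 0.0: a perfect split
```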

3.4 Performance Evaluation

Now suppose we have constructed a decision tree on a dataset: how do we evaluate its performance? This section discusses ways to evaluate a model's performance.

3.4.1 Training Set, Validation Set, and Test Set

The task of machine learning is to learn patterns of features from data. If we feed all the offline data we have directly into the learning algorithm, then all we can evaluate is the model's performance on that same dataset; we have no way of knowing how the model will perform in a production environment. To solve this, the raw dataset needs to be split: one part for training, another for testing.

Usually the dataset is divided into three parts: a training set, a validation set, and a test set. The training set is used to train the model; the validation set is used to check whether the model is overfitting and to tune parameters; and once you have the final model to release, the test set estimates how well the model will perform after rollout. A common split extracts 70% of the raw data as the training set, 20% as the validation set, and 10% as the test set. The split should preserve the class distribution as far as possible: for example, if the ratio of positive to negative samples in the original data is 8:2, the ratio in each of the three subsets should also be close to 8:2.
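Scikit-learn's train_test_split can preserve the class ratio via its stratify argument. A minimal sketch with made-up data:

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(100)]   # placeholder feature matrix
y = [0] * 80 + [1] * 20         # an 8:2 label distribution

# stratify=y keeps the 8:2 ratio in every resulting subset
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=2 / 9, stratify=y_rest, random_state=1)

print(sum(y_test) / len(y_test))   # -> 0.2, same as the original ratio
```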

3.4.2 Performance metrics

Confusion matrix

First, define the confusion matrix (for binary classification, with labels positive and negative):

| | Actually positive | Actually negative |
| --- | --- | --- |
| Predicted positive | True Positive (TP) | False Positive (FP) |
| Predicted negative | False Negative (FN) | True Negative (TN) |

Error rate and accuracy

The error rate describes the overall error of the predictions, i.e., the proportion of mispredicted samples:


$$Error = (FN + FP) \div (TP + FN + FP + TN)$$

Accuracy is the complement of the error rate:


$$Accuracy = (TP + TN) \div (TP + FN + FP + TN)$$

It is easy to see that $Error + Accuracy = 1$.

Precision

Precision describes how trustworthy the model's positive predictions are: how much of the data predicted positive is actually positive.


$$Precision = TP \div (TP + FP)$$

Recall

Recall describes the model's ability to find positive cases: how much of the truly positive data the model successfully identifies.


$$Recall = TP \div (TP + FN)$$

3.4.3 Evaluation method

As mentioned above, error rate and accuracy measure the overall performance of the model, treating positive and negative cases equally. When the user cares about predictive performance across all classes, accuracy can be observed directly. But in real business we tend to care more about the model's predictive power for a particular label. For example, in a preloading feature, the business cares most about the model's predictions for the "enters page" class, because those predictions trigger preloading on the device. In such cases, "enters page" can be treated as the positive class, and precision and recall observed.

It is important to understand that precision and recall conflict to some extent: models with high precision often have poor recall. Consider the following extreme case: predict every sample as positive. This covers all positive cases and yields a recall of 100%, but precision will be very low. The figure below shows the relationship between precision and recall for a model on the Iris dataset.

In practice, you can trade off these two metrics based on the business scenario. For example, in spam filtering, special attention should be paid to the model's precision so that normal emails are not blocked. A combined evaluation can be made by computing the F-score with a chosen weight:

$$F_\beta = \frac{(1+\beta^2) \times Precision \times Recall}{\beta^2 \times Precision + Recall}$$

β is the weight given to recall, indicating that recall is considered β times as important as precision.
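All of the metrics above are available in scikit-learn. A small sketch on hand-made labels (the numbers are illustrative only):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, fbeta_score)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]   # TP=2, FN=1, FP=1, TN=4

print(confusion_matrix(y_true, y_pred))   # rows: actual class, cols: predicted
print(accuracy_score(y_true, y_pred))     # (TP + TN) / total = 0.75
print(precision_score(y_true, y_pred))    # TP / (TP + FP) = 2/3
print(recall_score(y_true, y_pred))       # TP / (TP + FN) = 2/3
print(fbeta_score(y_true, y_pred, beta=2))  # recall weighted 2x over precision
```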

Besides the metrics above, ROC-AUC is also commonly used to measure the performance of classification models; it is not covered here, and interested readers can look it up.

3.4.4 Generalization ability and overfitting

Definition of overfitting

Machine learning training is the process of a model learning the characteristics of the training set. Since the training set is only a subset of the real world, it inevitably carries idiosyncrasies of its own. Overfitting describes the phenomenon of a model over-learning features unique to the training set. As an extreme example, suppose our training set consists of five samples and one feature, with each sample taking a distinct value on that feature. A decision tree that simply generates five paths, one describing each sample's feature value, achieves 100% accuracy on the training set. Obviously such a model would perform poorly on unknown data. The following figure illustrates overfitting:

Identify overfitting

As mentioned in Section 3.4.1, the raw dataset is typically divided into training, validation, and test sets; the validation set helps judge whether the model is overfitting. As training deepens, or as the depth of the decision tree grows, its performance on the training set keeps improving. But if we use the same model to predict the validation set, there is a point at which performance plateaus or even declines. The figure below plots the score curves of a decision tree on the training and validation sets as its depth increases on the Iris dataset. You can see that beyond 4 levels, the score on the training set is still rising, but the score on the validation set starts to fall: at this point the decision tree has begun to overfit.

Overfitting is avoided by pruning

Overfitting can be avoided by pruning: reducing the number of paths in the decision tree reduces excessive decision conditions, that is, the features the tree learns. Pruning comes in two flavors, pre-pruning and post-pruning. Pre-pruning stops tree growth during construction once some limit is reached. Post-pruning lets the tree grow to its full depth and then removes some of its nodes according to some algorithm.

Common constraints for pre-pruning include the maximum depth of the tree, the minimum number of samples required to split a node, and the minimum number of samples required at a leaf node. Post-pruning traverses each leaf node and computes whether the decision tree's generalization ability rises or stays the same after pruning it. Even if the observed generalization ability merely stays the same, pruning is still worthwhile: entities should not be multiplied beyond necessity.
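In scikit-learn these pre-pruning constraints map directly onto DecisionTreeClassifier parameters; the values below are illustrative, and ccp_alpha (cost-complexity post-pruning) is only available in newer scikit-learn versions (0.22+):

```python
from sklearn import tree

clf = tree.DecisionTreeClassifier(
    max_depth=5,            # maximum depth of the tree
    min_samples_split=4,    # minimum samples required to split a node
    min_samples_leaf=2,     # minimum samples required at a leaf
    ccp_alpha=0.0,          # > 0 enables cost-complexity post-pruning
)
```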

4. Decision tree training

The previous chapter discussed the basic principles of decision trees, their performance metrics, and how to prevent overfitting. In real production, we generally do not write a decision tree training algorithm from scratch, just as we do not hand-write an image loading library or a network library on Android; the industry offers many excellent, battle-tested training frameworks. This chapter describes how to use the scikit-learn framework to train a decision tree to classify penguins.

4.1 Framework Introduction and Construction

4.1.1 Scikit-learn

Scikit-learn is a Python machine learning library built on Python data-processing libraries such as NumPy, SciPy, and Matplotlib. It supports many kinds of machine learning tasks, including classification, regression, and clustering, and provides data-preprocessing capabilities. Scikit-learn is open source, so there is no need to worry about commercial use. Even better, scikit-learn has a well-translated Chinese site, complete with API references and user guides.

4.1.2 Environment Construction

To set up the environment entirely by hand, you need to install Python (the base environment), pip (Python package management), scikit-learn, and any other packages you may need. Anaconda is recommended for one-stop, graphical environment configuration: all dependencies can be installed with a few mouse clicks, and you can easily set up virtual environments (venv) to avoid conflicts between package dependencies.

Beyond package management, Anaconda also ships with direct support for Jupyter Notebook, a modern, interactive programming environment. Compared with traditional IDEs such as PyCharm, Jupyter presents context in a more intuitive and efficient way.

If you just want to try machine learning quickly, and no data security issues are involved (public data only), Google Colaboratory is highly recommended: there is no base environment to configure, it works out of the box, and you get access to free GPU resources. The demo in this article was developed on Colab; see the link in the references.

4.2 Acquisition and processing of training data

For a bit of novelty, the examples in this article do not use the classic Iris dataset but a similarly sized dataset of Palmer Archipelago penguins. The dataset records the bill length, bill depth, flipper length, body mass, sex, and island of three penguin species in the Palmer Archipelago, Antarctica. The demo in this article learns these traits and predicts the penguin species.


4.2.1 Data pull

Note that the penguin dataset is not built into scikit-learn and must be installed separately.

Add a pip command to the notebook, prefixed with `!` so it runs as a shell command:

```python
!pip install palmerpenguins
```

The load_penguins function returns a `pandas.DataFrame`; you can use the head function to take a quick look at the data.

```python
import pandas as pd
from palmerpenguins import load_penguins

penguins = load_penguins()
penguins.head()
```

4.2.2 Data processing

In machine learning, raw data usually needs processing before being handed to the model for learning, because it may contain outliers (such as typos), null values, or useless attributes.

As you can see, the penguin data includes samples with null values (NaN). The handling of null values varies from model to model; here we simply remove the rows containing them.

In addition, some attributes are easier to use after mapping. For example, if the original data contains strings like male/female, it is advisable to map the strings to numbers (such as 0/1). This has two advantages: first, typos in the data, such as "male" accidentally entered as "mal", are caught quickly during the mapping; second, some models (including scikit-learn's decision tree) cannot accept string input.

In fact, besides format conversion, features often need correlation analysis to remove attributes unrelated to the label. Skipping the correlation analysis for now, it is clear from the data alone that the year column records when the data was collected and has nothing to do with the penguin species, so we drop it.

Pandas provides powerful data-processing capabilities; all of the above takes only a few lines:

```python
# Drop rows containing null values
penguins = penguins.dropna()

# Map string attributes to numbers
spice_dict = {'Adelie': 0, 'Chinstrap': 1, 'Gentoo': 2}
gender_dict = {'male': 0, 'female': 1}
island_dict = {'Torgersen': 0, 'Biscoe': 1, 'Dream': 2}
penguins = penguins.replace(spice_dict).replace(gender_dict).replace(island_dict)

# Discard the irrelevant year column
penguins = penguins.drop('year', axis=1)
penguins.head()
```

The processed data looks like this:

4.2.3 Data set division

As mentioned earlier, the raw dataset needs to be divided into training, validation, and test sets, and scikit-learn can split datasets quickly. The code below uses the hold-out method. It should be added that, because a single split can introduce sample bias, the k-fold method is often used in practice: the data is divided into K equally sized groups, each group in turn serves as the validation set while the remaining groups form the training set, and the K scores are averaged.

```python
from sklearn.model_selection import train_test_split

# Split off 10% of the data as the test set
train_and_validate, test = train_test_split(penguins, test_size=0.1, random_state=1)
# Split the remainder into training and validation sets
train, validate = train_test_split(train_and_validate, test_size=0.3, random_state=1)
```

Features and labels can be extracted from the DataFrame with the following helper:

```python
def getFeatureAndLabelFromDF(dataset):
    feature_cols = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm',
                    'body_mass_g', 'sex', 'island']
    X = dataset.loc[:, feature_cols]
    y = dataset.species
    return X, y
```

4.3 Decision tree training

4.3.1 Decision tree fitting

Now we have everything needed to train the decision tree. The fitting code is very short:

```python
from sklearn import tree
from sklearn.metrics import f1_score

# Create the decision tree
clf = tree.DecisionTreeClassifier()
# Separate features and labels: X holds the features, y the labels
tX, ty = getFeatureAndLabelFromDF(train)
# Fit the decision tree given features and labels
clf = clf.fit(tX, ty)
# Check performance with the F-measure (weight = 1)
py = clf.predict(tX)
f1_score(ty, py, average='weighted')

# === Output ===
# 1.0
```

Wow, the model scored a perfect 1.0 on the training set. Such a high score should immediately put us on alert: has the model overfitted? Let's look at the same model's performance on the validation set:

```python
vX, vy = getFeatureAndLabelFromDF(validate)
py = clf.predict(vX)
f1_score(vy, py, average='weighted')

# === Output ===
# 0.8808654496281271
```

Sure enough, the decision tree did not perform as well on the validation set. In most cases, the default parameters do not give us the best-performing model. The next section discusses how to tune decision trees.

4.3.2 Decision tree tuning

As mentioned earlier, pre-pruning a decision tree generally adjusts three parameters: the maximum depth of the tree, the minimum number of samples required to split a node, and the minimum number of samples at a leaf node. In scikit-learn, these correspond to max_depth, min_samples_split, and min_samples_leaf. They are generally tuned in the following steps:

  1. Search for the best range for max_depth.
  2. Fix max_depth to a value in its best range and, on that basis, search for the best range for min_samples_split.
  3. As in step 2, fix max_depth and min_samples_split and search for the best range for min_samples_leaf.
  4. Using grid search, traverse the combinations of max_depth, min_samples_split, and min_samples_leaf to obtain the best parameters.

Since steps 1, 2, and 3 are almost identical in logic, here is the search using depth as the example:

```python
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.model_selection import cross_val_score

def trainModel(X, y, depth, split=2, leaf=1):
    clf = tree.DecisionTreeClassifier(max_depth=depth,
                                      min_samples_split=split,
                                      min_samples_leaf=leaf)
    # 10-fold cross-validation, scored with weighted F1
    scores = cross_val_score(clf, X, y, cv=10, scoring='f1_weighted')
    ave_score = sum(scores) / len(scores)
    return clf, ave_score

def searchDepth(X, y):
    # Traverse depth and record the score. A larger range,
    # such as [10, 100], can be tried first.
    depth_options = list(range(3, 20))
    scores = []
    for depth in depth_options:
        _, score = trainModel(X, y, depth)
        scores.append(score)
    # drawCurve is a plotting helper defined in the notebook
    drawCurve(depth_options, scores, 'depth',
              'train&validation f1-scores', 'Depth performance')

X, y = getFeatureAndLabelFromDF(train_and_validate)
searchDepth(X, y)
```

Here’s a graph of the results:

As you can see, the model's performance fluctuates as depth increases, meaning overly deep trees are unreliable. We therefore take [5, 6], just before the fluctuation, as the search interval for depth, and fix depth at 5 while searching the other parameters. The searches for the other two parameters yield the ranges [4, 5] and [2, 3] respectively.

With the intervals in hand, grid search can produce the final parameters. Internally, grid search traverses every combination in the provided parameter ranges, scores each with the configured evaluation method, and returns the highest-scoring combination.

```python
from sklearn.model_selection import GridSearchCV

params = [{'max_depth': [5, 6],
           'min_samples_split': [4, 5],
           'min_samples_leaf': [2, 3]}]
gridSearchClf = GridSearchCV(tree.DecisionTreeClassifier(),
                             param_grid=params, scoring='f1_weighted', cv=10)
gridSearchClf.fit(X, y)
gridSearchClf.best_params_

# === Output ===
# {'max_depth': 5, 'min_samples_leaf': 2, 'min_samples_split': 5}
```

At this point, we can use the test set to evaluate the model’s generalization performance:

```python
testX, testy = getFeatureAndLabelFromDF(test)
pTesty = gridSearchClf.best_estimator_.predict(testX)
f1_score(testy, pTesty, average='weighted')

# === Output ===
# 0.9081741150297716
```

5. Mobile deployment

At the end of the previous chapter we obtained a decision tree trained in an offline environment; now we discuss how to bring it to Android for inference. Due to space limitations, this section is not a step-by-step tutorial. If you are interested in this part, leave a comment, and a separate post detailing the deployment process may follow.

5.1 Running machine learning models on Android

First, note that mainstream frameworks such as TensorFlow and PyTorch already offer mature on-device inference solutions for deep learning, and many domestic on-device intelligence frameworks, such as ByteDance's Pitaya, also include model distribution capabilities. Unfortunately, the machine learning framework discussed in this article, scikit-learn, is not that powerful: we need to export the model into something the Android platform can execute.

There are two lines of thought. The first is to export the model as program code, compile it, and distribute it to Android. For example, sklearn-porter can export the model to Java/C code, which can be compiled into a JAR package or .so file for distribution. Because the export is plain code, it is easy to work with: users can modify it directly, and exporting to a .so file may also improve execution efficiency. The drawback is that the feature-processing logic must be developed separately.

The second is to convert the model into an intermediate-format description file, such as ONNX or PMML; the client then downloads the description file, parses it back into a model, and runs inference. This scheme has several advantages: ONNX and PMML are open file formats usable on many platforms, and the export can include data-processing steps, so users need not implement feature processing separately. However, because the description file carries markup metadata, it can be comparatively large, and loading the model on the device incurs some extra cost.
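As a sketch of the intermediate-format route, the trained classifier from Section 4 could be exported to ONNX roughly as follows, assuming the skl2onnx package; the file name and tensor shape are illustrative:

```python
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# The penguin model takes 6 numeric features per sample
onnx_model = convert_sklearn(
    clf, initial_types=[("input", FloatTensorType([None, 6]))])

with open("penguins.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
# On Android, the exported file could then be executed by an ONNX runtime.
```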

5.2 C/C++ scheme

5.2.1 Exporting code

Exporting a scikit-learn decision tree to C++ code is straightforward. First install the sklearn-porter library in Colab. Note that, because of package-structure changes in recent scikit-learn releases, sklearn-porter does not support the latest version: either install scikit-learn 0.19.2 or patch the sklearn-porter source.

```python
# Install sklearn-porter (and downgrade scikit-learn, if needed)
!pip install --no-cache-dir https://github.com/nok/sklearn-porter/zipball/master
!pip install scikit-learn==0.19.2
```

Exporting the code is very simple.

```python
from sklearn_porter import Porter

# Export the model as C code
def transformModel(clf):
    porter = Porter(clf, language='c')
    output = porter.export(embed_data=True)
    file_name = "c_model.cpp"
    with open(file_name, 'w') as model_file:
        model_file.write(output)
    print("===TRANSFORMATION===")
    print("model saved in: ", file_name)

# Export the best decision tree obtained above
transformModel(gridSearchClf.best_estimator_)
```

If you use Colab, the c_model.cpp file appears in the Files panel on the left of the page, under content.

5.2.2 Packaging, Delivery, and On-device Execution

NDK-related knowledge is extensive; limited by space and the author's experience, only a brief outline is given here.

  1. With the CPP file from the previous step, add it to the Android project's NDK code and compile it into .so files for each target ABI (armv7/v8, etc.).
  2. In practice, models need iteration, so a .so management system is required: .so files can be uploaded to the backend for management, and the App pulls them from the backend. The system may also need to handle .so path management and version management.
  3. Using a scheme similar to the Tinker hotfix, at App startup insert the path containing the latest version of the .so file into BaseDexClassLoader's native library search path list.
  4. On the Android side, load the native library from the .so path and call the relevant code.
  5. Android receives the callback from the .so library and executes the subsequent business logic.

5.3 PMML scheme

openscoring.io provides a family of PMML tools covering the full pipeline: exporting PMML models from scikit-learn, loading PMML models on the Java platform, and executing them there. However, because the Android platform lacks built-in support for JAXB (which is used at runtime to parse PMML), my shortest-path attempt at running a PMML model failed. Unfortunately, the detours I tried also hit a number of problems, such as jpmml-android not supporting the latest PMML file version and the jpmml-evaluator version not matching jpmml-android. If readers have a strong need, or want to explore further, consider the following process:

  1. Introduce the sklearn2pmml library into the scikit-learn Python code and export the PMML model (a sketch follows this list).
  2. Use the jpmml-android library to convert the PMML file into the Java Serialization (SER) format supported by the Android platform.
  3. Introduce the jpmml-evaluator library on Android, then load and execute the SER file.

6. It's all about the data

Although most of this article is devoted to the principles, training, and deployment of decision trees, from a broader perspective the performance of a machine learning model is ultimately determined by its training data. How well the data describes the labels sets the upper limit of machine learning; any optimization of the model merely approaches that limit.

When we consider adding machine learning-driven features to an application, the first things to think about are what data we have to train on and how well that data can describe the expected inference results. Moreover, data collection is often the most time-consuming and error-prone part of the project, so the approach and time cost of pulling data both offline and on mobile must be considered.

Have fun coding.

References

  1. Scikit-learn Chinese official site: scikit-learn.org.cn/
  2. Model selection map provided by scikit-learn: scikit-learn.org/stable/tuto…
  3. Jupyter Notebook home page: jupyter.org/
  4. Android NDK: developer.android.com/ndk
  5. sklearn2pmml: github.com/jpmml/sklea…
  6. jpmml-android (PMML model to SER): github.com/jpmml/jpmml…
  7. jpmml-evaluator: github.com/jpmml/jpmml…
  8. Colab Demo: colab.research.google.com/drive/1jLFB…
