Abstract: The Machine Learning Canvas is a template for designing and documenting machine learning systems. It has an advantage over a plain text document because it breaks a machine learning system down into key components and makes the connections between them visible. The tool has become popular because it helps visualize complex projects. In this article, we walk through each block of the Canvas, describing the practical problems data scientists run into and tips for dealing with them.


Value proposition

Machine learning should be designed to meet user needs

  • Who are the end users of the predictive system?
  • What do we need them to do?
  • What is the goal of the service, and why does it matter to them?

Only after answering these three questions (who, what, and why) can you begin to think about data collection, feature engineering, modeling, evaluation, and monitoring.

Learning from data

Let’s move on to the part of the Canvas dedicated to learning from data, which covers data sources, data collection, feature engineering, model building, and more.

Data sources

This block asks what raw data sources we can use. At this step you don’t have to plan exactly what data to collect, but it does force you to start thinking about which sources to draw on. Examples worth considering include internal databases, open data, research papers in the domain, APIs, web crawling, and the output of other machine learning systems.

Collecting data

This block deals with collecting and preparing data. A machine learning project cannot exist without a training dataset, and ideally the training set contains a large amount of labeled data: your learning system needs sample inputs together with their expected outputs. A model can only make predictions on new data after learning from data with correct answers. Data usually does not arrive in annotated form, so it is important to plan the labeling process and to make sure the training data goes through the same feature processing as the data you will later predict on. A learning algorithm can only perform well if its input data is correct.

For example, if you want to build an algorithm that predicts whether an Instagram account is fake or real, you first need humans to label accounts as fake or real. This is not a complex task for one person, but depending on how much data you need, it can get expensive. Sometimes data can be collected far more cost-effectively. Instagram, for instance, lets its users report photos and profiles as spam. Users label data for Instagram’s algorithms for free by liking posts and reporting inappropriate content, and Instagram uses that feedback to fight fraudulent and spam accounts and to personalize each customer’s feed.

It should be noted that the most accurate machine learning systems to date employ a “human in the loop” approach, which takes advantage of both machine and human intelligence. When the machine is not sure whether a prediction it has made is correct, it asks a human and then adds the human’s answer to its training data. This approach yields high-quality new data and improves model accuracy over time.

Some projects can also be launched without annotated datasets: these are unsupervised machine learning tasks, such as anomaly detection or audience segmentation.
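A minimal sketch of the human-in-the-loop idea described above. The model object, the `ask_human` callback, and the confidence threshold are all hypothetical placeholders, not part of any particular product:

```python
# Sketch of a human-in-the-loop labeling step (all names are illustrative).
# Uncertain predictions are routed to a human annotator, and the human's
# answer is kept as new training data for the next retraining cycle.

CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; tune for your task

def predict_or_escalate(model, example, ask_human, new_training_data):
    """Return a label, escalating uncertain predictions to a human."""
    proba = model.predict_proba([example])[0]   # class probabilities
    confidence = proba.max()
    if confidence >= CONFIDENCE_THRESHOLD:
        return model.classes_[proba.argmax()]   # machine is sure enough
    label = ask_human(example)                  # e.g. send to an annotation queue
    new_training_data.append((example, label))  # feed back into retraining
    return label
```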

Feature engineering

Once you have annotated data, you need to convert it into a format the algorithm accepts; in machine learning this process is called feature engineering. The first set of raw features can be redundant, massive, and unmanageable, so data scientists need to select the most informative features to make learning easier. Feature engineering requires a lot of experimentation and combines automated techniques with intuition and domain expertise.

Eugeny, a data scientist at InDataLabs, said:

We use simple machine learning techniques such as gradient boosting or linear regression to select and interpret features. The coefficients of a regression model automatically provide an estimate of each feature’s importance. We retrain the model several times with different hyperparameters to make sure the feature ranking is reliable and does not change significantly from experiment to experiment.
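A rough illustration of the approach described in the quote, using scikit-learn; the data and feature names here are invented for the sketch:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Toy data: 200 samples, 4 hypothetical profile features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.1, size=200)
features = ["posts", "followers", "bio_length", "account_age"]

# Standardize so coefficient magnitudes are comparable across features.
X_std = StandardScaler().fit_transform(X)

# Refit with different regularization strengths and check that the
# ranking of features stays stable from experiment to experiment.
for alpha in (0.1, 1.0, 10.0):
    coefs = Ridge(alpha=alpha).fit(X_std, y).coef_
    ranking = sorted(zip(features, coefs), key=lambda t: -abs(t[1]))
    print(alpha, [name for name, _ in ranking])
```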

If you are a domain expert (rather than a data scientist), you should point out which features look most important from your perspective; this is very useful to the data engineers who come later. If you find yourself listing too many features, try grouping related ones together. Many machine learning practitioners believe that well-chosen features are the key to effective modeling.

Building and updating the model

This block addresses when to create and update models with new data. There are two main reasons to keep a model up to date: new data can improve the model, and regular updates let you catch changes in how the model performs. How often the model needs updating depends on what it predicts.

If the model predicts the sentiment of a phrase, it does not need daily or weekly updates: the structure of language changes very slowly, if at all. Here it makes sense to update the model only when you obtain substantially more training data. Other models, by contrast, operate in rapidly changing environments. If you predict customer behavior, you should regularly check that the model still applies to new users; significant changes in the size or structure of the audience may require retraining on new data.

Updates can demand considerable time and processing power, in which case you have to trade off cost, time, and model quality. The key point of this block is that your model is not built once: like everything else in the world, it should change over time.
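One way to operationalize this is to monitor a quality metric on fresh data and retrain only when it degrades. A sketch under that assumption; the `evaluate` and `retrain` callables and the tolerance are hypothetical:

```python
# Sketch of a retraining trigger (all names and thresholds are illustrative).

QUALITY_TOLERANCE = 0.02  # assumed acceptable drop in accuracy

def maybe_update_model(model, recent_data, recent_labels, baseline_score,
                       evaluate, retrain):
    """Retrain only if performance on fresh data has degraded noticeably."""
    current_score = evaluate(model, recent_data, recent_labels)
    if baseline_score - current_score > QUALITY_TOLERANCE:
        # Performance dropped: fold the new data in and retrain.
        model = retrain(model, recent_data, recent_labels)
        baseline_score = evaluate(model, recent_data, recent_labels)
    return model, baseline_score
```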

Making predictions

This part of the Canvas focuses on prediction and consists of the machine learning task, decision making, making predictions, offline evaluation, and other components.

Machine learning tasks

This block defines the machine learning task in terms of inputs, outputs, and problem type. The most common machine learning tasks are classification, ranking, and regression. If you are predicting what something is, the output to predict is a class label. In binary classification there are two possible output classes; in multiclass classification there are more than two. The fake-Instagram-account problem we discussed earlier is an example of binary classification: the inputs might include the profile name, profile description, number of posts, and number of followers, and the output label would be “real” or “fake”. If you are trying to predict a numeric value, you are dealing with a regression task. For example, predicting real estate prices over the next few days from price history and other information about the property and the market is a regression task.
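As an illustration, the fake-account task above might be set up as a binary classifier like this; the features and data are invented for the sketch:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented profile features: [num_posts, num_followers, bio_length, has_avatar]
X = np.array([
    [120, 900,  80, 1],
    [2,   15,    0, 0],
    [340, 5000, 150, 1],
    [1,   3,     5, 0],
    [60,  450,  60, 1],
    [0,   8,     0, 0],
])
y = np.array([0, 1, 0, 1, 0, 1])  # 0 = real, 1 = fake

clf = LogisticRegression().fit(X, y)
print(clf.predict([[5, 20, 10, 0]]))  # predicted label for a new profile
```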

Decision making

How can predictions be used to advise end users in their decisions? Before you collect training data and build models, you and your team have to explain how the predictions will be used to make decisions that provide value to the end user. This question matters for every project because it is tightly linked to the project’s profitability: as mentioned above, a successful machine learning system must create additional value for its users.

A machine learning system must influence the decision-making process in a genuinely meaningful way, and its predictions must arrive on time. A common mistake many companies make is to build a model that is supposed to make predictions online, only to discover that the real-time data it needs is not available. So pay attention to timing when planning machine learning projects, and make sure you have the right data at the right moment to produce predictions you can act on.

The output of a machine learning system is also not always what the user ultimately wants. A churn prediction model, for example, tells you who is likely to churn in a month, but what the end user needs is churn prevention: retaining customers in a cost-effective way. The owner of a machine learning project must be able to describe these steps in advance. If you cannot explain how the predictions will drive decisions that provide value to the end user, stop and do not move forward until you find the answer.
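For the churn example, turning predictions into decisions might mean ranking customers by expected revenue at risk and spending a fixed retention budget on the top of the list. A hypothetical sketch; the cost, budget, and data shape are all assumptions:

```python
# Hypothetical: convert churn probabilities into a retention action list.

OFFER_COST = 10.0  # assumed cost of one retention offer

def pick_customers_to_retain(customers, budget):
    """customers: list of (customer_id, churn_probability, monthly_revenue)."""
    # Rank by expected revenue at risk: churn probability times revenue.
    ranked = sorted(customers, key=lambda c: -(c[1] * c[2]))
    chosen, spent = [], 0.0
    for customer_id, p_churn, revenue in ranked:
        if spent + OFFER_COST > budget:
            break
        if p_churn * revenue > OFFER_COST:  # act only when expected value beats cost
            chosen.append(customer_id)
            spent += OFFER_COST
    return chosen
```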

When to make predictions

This block addresses two questions: “When do we make predictions on new inputs?” and “How long do we have to featurize new inputs and compute predictions?” Some models allow predictions to be updated individually for each user. In that case, you can consider several update strategies:

  • A new prediction is made every time a user opens your application
  • New predictions are made on request, for example when the user taps a refresh button in the app
  • A prediction update is triggered by an event, such as the user submitting important new information
  • New predictions are computed on a schedule for all users, for example once a week

In some systems the predictions for different users are correlated and cannot be updated for one user without updating the entire system. Such global updates require more time and processing power, and therefore more planning: the time an update takes must be compatible with the desired update frequency.
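A sketch of how the triggers listed above could be wired in application code; the handler and function names are hypothetical:

```python
# Hypothetical wiring of the prediction-update triggers listed above.

def update_predictions(user_id):
    """Placeholder: featurize the user's latest data and store a fresh prediction."""
    ...

def on_app_open(user_id):
    update_predictions(user_id)        # recompute on every app open

def on_refresh_button(user_id):
    update_predictions(user_id)        # on demand, user-initiated

def on_profile_event(user_id, event):
    if event.is_significant:           # e.g. user submitted new information
        update_predictions(user_id)

def weekly_batch_job(all_user_ids):
    # Scheduled update for everyone, e.g. from a cron job once a week.
    for user_id in all_user_ids:
        update_predictions(user_id)
```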

For example, if you are building a movie recommendation system, first decide how often suggestions should be refreshed with new inputs to stay relevant and valuable to users. Then check whether that cadence is feasible, because the throughput of your machine learning system is limited. If you want to update daily and an update takes two hours, you are fine. If the suggestions are only valuable when updated hourly but an update takes two hours, you once again have to compromise between cost, time, and model quality.

Offline evaluation

This block covers evaluating model performance before going into production. It is important to plan the methods and metrics you will use to evaluate the system prior to deployment. Without validation metrics, you cannot choose the model that makes the best predictions, decide whether a model is good enough, or tell when it is ready for production. So make sure your metrics represent what you are actually trying to achieve.

To evaluate a supervised machine learning algorithm, we usually use k-fold cross-validation. The approach means training several machine learning models, each on k−1 subsets of the available training data, and evaluating each on the complementary subset reserved for validation. The process is repeated k times, with a different validation subset each time. This technique helps avoid overfitting while still putting all available data to use, explains Eugeny, the data scientist at InDataLabs.

Another approach to offline evaluation is to evaluate against real-world data after the fact. For example, if you are building a model that predicts real estate prices, simply wait for actual sales data to become available and compare your forecasts with it.
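In scikit-learn, the k-fold procedure described above looks roughly like this; the toy data stands in for a real training set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy data standing in for a real labeled training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)

# 5-fold cross-validation: train on 4 folds, validate on the held-out fold,
# repeated 5 times with a different validation fold each time.
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print(scores.mean(), scores.std())
```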

Online evaluation and monitoring

The last part of the Canvas covers online evaluation and monitoring of the model. Here you specify the metrics used to monitor the system’s performance after deployment (tracking metrics) and to measure value creation (business metrics). Aligning these two kinds of metrics will make everyone in the company happier; ideally there is a direct relationship between model quality and business results.

The online phase has its own testing procedures. A/B testing is the most common form of online testing. The approach is fairly simple, but it comes with some tricky rules and principles you must follow to set up the test and interpret the results correctly. A promising alternative to A/B testing is the family of algorithms known as multi-armed bandits. If you have several competing models and your goal is to maximize overall user satisfaction, you can try running a multi-armed bandit algorithm.

Once the model is in production it interacts with real users, who can also provide information about its accuracy. You can gather this feedback on the spot, conduct customer interviews, or analyze comments and support requests. You should also keep tracking the model’s validation metrics on live data and update the model before its quality becomes unsatisfactory to the end user.
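A toy epsilon-greedy sketch of the bandit idea: each “arm” is a competing model, and feedback (for example, a 1 when the user was satisfied) gradually shifts traffic toward the better one. Everything here is illustrative, not a production implementation:

```python
import random

class EpsilonGreedyBandit:
    """Toy epsilon-greedy bandit over competing models (arms)."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # times each model was served
        self.values = [0.0] * n_arms  # running mean reward per model

    def choose(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))  # explore a random model
        # Exploit: serve the model with the best observed mean reward.
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean update from user feedback (e.g. 1 = satisfied).
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```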

This article is translated by Ali Yunqi Community Organization.


The original title of the article is “How to Design Better Machine Learning Systems with Machine Learning Canvas”.

The original link