In Elasticsearch, machine learning can be thought of as a natural extension of search and analytics, applied to time series data. Elasticsearch analyzes time series data automatically by running jobs that contain one or more detectors, each defining a field to be analyzed. This lets us model the normal behavior of univariate time series data and identify the anomalies that deviate from it.

Previously, one of the main problems people wanted to solve with machine learning on Elasticsearch was anomaly detection. At its simplest, anomaly detection is a statistical problem: flag points that deviate from the common statistical properties of the input data distribution. Machine-learning-based approaches can also be applied, such as clustering-based anomaly detection or anomaly detection based on support vector machines. The Elastic Stack's machine learning capabilities include Kibana data visualizations, job management, schedulers, and metric aggregations from Elasticsearch. We can even use Beats to collect the data; for example, Metricbeat can collect system-level resource usage statistics.

Applications of Machine Learning in Elasticsearch

What is an anomaly

Let’s start by looking at the following time series diagram:

First, look at the graph in green. Around 4 pm on February 9th, 2016, an airline called AAL booked more tickets than on any other day, so this looks like an anomaly. But if the pattern repeats at certain times of the week or month, it should be considered normal. Another example: suppose we live in a big city such as Beijing. During the Friday evening rush hour, far more vehicles leave the city than on the other working days of the week. If we only look at the statistics for a single week, this appears abnormal; but if it happens every Friday, the rush is quite normal. The machine has to reach this conclusion from analysis of past data.

Another case is:

An entity is abnormal if it is distinct from everything else in the population.

In general, when we say something is anomalous, we usually mean one of the following:

  • When there is a sudden and significant change in the behavior of the entity
  • When an entity is completely different from any other entity in the population

Machine learning automatically chooses an appropriate data model based on the actual data:

Machine learning analyzes the data, selects an appropriate model, and flags anomalies when low-probability events occur. In most cases, the first model is the one selected.

 

Machine learning

Machine learning is an unsupervised way to build a data model from data that already exists in the Elasticsearch cluster. It can automatically analyze time series data by comparing new or incoming data against the model and identifying abnormal patterns or behavior. In Elasticsearch, machine learning performs the following functions:

  • Anomaly detection
  • Anomaly scoring (a score between 0 and 100; the higher the score, the more anomalous)

Before we can decide whether to use machine learning, we need to verify the suitability of the data we will use. There are three things you should consider before using Elasticsearch machine learning:

  • Whether your data is time series data
  • The data needs to contain key performance indicators (KPIs) that are critical to the use case
  • Location of data

In fact, you can use the machine learning APIs to analyze your data; this work does not have to be done in Kibana. To detect anomalies, it is important to define KPIs for your own data. These KPIs can be:

  • The number of logs over a specified period of time
  • The number of 404 responses received within a specified period of time
  • The amount of disk used over a specified period of time
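As an illustration, the second KPI above can be computed with an ordinary count query in the Kibana Dev Tools console. This is only a sketch: the index name weblogs and the fields response and @timestamp are hypothetical, not part of the sample data used later in this article.

```
GET weblogs/_count
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "response": 404 } },
        { "range": { "@timestamp": { "gte": "now-1h" } } }
      ]
    }
  }
}
```

Machine learning effectively tracks such a value bucket by bucket and flags the periods where it deviates from the learned baseline.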

The KPIs that an IT organization chooses to track and tag can span a variety of metrics, including the following:

  • Customer-impact metrics, such as application response time or error counts
  • Availability-oriented metrics, such as uptime or mean time to repair (MTTR)
  • Business-oriented metrics, such as orders per minute, revenue, or active users

Once a KPI is defined, it is analyzed with a function (mean, max, count, and so on). The combination of an analysis function and a KPI is called a detector. We can define single-metric or multi-metric detectors to check the data for anomalies.
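In the job configuration this pairing is explicit. A sketch of what the detectors section of an anomaly detection job looks like; the field names here are illustrative, and the second detector additionally splits the analysis by airline:

```
"detectors": [
  { "function": "sum",  "field_name": "volume" },
  { "function": "mean", "field_name": "responsetime", "by_field_name": "airline" }
]
```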

Another important function of machine learning is forecasting future trends in the analyzed time series data. Once we have built a machine learning job, we can make predictions. In addition to the Kibana interface, Elasticsearch also provides a forecasting API. To use it, you supply the time period to forecast.
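The forecasting API can be called directly from the Dev Tools console. A sketch, assuming an anomaly detection job named volume_job has already been created and opened:

```
POST _ml/anomaly_detectors/volume_job/_forecast
{
  "duration": "7d"
}
```

The response includes a forecast_id that can be used to look up the forecast results.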

Machine learning classification:

 

How does machine learning work

When an ML job is configured and running, ML orchestrates all of these pieces. The following figure shows a simplified version of this process:

Typically, this process runs once per bucket span, with additional optimizations to minimize I/O; those details are beyond the scope of this article. The key point is that this orchestration allows ML to operate online (that is, not offline/batch) and to learn continuously from newly ingested data. ML handles all of this automatically, so users don't have to worry about the complexity.

 

Enabling the Platinum features

Make sure a Platinum license is active before continuing with the machine learning features in Elasticsearch. If you downloaded Elasticsearch or use the open-source distribution, you have the Basic license, so you must activate the 30-day trial license to use the Platinum features.

We can enable the Platinum features by following these steps:

Following the three steps above selects the trial license and enables the Platinum features.
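If you prefer the API to clicking through Kibana, the same trial can be started with the license API; the acknowledge=true parameter confirms that you accept the trial terms:

```
POST _license/start_trial?acknowledge=true
```

GET _license can then be used to verify that the trial is active.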

 

Machine learning jobs

Kibana 7.0 supports four types of machine learning jobs, as described below:

  1. Single-metric jobs: data analysis is performed on only one index field
  2. Multi-metric jobs: data analysis is performed on multiple index fields; however, each field is analyzed separately
  3. Advanced jobs: data analysis can be performed on multiple index fields, with full configuration settings for detectors and influencers
  4. Population jobs: analysis of the distribution behavior of the data, such as detecting outliers within a population

In today’s exercise, we will show an example of using a single-metric job.

 

The sample data

For today's exercise, we download the time series data as follows:

git clone https://github.com/liu-xiao-guo/machine_learning_data

After downloading the data, let’s run the following command:

$./cf_rfem_hist_price_bulk_index.sh

This reads the required data into Elasticsearch. We can verify it in Kibana:

GET cf_rfem_hist_price/_count

Display:

{
  "count" : 90,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}

It shows 90 documents, covering 61 trading days and 29 non-trading days. Each document has the following structure:

        "_source" : {
          "date" : "2018-12-26",
          "open" : 55.0803,
          "high" : 56.0999,
          "low" : 54.59,
          "close" : 55.89,
          "volume" : 27038,
          "change" : 0.971,
          "changePercent" : 1.768,
          "label" : "Dec 26, 18",
          "changeOverTime" : 0,
          "symbol" : "RFEM"
        }

This is one day of trading data for the ticker symbol RFEM. It contains the daily opening price, high, low, close, trading volume, and so on.

 

Run a single-metric job

Basically, a single-metric job uses only one field of an indexed document in its analysis detector. Here are step-by-step instructions for running a single-metric job against the volume field.

Create an index pattern

To create an Index pattern, we do the following:

Follow steps 1,2,3 above:

Enter cf_rfem_hist_price* and select Next step.

Then select date as the Time Filter field name, and click Create index pattern.

The cf_rfem_hist_price* index pattern has now been created and can be used in machine learning jobs.

 

Create a new machine learning job

To create a single metric machine learning job, we need to do the following:

1. Click the Machine Learning button on the left toolbar, as shown in the following screen capture; the machine learning panel appears in the right pane.

2. From the top menu, select Anomaly Detection, then click the Create job button, as shown in the screenshot below:

 

3. This panel lets us select the source data from a new search, a selected index, or a saved search. Click the cf_rfem_hist_price index, as shown in the following screen capture:

4. Several job types can be used to define a machine learning job. Let's select a Single metric job, as shown in the screenshot below:

5. Select Use full cf_rfem_hist_price* data

6. Then select the Next button

7. A single-metric job must use an aggregation. Since our data has one record per day and the interval is one day, the result is the same whether we choose the Sum, Mean, or Median aggregation. We select the Sum aggregation of the volume field to check for anomalies. Note that this is the KPI we talked about earlier: our aim is to check the total volume for anything unusual.

One thing to notice above is that the Bucket span is set to 15m, even though our time series has one data point per day. In actual use, adjust it to suit your own use case.

8. Click the Next button so we can see the following screen:

9. Next select Next:

10. Enter the Job ID and select the Next button

11. Validation information is displayed. If there were an error in this step, it would be shown in red; here everything is fine. We then select the Next button:

12. Select the Create Job button:

13. We select View Results:
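For reference, the job built by the wizard above can also be created entirely through the ML APIs: define the job, attach a datafeed that reads from the index, then open the job and start the datafeed. This is a sketch; the job and datafeed names are arbitrary choices, and the bucket span mirrors the wizard's 15m default:

```
PUT _ml/anomaly_detectors/rfem_volume
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [ { "function": "sum", "field_name": "volume" } ]
  },
  "data_description": { "time_field": "date" }
}

PUT _ml/datafeeds/datafeed-rfem_volume
{
  "job_id": "rfem_volume",
  "indices": [ "cf_rfem_hist_price" ]
}

POST _ml/anomaly_detectors/rfem_volume/_open
POST _ml/datafeeds/datafeed-rfem_volume/_start
```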

 

At the beginning, while machine learning is still learning, nothing special is visible. After some learning, it detects an unusual condition.

Congratulations, you've created a single-metric machine learning job! We can now click on Anomalies below:

Anomalies are shown in different colors in the chart above:

  • Warning (blue): score less than 25
  • Minor (yellow): score between 25 and 50
  • Major (orange): score between 50 and 75
  • Critical (red): score between 75 and 100
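The same severity filtering is also available through the results API. A sketch, assuming the job is named rfem_volume; the record_score parameter returns only records at or above that score, matching the Critical threshold above:

```
GET _ml/anomaly_detectors/rfem_volume/results/records
{
  "record_score": 75,
  "sort": "record_score",
  "desc": true
}
```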

We can click Severity Threshold above to choose which levels to display. The following chart shows only Critical anomalies (score greater than 75):

Clearly there are quite a few anomalies. Clicking the one at the top, on January 10th, 2019, we can see the details:

The actual value shown above is 304198, while the value predicted by machine learning is 22499, so this is clearly an anomaly. Sales staff can use this information to investigate why such an anomaly occurred and what caused the sudden increase in trading volume. Is someone behind it?

In the figure above, because the time range is short, everything fits on one screen. We can adjust the time range by dragging the handles on the left and right:

This lets us focus on the data for the time period we care about.

We can also annotate our data:

We can add an annotation for this event by clicking the Create button above:

We can also click on the Forecast button in the upper right:

We fill in a 7-day forecast and click the Run button:

This is the final prediction.

 

Bucket spans in machine learning

In the exercise above, we kept the default 15 minutes as the bucket_span for analyzing our data. Think of the bucket span as a pre-analysis aggregation interval: a window of time over which a portion of the data is aggregated for analysis. The shorter the bucket_span, the finer-grained the analysis, but also the more likely it is to pick up noisy artifacts in the data. The following figure shows the same data set aggregated over three different intervals:

Note that the prominent anomaly visible in the version aggregated at the 5-minute interval almost disappears in the version aggregated at the 60-minute interval, because the spike is short (less than 2 minutes); at the 60-minute interval the peak no longer seems unusual at all. On the other hand, a smaller bucket_span also means faster detection.
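You can get a feel for this effect with an ordinary date_histogram aggregation, re-running the same sum at different intervals. A sketch against a hypothetical metrics index with @timestamp and value fields; try 5m, 10m, and 60m and compare the shapes. (On older 7.x versions the parameter was called interval rather than fixed_interval.)

```
GET metrics/_search
{
  "size": 0,
  "aggs": {
    "per_bucket": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "5m"
      },
      "aggs": {
        "total": { "sum": { "field": "value" } }
      }
    }
  }
}
```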

 

Read more

In this article, I showed how to use Elastic machine learning to create a single-metric job. In the follow-up article, "Machine Learning in Practice – Multi-metric Jobs", we will show how to create a multi-metric machine learning job.