In the previous article, “Building Elasticsearch data with grok on ingest to speed up analysis”, we looked at how to structure unstructured data at ingest time (schema on write) to ensure that your analysis runs in near real time. This speed can help take your observability use cases to the next level.
In this article, we will learn how to create a new grok pattern, step by step, from scratch! This means that regardless of your use case or requirements, you can structure your data to speed things up.
Grok debugging tools
Two tools that can be used to build and debug grok patterns are the Simulate Pipeline API, which we used in the previous part of this blog series, and Kibana’s Grok Debugger. The incremental construction method shown here works with either of these tools. In this article, we will use the Grok Debugger.
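For reference, here is a minimal sketch of how a grok pattern can be exercised with the Simulate Pipeline API (the pattern shown is just a placeholder; we will build up the real one below):

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": ["%{IP:host.ip} %{GREEDYDATA:my_greedy_match}"]
        }
      }
    ]
  },
  "docs": [
    { "_source": { "message": "55.3.244.1 GET /index.html 15824 0.043 other stuff" } }
  ]
}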
Structured data
For this blog, let’s say we’re told to write a grok pattern to parse the following message:
55.3.244.1 GET /index.html 15824 0.043 other stuff
We also assume that we are told to structure the above data into field names that conform to the Elastic Common Schema (ECS), and that we have the following information about the message:
- The first token is the host IP address
- The second token is the HTTP request method
- The third token is the URI
- The fourth token is the size of the request in bytes
- The fifth token is the duration of the event
- The rest of the text is just other text that we don’t care about
Based on these instructions, we want to extract the following ECS-compliant fields for the above message:
"Host. IP ": "55.3.244.1" "http.request.method": "GET" "url.original": "/index.html" "http.request.bytes": 15824 "the event. The duration" : 0.043Copy the code
Build a new grok expression step by step
Now, we will build up a grok expression from left to right. Let’s start by seeing if the IP address can be extracted from the message. We’ll use the IP grok pattern to match the host.ip field and the GREEDYDATA pattern to capture everything after the IP address, as follows:
%{IP:host.ip}%{GREEDYDATA:my_greedy_match}
Let’s go to Dev Tools in Kibana and use the Grok Debugger to see if this grok pattern can parse the message:
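The structured output should look something like the following (the exact rendering in the Grok Debugger may differ slightly; note that my_greedy_match starts with a space, because the pattern itself contains none):

{
  "host.ip": "55.3.244.1",
  "my_greedy_match": " GET /index.html 15824 0.043 other stuff"
}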
It works as expected. The host.ip field is correctly extracted, and the rest of the message is stored in my_greedy_match. Success!
Let’s add the next part of the grok pattern. We know that the next token is the http.request.method field, which matches the WORD grok pattern. Therefore, we add it to our grok expression as follows:
%{IP:host.ip}%{WORD:http.request.method}%{GREEDYDATA:my_greedy_match}
However, as shown below, testing it in the Grok Debugger yields an empty response. This is not what we expected!
The empty response is due to a pattern mismatch: the message has a space between host.ip (55.3.244.1 in this example) and http.request.method (GET in this example), but there is no corresponding space in the grok pattern. Let’s fix this error and then try the following grok pattern:
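%{IP:host.ip} %{WORD:http.request.method} %{GREEDYDATA:my_greedy_match}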
It works! We now extract both the host.ip and http.request.method fields.
Putting it all together
We still need to parse the remaining fields. We can continue adding grok patterns incrementally until we end up with the following grok expression:
%{IP:host.ip} %{WORD:http.request.method} %{URIPATHPARAM:url.original} %{NUMBER:http.request.bytes:int} %{NUMBER:event.duration:double} %{GREEDYDATA:my_greedy_match}
We can test this in Kibana as follows:
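The structured output should now contain all of the fields we set out to extract, along these lines (the :int and :double suffixes cause the byte count and duration to be emitted as numbers rather than strings):

{
  "host.ip": "55.3.244.1",
  "http.request.method": "GET",
  "url.original": "/index.html",
  "http.request.bytes": 15824,
  "event.duration": 0.043,
  "my_greedy_match": "other stuff"
}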
It works as expected! For this example, however, we are not interested in keeping the my_greedy_match field, so we can drop its field name from the grok expression, as follows:
%{IP:host.ip} %{WORD:http.request.method} %{URIPATHPARAM:url.original} %{NUMBER:http.request.bytes:int} %{NUMBER:event.duration:double} %{GREEDYDATA}
This looks like exactly what we wanted! We now have a grok pattern that can be used to structure the data contained in the message field.
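As a minimal sketch of how this pattern could be put to work (the pipeline name parse-example-logs and the index name my-index are hypothetical), we could create an ingest pipeline with a grok processor and then reference it at index time:

PUT _ingest/pipeline/parse-example-logs
{
  "description": "Structure the example message into ECS fields",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IP:host.ip} %{WORD:http.request.method} %{URIPATHPARAM:url.original} %{NUMBER:http.request.bytes:int} %{NUMBER:event.duration:double} %{GREEDYDATA}"]
      }
    }
  ]
}

POST my-index/_doc?pipeline=parse-example-logs
{
  "message": "55.3.244.1 GET /index.html 15824 0.043 other stuff"
}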
You can find out more about grok usage in my previous Meetup videos:
Elastic Logstash hands-on practice