In previous posts:

  • Elastic: Structuring Elasticsearch data with grok at ingest time to speed up analysis
  • Logstash: How to build a custom grok pattern, step by step

But what if our grok pattern doesn’t work?

In this article, we will use Kibana’s Grok Debugger to debug broken grok patterns. The divide-and-conquer method described below should help you quickly find out why a given grok pattern doesn’t match your data. Fixing the grok pattern so it works lets you structure your data at ingest time, which in turn helps your observability and security use cases perform at their best.

 

Divide (your pattern) and fix (your errors)

Suppose we are trying to parse a relatively long message, such as the following message, which is an entry from Elasticsearch’s slow log:

[2020-05-14T20:39:29,644][INFO ][index.search.slowlog.fetch.S-OhddFHTc2h6w8NDzPzIw] [instance-0000000000] [kibana_sample_data_flights][0] took[3.7ms], took_millis[3], total_hits[416 hits], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[1], source[{"query":{"match":{"DestCountry":{"query":"AU","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}}], id[],

And suppose we find the following grok pattern on the internet, and we’re told it should parse Elasticsearch slow logs, but for some reason it doesn’t work!

\[%{TIMESTAMP_ISO8601:event.end}\]\[%{LOGLEVEL:log.level}\s*\]\[%{DATA:slowlog.type}\]\s*\[%{DATA:host.name}\]\s*\[%{DATA:slowlog.index}\]\s*\[%{DATA:slowlog.shard:int}]took\[%{DATA:slowlog.took}\],\stook_millis\[%{DATA:slowlog.took_millis:float}\],\stotal_hits\[%{DATA:slowlog.total_hits:int}\shits\]\,\stypes\[%{DATA:slowlog.types}\],\sstats\[%{DATA:slowlog.stats}\],\ssearch_type\[%{DATA:slowlog.search_type}\],\stotal_shards\[%{DATA:slowlog.total_shards:int}\],\ssource\[%{GREEDYDATA:slowlog.source}\],\sid\[%{DATA:slowlog.x-opaque-id}\]

Despite what the internet promised, the pattern does not work. Fortunately, you can use the Grok Debugger to figure out what went wrong. In Kibana, open Dev Tools > Grok Debugger, then paste in the sample data and the grok pattern.

Clicking the Simulate button returns an empty structured-data response, indicating that the grok pattern does not match the sample data. To make sure the Grok Debugger itself is running properly, define a pattern that we know will match anything and store the result in a field called my_greedy_match. This can be done by defining the grok pattern as:

%{GREEDYDATA:my_greedy_match}

With this pattern, Grok stores the entire contents of the sample data in a field named my_greedy_match, which is exactly what we expect from this test.
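The same sanity check can also be run without Kibana, against Elasticsearch’s simulate pipeline API with a grok processor. A sketch of the Dev Tools request (the message value is abbreviated here):

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": ["%{GREEDYDATA:my_greedy_match}"]
        }
      }
    ]
  },
  "docs": [
    { "_source": { "message": "[2020-05-14T20:39:29,644][INFO ] took[3.7ms], ..." } }
  ]
}
```

If the pattern matches, the response contains the resulting document with the my_greedy_match field populated.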

Next, we use a divide-and-conquer approach to find the error in the grok pattern. To do this, we copy the first half of the original grok pattern into a new expression and replace the second half with the GREEDYDATA expression we just saw. The new grok pattern looks like this:

\[%{TIMESTAMP_ISO8601:event.end}\]\[%{LOGLEVEL:log.level}\s*\]\[%{DATA:slowlog.type}\]\s*\[%{DATA:host.name}\]\s*\[%{DATA:slowlog.index}\]\s*\[%{DATA:slowlog.shard:int}]took\[%{DATA:slowlog.took}\],\stook_millis\[%{DATA:slowlog.took_millis:float}\],%{GREEDYDATA:my_greedy_match}

After pasting the Grok pattern into the Grok Debugger, we see that the structured data response is still empty.

This means the error is somewhere in the first half of the grok pattern. So we split that half in two again, as follows:

\[%{TIMESTAMP_ISO8601:event.end}\]\[%{LOGLEVEL:log.level}\s*\]\[%{DATA:slowlog.type}\]\s*\[%{DATA:host.name}\]\s*%{GREEDYDATA:my_greedy_match}

 

Homing in on the error

We now know that there are no errors in the first quarter of the grok pattern, and that there is an error somewhere before its midpoint. So let’s place the GREEDYDATA expression at about the three-eighths point of the original grok pattern, as follows:

\[%{TIMESTAMP_ISO8601:event.end}\]\[%{LOGLEVEL:log.level}\s*\]\[%{DATA:slowlog.type}\]\s*\[%{DATA:host.name}\]\s*\[%{DATA:slowlog.index}\]\s*\[%{DATA:slowlog.shard:int}]%{GREEDYDATA:my_greedy_match}

Pasting this into the debugger, we get a match.

Thus, we know the error is between the three-eighths point and the midpoint of the grok pattern. Let’s add back a little more of the original grok pattern, like this:

\[%{TIMESTAMP_ISO8601:event.end}\]\[%{LOGLEVEL:log.level}\s*\]\[%{DATA:slowlog.type}\]\s*\[%{DATA:host.name}\]\s*\[%{DATA:slowlog.index}\]\s*\[%{DATA:slowlog.shard:int}]took%{GREEDYDATA:my_greedy_match}

This returns an empty response in the Grok Debugger.


 

Victory is in sight

So the problem appears just after we extract slowlog.shard:int. If we re-examine the message we are parsing, we see that the literal string took is preceded by a space character. Let’s modify the grok pattern to allow for a space before took, as follows:

\[%{TIMESTAMP_ISO8601:event.end}\]\[%{LOGLEVEL:log.level}\s*\]\[%{DATA:slowlog.type}\]\s*\[%{DATA:host.name}\]\s*\[%{DATA:slowlog.index}\]\s*\[%{DATA:slowlog.shard:int}]\stook%{GREEDYDATA:my_greedy_match}

It works: the pattern now matches.
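The effect of that single missing \s can be reproduced with any regex engine; here is a minimal Python check, using plain re as a stand-in for grok’s underlying regex engine:

```python
import re

# Fragment of the slow-log line: a space sits between "[0]" and "took".
line = "[kibana_sample_data_flights][0] took[3.7ms]"

# Without \s, the literal "took" must follow "]" immediately -- no match.
assert re.search(r"\]took\[", line) is None

# With \s, the space before "took" is consumed -- match.
assert re.search(r"\]\stook\[", line) is not None
```

A single unmatched literal character is enough to make the whole pattern return nothing, which is why bisecting the pattern is so effective.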

But we are still dumping a big chunk of the data into my_greedy_match. Let’s add back the rest of the original grok pattern, as follows:

\[%{TIMESTAMP_ISO8601:event.end}\]\[%{LOGLEVEL:log.level}\s*\]\[%{DATA:slowlog.type}\]\s*\[%{DATA:host.name}\]\s*\[%{DATA:slowlog.index}\]\s*\[%{DATA:slowlog.shard:int}]\stook\[%{DATA:slowlog.took}\],\stook_millis\[%{DATA:slowlog.took_millis:float}\],\stotal_hits\[%{DATA:slowlog.total_hits:int}\shits\]\,\stypes\[%{DATA:slowlog.types}\],\sstats\[%{DATA:slowlog.stats}\],\ssearch_type\[%{DATA:slowlog.search_type}\],\stotal_shards\[%{DATA:slowlog.total_shards:int}\],\ssource\[%{GREEDYDATA:slowlog.source}\],\sid\[%{DATA:slowlog.x-opaque-id}\]

Then paste the grok pattern into the Grok Debugger and simulate again.

The grok pattern works! We now extract structured data from slow log entries that were previously unstructured.
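The manual bisection we just walked through can be automated. The sketch below uses plain Python regexes as stand-ins for grok patterns and a hypothetical first_failing() helper (not a Kibana or Elasticsearch API) to binary-search for the broken segment, exactly as we did by hand:

```python
import re

# An abridged slow-log line for illustration.
SAMPLE = "[2020-05-14T20:39:29,644][INFO ][index.search.slowlog.fetch] took[3.7ms]"

# The pattern, split into segments; plain regexes stand in for grok here.
SEGMENTS = [
    r"\[[\d\-T:,]+\]",   # timestamp
    r"\[\w+\s*\]",       # log level
    r"\[[\w.]+\]",       # logger name
    r"took",             # BUG: missing the leading \s
    r"\[[\d.]+ms\]",     # took[...]
]

def first_failing(segments, text):
    """Binary-search for the longest prefix of the pattern that still
    matches when the remainder is replaced by a catch-all .* -- the
    segment right after that prefix is the broken one."""
    lo, hi = 0, len(segments)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if re.match("".join(segments[:mid]) + ".*", text):
            lo = mid          # a prefix of length mid still matches
        else:
            hi = mid - 1      # the break is at or before segment mid
    return lo                 # == len(segments) when everything matches

print(first_failing(SEGMENTS, SAMPLE))   # 3: the "took" segment is the culprit
SEGMENTS[3] = r"\stook"                  # apply the fix
print(first_failing(SEGMENTS, SAMPLE))   # 5: the whole pattern matches
```

Each probe is exactly what we did in the Grok Debugger: keep a prefix, replace the rest with a greedy catch-all, and see whether it still matches.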

 

Discover available grok patterns

Starting from the basic grok patterns, you can build complex patterns to match your data. In addition, the Elastic Stack ships with more than 120 reusable grok patterns. For a complete list, see the ingest node grok patterns and the Logstash grok patterns.
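Under the hood, each %{SYNTAX:SEMANTIC} reference expands into a named regex group. A toy sketch of that expansion (the three base patterns below are simplified stand-ins for illustration, not the stack’s real definitions):

```python
import re

# Simplified stand-in library; the real stack ships 120+ patterns.
PATTERNS = {
    "INT": r"[+-]?\d+",
    "WORD": r"\w+",
    "LOGLEVEL": r"INFO|WARN|ERROR|DEBUG|TRACE|FATAL",
}

def grok_to_regex(pattern: str) -> str:
    """Expand %{NAME} and %{NAME:field} references into regex groups."""
    def expand(match):
        name, _, field = match.group(1).partition(":")
        body = PATTERNS[name]
        # Named group when a field is given, anonymous group otherwise.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.sub(r"%\{([^}]+)\}", expand, pattern)

rx = re.compile(grok_to_regex(r"\[%{LOGLEVEL:level}\] took\[%{INT:ms}\]"))
print(rx.search("[INFO] took[3]").groupdict())  # {'level': 'INFO', 'ms': '3'}
```

This is also why a grok pattern fails as a unit: once expanded, it is one long regex, and any non-matching literal breaks the whole match.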

An alternative to grok

In some cases, you can use the Dissect processor instead to extract structured fields from a single text field. Like the grok processor, dissect extracts structured fields out of a single text field within a document. However, unlike the grok processor, dissect does not use regular expressions. This makes dissect’s syntax simpler, and in some cases faster than the grok processor.
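To illustrate the difference, here is a toy dissect-style matcher: the pattern contains only literal delimiters and %{key} references, and matching is done with plain string searches rather than a regex engine. This is a sketch of the idea, not Elastic’s implementation:

```python
def dissect(pattern: str, text: str):
    """Toy dissect: %{key} fields separated by literal delimiters.
    Matching uses only find/startswith -- no regular expressions."""
    tokens, rest = [], pattern
    while rest:                          # tokenize into (literal, key) pairs
        literal, sep, rest = rest.partition("%{")
        if not sep:
            tokens.append((literal, None))
            break
        key, _, rest = rest.partition("}")
        tokens.append((literal, key))
    result, pos = {}, 0
    for i, (literal, key) in enumerate(tokens):
        if not text.startswith(literal, pos):
            return None                  # literal delimiter missing
        pos += len(literal)
        if key is None:
            continue
        # The field value runs up to the next literal delimiter.
        nxt = tokens[i + 1][0] if i + 1 < len(tokens) else ""
        end = text.find(nxt, pos) if nxt else len(text)
        if end == -1:
            return None
        result[key] = text[pos:end]
        pos = end
    return result

line = "[2020-05-14T20:39:29,644][INFO ] took[3.7ms]"
print(dissect("[%{ts}][%{level}] took[%{took}]", line))
# {'ts': '2020-05-14T20:39:29,644', 'level': 'INFO ', 'took': '3.7ms'}
```

Because the delimiters are fixed strings, there is no backtracking, which is where dissect’s speed advantage comes from; the trade-off is that it cannot express optional or variable-format fields the way grok can.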