Preface
ELK (Elasticsearch, Logstash, Kibana) or EFK (Elasticsearch, Filebeat or Fluentd, Kibana) stacks are common choices when designing a company's container cloud logging solution. However, many of Elasticsearch's complex search features go unused at this stage, so we ultimately chose Grafana's open source Loki logging system. Let's look at some of Loki's basic concepts and architecture; of course, EFK remains a mature, industry-proven log aggregation solution that is well worth knowing and mastering.
Loki 2.0 released: transform logs as you're querying them, and set up alerts within Loki. The 2.0 update makes Loki noticeably easier to use.
Update history
02 October 2020 – First draft
Read the original: wsgzao.github.io/post/loki
Loki overview
Grafana Loki is a set of components that can be composed into a fully featured logging stack.
Unlike other logging systems, Loki is built around the idea of only indexing metadata about your logs: labels (just like Prometheus labels). Log data itself is then compressed and stored in chunks in object stores such as S3 or GCS, or even locally on the filesystem. A small index and highly compressed chunks simplify operations and significantly lower the cost of Loki.
Loki, the latest open source project from the Grafana Labs team, is a horizontally scalable, highly available, multi-tenant log aggregation system. It was designed to be economical, efficient, and easy to use: it does not index the log content, but instead attaches a set of labels to each log stream, optimized for Prometheus and Kubernetes users. The project was inspired by Prometheus, and its official description is: "Like Prometheus, but for logs."
Project address: github.com/grafana/lok…
Compared to other log aggregation systems, Loki has the following features:
- Logs are not full-text indexed. By storing compressed, unstructured logs and indexing only metadata, Loki is simpler and cheaper to operate.
- Logs are indexed and grouped using the same labels as Prometheus log streams, which allows logging to scale and operate more efficiently.
- Especially suitable for storing Kubernetes Pod logs; metadata such as Pod labels is automatically scraped and indexed.
- Natively supported by Grafana.
Background and Motivation
When there is a problem with an application or node running in our container cloud, the troubleshooting process goes roughly as follows:

Our monitoring system is based on Prometheus, in which Metrics and Alerts are the key pieces. A Metric records a current or historical value, and an Alert defines a threshold on a Metric that fires an alarm when crossed. But this information alone is clearly not enough.

As we all know, the basic scheduling unit of Kubernetes is the Pod, and a Pod writes its logs to stdout and stderr.

Here's an example: the memory usage of one of our Pods grows too large and triggers an Alert. The administrator goes to the dashboard to confirm which Pod has the problem, and then, to find out why the Pod's memory grew, also needs to query the Pod's logs. If there is no log system, we have to go to the node or query with commands:
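That manual check looks something like the following (the Pod name and namespace are placeholders, not from the original article):

```bash
# Confirm resource usage and recent events for the suspect Pod
kubectl describe pod my-app-0 -n prod

# Read its recent stdout/stderr output
kubectl logs my-app-0 -n prod --tail=100

# If the container restarted, inspect the previous container's logs
kubectl logs my-app-0 -n prod --previous
```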
If the application suddenly crashes at this point, we will not be able to find the relevant logs at all. Therefore, a log system is needed to collect logs centrally. But with ELK, you have to switch between Kibana and Grafana, which hurts the user experience. So the first goal of Loki is to minimize the switching cost between metrics and logs, helping to reduce response time to abnormal events and improve the user experience.
Problems with ELK
Many existing log collection solutions index logs with full-text search (such as the ELK stack). They offer rich functionality, but tend to be complex to scale, resource-hungry, and hard to operate. Many of the features go unused: most queries only cover a certain time range and a few simple parameters (host, service, etc.), so using these solutions can be overkill.
Therefore, the second goal of Loki is to strike a balance between ease of use and the expressiveness of the query language.
Cost considerations
Full-text search also brings cost problems. Simply put, sharding and replicating the inverted index of a full-text search engine (such as Elasticsearch) is expensive. Other designs have since emerged, such as:
- OKlog
Project address: github.com/oklog/oklog
OKLog adopts an eventually consistent, grid-based distribution strategy. These two design decisions save a lot of cost and keep operations very simple, but they make querying harder. Therefore, Loki's third goal is to provide a more cost-effective solution.
Overall architecture
Loki’s architecture is as follows:
As you can see, Loki’s architecture is very simple and consists of the following three parts:
- Loki is the main server, responsible for storing logs and processing queries.
- Promtail is the agent that collects logs and sends them to Loki.
- Grafana provides the UI.
Loki uses the same labels as Prometheus for its index. That means you can use the same labels to query both log content and monitoring data, which reduces the cost of switching between the two kinds of queries and greatly shrinks the log index.
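To illustrate (with a hypothetical shared `app` label; the metric name is a standard cAdvisor metric, not from this article), the same selector drives both query languages:

```
# PromQL: memory usage of the app's containers
container_memory_working_set_bytes{app="my-app"}

# LogQL: error logs of the same app, selected by the same label
{app="my-app"} |= "error"
```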
Loki implements Promtail using the same service discovery and label relabeling libraries as Prometheus. In Kubernetes, Promtail runs on each node as a DaemonSet, obtains the correct metadata for the logs through the Kubernetes API, and sends them to Loki. Here is the log storage architecture:
Read and write
Writing log data relies on Distributor and Ingester, and the overall process is as follows:
Distributor
Once Promtail collects logs and sends them to Loki, the Distributor is the first component to receive them. Because logs can arrive in very large volumes, they cannot be written to the database as they come in; that would overwhelm it. The data needs to be batched and compressed.
Loki does this by gzip-compressing logs as they come in and building them into blocks of compressed data. The Ingester is a stateful component that builds chunks and flushes them to storage when a chunk reaches a certain size or age. Each log stream corresponds to an Ingester; when logs reach the Distributor, a hash computed from the stream's metadata determines which Ingester they are sent to.

In addition, each stream is replicated n times (3 by default) for redundancy and resilience.
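The replication factor lives in the Ingester's ring configuration. A minimal sketch, assuming Loki 2.x config keys (values are illustrative, not from this article):

```yaml
ingester:
  lifecycler:
    ring:
      kvstore:
        store: inmemory       # single-node demo; use consul/etcd in a real cluster
      replication_factor: 3   # each stream is written to 3 ingesters
```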
Ingester
The Ingester receives the log and starts building a chunk:

Basically, the logs are compressed and appended to the chunk. Once the chunk fills up (reaching a certain amount of data or a certain age), the Ingester flushes it to the database. We use separate databases for chunks and indexes because they store different types of data.

After flushing a chunk, the Ingester creates a new empty chunk and appends new entries to it.
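Both flush triggers can be tuned in the Ingester config. A sketch with illustrative values (option names from Loki's ingester config; defaults may differ):

```yaml
ingester:
  chunk_idle_period: 5m   # flush a chunk that has received no new entries for 5m
  max_chunk_age: 1h       # flush a chunk once it reaches this age, regardless of traffic
```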
Querier
Reading is straightforward: given a time range and a label selector, the Querier looks at the index to determine which chunks match, then greps through them and returns the results. It also fetches the latest data from the Ingesters that has not been flushed yet.

For each query, the Querier shows you all relevant logs. Queries are parallelized, providing a distributed grep, so even large queries remain manageable.
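You can also bypass Grafana and query the Querier's HTTP API directly; a sketch assuming the standard Loki 2.x API path and a `varlogs` job:

```bash
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="varlogs"} |= "error"' \
  --data-urlencode 'limit=10'
```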
Scalability
Loki's index can be stored in Cassandra, Bigtable, or DynamoDB, while chunks can go to all kinds of object storage. The Querier and Distributor are stateless components.

The Ingester, however, is stateful. When nodes are added or removed, the chunks across nodes are redistributed to fit the new hash ring. Cortex, the implementation underlying Loki's storage, has been running in production for years.
Install Loki
Installation methods
Instructions for different methods of installing Loki and Promtail.
- Install using Tanka (recommended)
- Install through Helm
- Install through Docker or Docker Compose
- Install and run locally
- Install from source
General process
In order to run Loki, you must:
- Download and install both Loki and Promtail.
- Download config files for both programs.
- Start Loki.
- Update the Promtail config file to get your logs into Loki (a config sketch follows this list).
- Start Promtail.
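For step 4, a minimal Promtail config sketch, closely following the defaults shipped with the Docker image (paths and labels are illustrative):

```yaml
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml   # where Promtail records how far it has read

clients:
  - url: http://loki:3100/loki/api/v1/push   # Loki's push endpoint

scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs              # becomes the {job="varlogs"} stream label
          __path__: /var/log/*log   # files to tail
```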
The official Loki documentation covers each method in detail; here I use Docker Compose for a simple demonstration.
Install with Docker Compose
version: "3"
networks:
loki:
services:
loki:
image: grafana/loki:latest
ports:
- "3100:3100"
command: -config.file=/etc/loki/local-config.yaml
networks:
- loki
promtail:
image: grafana/promtail:latest
volumes:
- /var/log:/var/log
command: -config.file=/etc/promtail/config.yml
networks:
- loki
grafana:
image: grafana/grafana:latest
ports:
- "3000:3000"
networks:
- loki
Copy the code
Run the following commands in your command line. They work for Windows or Linux systems.
```bash
wget https://raw.githubusercontent.com/grafana/loki/v2.0.0/production/docker-compose.yaml -O docker-compose.yaml
docker-compose -f docker-compose.yaml up -d
```

```
[root@localhost loki]# docker-compose ps
     Name                   Command               State           Ports
---------------------------------------------------------------------------------
loki_grafana_1    /run.sh                         Up      0.0.0.0:3000->3000/tcp
loki_loki_1       /usr/bin/loki -config.file...   Up      0.0.0.0:3100->3100/tcp
loki_promtail_1   /usr/bin/promtail -config....   Up
```
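A couple of optional sanity checks once the containers are up (both endpoints are standard in Loki; not shown in the original article):

```bash
curl -s http://localhost:3100/ready            # Loki readiness probe
curl -s http://localhost:3100/metrics | head   # Loki's own Prometheus metrics
```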
Using Loki
After installation, access Grafana on port 3000 of the node above (the default credentials are admin:admin), then select Add data source:
grafana-loki-dashsource
Select Loki from the data source list and configure the source address of Loki:
grafana-loki-dashsource-config
Set the URL to http://loki:3100 and save the settings.
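If you prefer configuration as code, the same data source can be provisioned from a file, assuming Grafana's standard provisioning format (the file path is illustrative):

```yaml
# e.g. /etc/grafana/provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
```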
After saving, switch to Explore on the left side of Grafana to enter Loki’s page:
grafana-loki
Then click Log labels to display the log labels collected by the current system, and filter the logs using these labels:
grafana-loki-log-labels
For example, selecting /var/log/messages filters down to entries from that file. Note that because of time zone differences you may need to adjust the time range before any data shows up:
grafana-loki-logs
Selectors
For the label part of the query expression, wrap it in curly braces {} and use key-value pair syntax to select labels. Multiple label expressions are separated by commas, for example:

```logql
{app="mysql",name="mysql-backup"}
```
Currently the following label matching operators are supported:

- `=`: exactly equal
- `!=`: not equal
- `=~`: matches the regular expression
- `!~`: does not match the regular expression
For example:

```logql
{name=~"mysql.+"}
{name!~"mysql.+"}
```
The same rules that apply to Prometheus tag selectors also apply to Loki log stream selectors.
For Loki’s original design documentation, check out the documentation here: Loki Design Documentation
Loki Cheat Sheet
See your logs
Start by selecting a log stream from the Log labels selector.
Alternatively, you can write a stream selector into the query field:
{job="default/prometheus"}
Here are some example streams from your logs:
{job="varlogs"}
Combine stream selectors
{app="cassandra",namespace="prod"}
Returns all log lines from streams that have both labels.
Filtering for search terms.
{app="cassandra"} |~ "(duration|latency)s*(=|is|of)s*[d.]+"
{app="cassandra"} |= "exact match"
{app="cassandra"} ! = "do not match"
LogQL supports exact and regular expression filters.
Count over time
count_over_time({job="mysql"}[5m])
This query counts all the log lines within the last five minutes for the MySQL job.
Rate
rate(({job="mysql"} |= "error" ! = "timeout")[10s])
This query gets the per-second rate of all non-timeout errors within the last ten seconds for the MySQL job.
Aggregate, count, and group
sum(count_over_time({job="mysql"}[5m])) by (level)
Get the count of logs during the last five minutes, grouping by level.
LogQL
For log retrieval, Loki uses a PromQL-like query syntax called LogQL.
LogQL: Log Query Language
Loki comes with its own PromQL-inspired language for queries called LogQL. LogQL can be considered a distributed grep
that aggregates log sources. LogQL uses labels and operators for filtering.
There are two types of LogQL queries:
- Log queries return the contents of log lines.
- Metric queries extend log queries and calculate sample values based on the content of logs from a log query.
Inspired by PromQL, Loki has its own query language, LogQL. Officially, it works like a distributed grep over aggregated log sources. Like PromQL, LogQL uses labels and operators to filter; a query is made up of two parts:
- Log Stream selector
- Filter expression
We can combine these two parts to get the functionality we want from Loki. Typically they are used to:

- view log content selected by the log stream selector;
- calculate relevant metrics on a log stream via filter rules.
Log Stream Selector
Like PromQL, the log stream selector determines which log streams to query by matching on the labels attached to incoming logs. Label matching supports the following operators:
- `=`: exact match
- `!=`: does not match
- `=~`: matches the regular expression
- `!~`: does not match the regular expression

```logql
{app="mysql",name=~"mysql-backup.+"}
```
Filter Expression
{instance=~"kafka-[23]",name="kafka"} ! = "kafka.server:type=ReplicaManager"Copy the code
| =
: Log line contains string.! =
: Log line does not contain string.| ~
: Log line matches regular expression.! ~
: Log line does not match regular expression.
Metric Queries
This is very similar to Prometheus's range queries, for example:

```logql
rate({job="mysql"} |= "error" != "timeout" [5m])
```

- `rate`: calculates the number of entries per second.
- `count_over_time`: counts the entries for each log stream within the given range.
- `bytes_rate`: calculates the number of bytes per second for each stream.
- `bytes_over_time`: counts the amount of bytes used by each log stream for a given range.
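For instance, a throughput query using one of these functions on the same hypothetical MySQL job:

```logql
bytes_rate({job="mysql"}[5m])
```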
Aggregation operators
Some aggregation operations are also supported, for example:

```logql
avg(rate(({job="nginx"} |= "GET")[10s])) by (region)
```
- `sum`: calculate sum over labels
- `min`: select minimum over labels
- `max`: select maximum over labels
- `avg`: calculate the average over labels
- `stddev`: calculate the population standard deviation over labels
- `stdvar`: calculate the population standard variance over labels
- `count`: count the number of elements in the vector
- `bottomk`: select the smallest k elements by sample value
- `topk`: select the largest k elements by sample value
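For example (an illustrative query; the `region` and `name` labels are assumptions), the ten busiest streams by log rate in a region:

```logql
topk(10, sum(rate({region="us-east1"}[5m])) by (name))
```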
Many other operations, such as `and` and `or`, are also supported; see the LogQL documentation for details:

grafana.com/docs/loki/l…
LogQL in 5 minutes
References
Loki Documentation
Loki Posts
Loki: Prometheus-inspired, open source logging for cloud natives
Loki log system details
Grafana log aggregation tool Loki
Grafana Loki brief tutorial
Loki Best Practices
Loki gets a major 2.0 update, with significantly enhanced LogQL syntax!