Until now, Xiaobai has used Grafana to configure log alerts on a Loki data source. Although alerts can be configured directly on a Grafana panel, they cannot be centrally maintained and managed that way. After everything that has been written here about Loki, does it have a rules mechanism like Prometheus for managing alerting policies? The answer is yes!
According to Loki's roadmap, the Ruler will ship in Loki 1.7.0. So today Xiaobai will take you for an early taste and show the right way to do log alerting in Loki.
Loki Ruler
Loki 1.7 will include a component called the Ruler, which was brought over from the Cortex project (remember the Loki cluster architecture?). The Ruler's main job is to continuously evaluate rules and push events that exceed a threshold to Alertmanager or other webhook services. Loki's Ruler has the same architecture as the one in Cortex; the only major architectural difference between the two is the Configs API. 😂 The great thing is that, with a consistent hash ring registered in Consul, the Loki Ruler also supports distributed deployment of multiple instances, with the instances coordinating through sharding so that each one evaluates only its own share of the rules. This is a dynamic process, though: adding or removing any Ruler instance causes the rules to be re-sharded.
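To make the re-sharding behaviour concrete, here is a minimal sketch of a consistent hash ring. It is not the ring implementation used by Loki or Cortex; the `Ring` class, the `ruler-N` instance names, and the rule group names are all hypothetical and exist only for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 32-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    """Toy consistent hash ring; each instance owns several virtual tokens."""

    def __init__(self, instances, tokens_per_instance=64):
        self._tokens = sorted(
            (_hash(f"{name}-{i}"), name)
            for name in instances
            for i in range(tokens_per_instance)
        )
        self._positions = [pos for pos, _ in self._tokens]

    def owner(self, rule_group: str) -> str:
        """Return the instance holding the first token at or after the group's hash."""
        idx = bisect.bisect_left(self._positions, _hash(rule_group))
        # Wrap around to the first token when the hash is past the last one.
        return self._tokens[idx % len(self._tokens)][1]


def assign(instances, rule_groups):
    """Map every rule group to the ruler instance that should evaluate it."""
    ring = Ring(instances)
    return {group: ring.owner(group) for group in rule_groups}


# Hypothetical rule groups and ruler instances.
groups = [f"tenant-a/rules-{n}.yaml" for n in range(20)]
before = assign(["ruler-1", "ruler-2"], groups)
after = assign(["ruler-1", "ruler-2", "ruler-3"], groups)

moved = [g for g in groups if before[g] != after[g]]
print(f"{len(moved)} of {len(groups)} rule groups moved after adding ruler-3")
```

In this sketch, adding an instance only moves rule groups onto the new instance; groups never shuffle between the existing ones. That property is what keeps re-sharding cheap when instances come and go.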
Enabling Loki's Ruler component is currently fairly easy: add the following configuration and pass -target=ruler in Loki's startup arguments.
    ruler:
      external_url: <external_url>
      # Alertmanager address
      alertmanager_url: <alertmanager_endpoint>
      enable_alertmanager_v2: true
      # Enable the rules API
      enable_api: true
      # Shard rules across ruler instances
      enable_sharding: true
      # Consistent hash ring configuration for the ruler,
      # used to support multiple instances and sharding
      ring:
        kvstore:
          consul:
            host: <consul-endpoint>:8500
          store: consul
      # Temporary storage path for rule files
      rule_path: /tmp/rules
      # Rule storage
      # Supports local storage (local) and object stores (azure, gcs, s3, swift)
      storage:
        type: local
        local:
          directory: /loki/rules
      # Rule flush period
      flush_period: 1m
Readers who want to try the Ruler quickly can deploy a demo with Loki-cluster-deploy, which is launched with docker-compose.
Alert configuration
The format and structure of the Loki Ruler's rules are fully compatible with Prometheus; the only difference is the query statement. In Loki we use LogQL to query log metrics. A typical rules configuration looks like this:
    groups:
      # Group name
      - name: <string>
        rules:
          # Alert name
          - alert: <string>
            # LogQL query expression
            expr: <string>
            # How long the alert stays pending before it fires
            [ for: <duration> | default = 0s ]
            # Custom labels attached to the alert event
            labels:
              [ <labelname>: <tmpl_string> ]
            # Annotations attached to the alert event
            annotations:
              [ <labelname>: <tmpl_string> ]
For example, to trigger an alert when the error rate in a service's logs exceeds 5%, the rule can be configured as follows:
    groups:
      - name: should_fire
        rules:
          - alert: HighPercentageError
            expr: |
              sum(rate({app="foo", env="production"} |= "error" [5m])) by (job)
                /
              sum(rate({app="foo", env="production"}[5m])) by (job)
                > 0.05
            for: 10m
            labels:
              severity: page
            annotations:
              summary: High request latency
When an alert fires, we receive the event notification in Alertmanager.
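The arithmetic behind the HighPercentageError rule can be sketched in a few lines of Python. This is only an illustration of the ratio and the `for:` clause, not how the Ruler evaluates LogQL. The function names and sample log lines are hypothetical, and the `for: 10m` duration is simplified to "the condition held in the last two evaluations".

```python
def error_ratio(lines, needle="error"):
    """Fraction of log lines in a window that contain `needle` (0.0 if empty)."""
    if not lines:
        return 0.0
    matching = sum(1 for line in lines if needle in line)
    return matching / len(lines)


def should_fire(evaluations, threshold=0.05, required=2):
    """True if the ratio exceeded `threshold` in each of the last `required` windows.

    This stands in for the rule's `for:` clause: a single bad window only puts
    the alert into a pending state, and it fires once the condition persists.
    """
    if len(evaluations) < required:
        return False
    return all(error_ratio(window) > threshold for window in evaluations[-required:])


# Hypothetical 5-minute windows of log lines.
healthy = ["GET / 200"] * 99 + ["error: timeout"]        # 1% errors
degraded = ["GET / 200"] * 90 + ["error: timeout"] * 10  # 10% errors

print(should_fire([healthy, degraded]))   # False: condition held only once
print(should_fire([degraded, degraded]))  # True: condition held twice in a row
```

Both rates in the rule are taken over the same 5m window, so dividing them gives the same value as dividing the line counts, which is what the sketch relies on.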
Ruler use cases
- Services not yet monitored with metrics
For services whose applications do not expose their own runtime metrics, building metric-style alerts by querying logs is a relatively easy approach. The application error-rate alert above is one example.
- Black-box monitoring
For applications that are completely outside our control (open-source services or third-party closed-source products) and that do not expose monitoring metrics, deriving metrics from their logs is a quick approach.
One expert has built a monitoring and alerting dashboard from nginx logs and LogQL, and it is really cool.
- Responding to application events
For certain special application events, we can also use the Loki Ruler to send notifications, for example by checking the logs for leaked Basic Auth credentials:
    - name: credentials_leak
      rules:
        - alert: http-credentials-leaked
          annotations:
            message: "{{ $labels.job }} is leaking http basic auth credentials."
          expr: 'sum by (cluster, job, pod) (count_over_time({namespace="prod"} |~ "http(s?)://(\\w+):(\\w+)@" [5m]) > 0)'
          for: 10m
          labels:
            severity: critical
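Before deploying a rule like this, it can help to check the regular expression against a few sample lines. The sketch below uses Python's `re` module as a stand-in for the `|~` line filter, which is an assumption: LogQL uses Go's RE2 syntax, and the two engines differ in some details (for example, `\w` is Unicode-aware in Python). The sample log lines are hypothetical.

```python
import re

# Same pattern as in the rule, after YAML and LogQL string unescaping.
LEAK = re.compile(r"http(s?)://(\w+):(\w+)@")


def leaking_lines(lines):
    """Return the lines that contain a URL with embedded user:password credentials."""
    # re.search mirrors the unanchored matching of the |~ line filter.
    return [line for line in lines if LEAK.search(line)]


# Hypothetical log lines.
sample = [
    'level=info msg="fetching https://alice:s3cret@example.com/api"',
    'level=info msg="fetching https://example.com/api"',
    'level=warn msg="proxy set to http://bob:hunter2@10.0.0.1:3128"',
]

for line in leaking_lines(sample):
    print(line)
```

Only the first and third lines are printed; the second URL carries no credentials, so it does not match.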
- High-cardinality alerts
Those who have read Xiaobai's earlier article on Loki best practices know that high cardinality in Loki severely slows down queries. This kind of query is implemented with LogQL v2 syntax.
About Cloud Native Xiaobai
Cloud Native Xiaobai aims to present cloud native applications from a practical point of view: looking at and using cloud native from a beginner's perspective, with each article starting from one practical problem to lead you into the cloud native world.
Follow the WeChat public account "Cloud Native Xiaobai" and reply "enter the group" to join the Loki learning group.