1. What’s in your hammer?
Our team has always worked to extract more value from logs. In addition to real-time log query, SLS has improved the following DevOps features this year:
- Context query
- Real-time tail and intelligent clustering, to make problem investigation more efficient
- A variety of anomaly detection and prediction functions for time series data, for more intelligent inspection and forecasting
- Visualization of data analysis results
- Powerful alarm settings and notifications, with Webhook calls to trigger associated actions
Today, we will focus on how to combine log clustering with anomaly alarms to better detect exceptions and alert on them.
2. Platform experiment
2.1 Experimental Data
The data is a copy of raw Syslog data, with the log clustering service enabled. The specific status is shown in the screenshot below:
Adjusting the value in red box 1 of the screenshot below changes the result in red box 2. The most fine-grained patterns, however, do not change: the subpattern result is stable and unique, and we can find the corresponding original log entries through the subpattern's signature.
2.2 Generate a time series for the subpattern
Suppose we want to monitor this subpattern:
MSG: vm-11193.tc su: pam_unix(*:session): session closed for user root signature_id: __log_signature__: 1814836459146662485
We have retrieved the original logs that match the above pattern, and the histogram shows how many of them fall at each point on the timeline:
In the figure above, we can see that logs of this pattern are not evenly distributed over time, and some time windows contain none at all. If we simply count the logs per time window, the query for the time series is as follows:
__log_signature__: 1814836459146662485 |
select
date_trunc('minute', __time__) as time,
COUNT(*) as num
from log GROUP BY time order by time ASC limit 10000
In the diagram above, we find that the time axis is not continuous: minutes with no matching logs are missing. We therefore need to fill in the gaps in this series.
__log_signature__: 1814836459146662485 |
select
time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time,
avg(num) as num
from (
select
__time__ - __time__ % 60 as time,
COUNT(*) as num
from log GROUP BY time order by time desc )
GROUP by time order by time ASC limit 10000
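To make the gap-filling step concrete, here is a minimal Python sketch of the same idea outside SLS. It is an illustration only, not how `time_series` is implemented: the function name `minute_counts` and its `start` and `end` parameters are hypothetical, and the sample timestamps are made up.

```python
from collections import Counter

def minute_counts(timestamps, start, end):
    """Count events per minute and fill empty minutes with 0.

    timestamps: iterable of Unix timestamps in seconds.
    start, end: range of Unix seconds to cover (hypothetical parameters).
    """
    # Same bucketing as `__time__ - __time__ % 60` in the query.
    counts = Counter(ts - ts % 60 for ts in timestamps)
    first = start - start % 60
    last = end - end % 60
    # Walk every minute in the range so that missing buckets show up as 0,
    # which is the role the padding argument '0' plays in the query.
    return [(minute, counts.get(minute, 0)) for minute in range(first, last + 60, 60)]

# Made-up sample: no events fall into the minutes starting at 120 and 240.
series = minute_counts([0, 5, 61, 200], start=0, end=240)
for minute, num in series:
    print(minute, num)
```

Without the fill step, the minutes starting at 120 and 240 would simply be absent from the result, which is the discontinuity seen in the first query.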
2.3 Anomaly detection on the time series
Use the time series anomaly detection function ts_predicate_arma:
__log_signature__: 1814836459146662485 |
select
ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg')
from (
select
time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time,
avg(num) as num
from (
select
__time__ - __time__ % 60 as time,
COUNT(*) as num
from log GROUP BY time order by time desc )
GROUP by time order by time ASC ) limit 10000
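For readers who want a feel for what a band-based check on such a series looks like, the sketch below uses a much simpler technique than `ts_predicate_arma`: a moving mean with a band of k standard deviations. It is not the ARMA model and does not reproduce the function's output. The name `band_check`, the parameters `window` and `k`, the 0/1 flag standing in for `prob`, and the sample counts are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def band_check(values, window=5, k=3.0):
    """Flag points outside mean +/- k * stddev of the previous `window` points.

    Returns rows shaped like (index, src, pred, upper, lower, flag).
    This is a stand-in technique, not the ARMA model used by SLS.
    """
    rows = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        pred = mean(history)              # naive prediction: recent average
        spread = k * pstdev(history)      # band half-width
        upper, lower = pred + spread, pred - spread
        outside = values[i] > upper or values[i] < lower
        rows.append((i, values[i], pred, upper, lower, 1.0 if outside else 0.0))
    return rows

# Made-up per-minute counts with one obvious spike at index 6.
rows = band_check([2, 2, 3, 2, 2, 2, 40, 2])
for row in rows:
    print(row)
```

The point of the sketch is the shape of the result: each input point comes back with a prediction, an upper and lower bound, and an anomaly indicator, which is what the alarm queries in the next section consume.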
2.4 How to Set Alarms
- Unpack the results of machine learning functions
__log_signature__: 1814836459146662485 |
select
t1[1] as unixtime, t1[2] as src, t1[3] as pred, t1[4] as up, t1[5] as lower, t1[6] as prob
from (
select
ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') as res
from (
select
time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time,
avg(num) as num
from (
select
__time__ - __time__ % 60 as time,
COUNT(*) as num
from log GROUP BY time order by time desc )
GROUP by time order by time ASC )) , unnest(res) as t(t1)
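The unpacking step turns each six-element row of the result into named columns. A minimal Python sketch of the same reshaping follows; the `Point` type, the `unpack` helper, and the sample values are hypothetical, and only the column order (unixtime, src, pred, up, lower, prob) is taken from the query above.

```python
from typing import NamedTuple

class Point(NamedTuple):
    unixtime: float
    src: float    # observed value
    pred: float   # predicted value
    up: float     # upper bound
    lower: float  # lower bound
    prob: float   # anomaly indicator

def unpack(res):
    """Turn a list of 6-element rows into named records.

    Note that the query indexes from 1 (t1[1] .. t1[6]),
    while Python sequences index from 0.
    """
    return [Point(*row) for row in res]

# Made-up rows in the same column order as the query.
res = [
    [1546300800, 3.0, 2.5, 4.0, 1.0, 0.0],
    [1546300860, 9.0, 2.6, 4.1, 1.1, 1.0],
]
points = unpack(res)
print(points[1].src, points[1].up, points[1].prob)
```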
- Take the results for the most recent two minutes as the basis for the alarm
__log_signature__: 1814836459146662485 |
select
unixtime, src, pred, up, lower, prob
from (
select
t1[1] as unixtime, t1[2] as src, t1[3] as pred, t1[4] as up, t1[5] as lower, t1[6] as prob
from (
select
ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') as res
from (
select
time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time,
avg(num) as num
from (
select
__time__ - __time__ % 60 as time, COUNT(*) as num
from log GROUP BY time order by time desc )
GROUP by time order by time ASC )) , unnest(res) as t(t1) )
where is_nan(src) = false order by unixtime desc limit 2
- Alert on rising points and set a fallback policy
__log_signature__: 1814836459146662485 |
select
sum(prob) as sumProb, max(src) as srcMax, max(up) as upMax
from (
select
unixtime, src, pred, up, lower, prob
from (
select
t1[1] as unixtime, t1[2] as src, t1[3] as pred, t1[4] as up, t1[5] as lower, t1[6] as prob
from (
select
ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') as res
from (
select
time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time, avg(num) as num
from (
select
__time__ - __time__ % 60 as time, COUNT(*) as num
from log GROUP BY time order by time desc )
GROUP by time order by time ASC )) , unnest(res) as t(t1) )
where is_nan(src) = false order by unixtime desc limit 2 )
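The final query reduces the two most recent valid points to three numbers: sumProb, srcMax and upMax. The Python sketch below mirrors that reduction and then applies one possible trigger rule. The rule in `should_alarm`, the `fallback_threshold` parameter, and the sample points are assumptions for illustration; the actual trigger condition is whatever is configured in the alarm settings.

```python
import math

def alarm_inputs(points, n=2):
    """Aggregate the latest n points whose src is not NaN.

    points: list of (unixtime, src, pred, up, lower, prob) tuples.
    """
    valid = [p for p in points if not math.isnan(p[1])]  # is_nan(src) = false
    latest = sorted(valid, key=lambda p: p[0], reverse=True)[:n]  # order desc, limit n
    if not latest:
        return None
    return {
        "sumProb": sum(p[5] for p in latest),
        "srcMax": max(p[1] for p in latest),
        "upMax": max(p[3] for p in latest),
    }

def should_alarm(agg, fallback_threshold):
    """Hypothetical trigger rule, not taken from the article."""
    if agg is None:
        return False
    # Rising point: flagged as anomalous and above the upper bound.
    rising = agg["sumProb"] > 0 and agg["srcMax"] > agg["upMax"]
    # Fallback: fire on a fixed ceiling even if the detector stays quiet.
    return rising or agg["srcMax"] > fallback_threshold

# Made-up points; the last one has no observed value and is filtered out.
points = [
    (60, 3.0, 2.5, 4.0, 1.0, 0.0),
    (120, 9.0, 2.6, 4.1, 1.1, 1.0),
    (180, float("nan"), 2.7, 4.2, 1.2, 0.0),
]
agg = alarm_inputs(points)
print(agg, should_alarm(agg, fallback_threshold=100))
```

Keeping a fixed fallback threshold next to the model-based check means an alert can still fire if the detector's bounds drift upward along with the data.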
The specific alarm settings are as follows:
3. A shameless plug
3.1 Advanced logging
This is a demo of Log Service.
For more on learning Log Service, see the Log Service Learning Path.
This article is original content of the Yunqi Community and may not be reproduced without permission.