PromQL (Prometheus Query Language) is the query language of the Prometheus TSDB. Together with Grafana, it is central to building dashboards and configuring alerting rules.
This article assumes you are familiar with the four Prometheus metric types:
- Counter
- Gauge
- Histogram
- Summary
So that readers can follow along, most of the sample queries in this article target:
- Prometheus
- node_exporter
Expression data types
A PromQL expression evaluates to one of four data types:
Instant vector
An instant vector is a set of time series in which each series has a single sample: the most recent one at the query's evaluation time. It is a set of points, not lines.
Range vector
A range vector is a set of time series in which each series contains all of its samples within a time range, so each series may contain multiple points.
For an illustrated explanation, see Understanding Prometheus Range Vectors [1].
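To make the difference concrete, here is a small Python model of how the two kinds of selector pick samples from one stored series. It assumes the default 5-minute lookback window for instant selectors and a left-open range window (t - range, t], which is how Prometheus 3.x behaves (2.x also included a sample lying exactly on the left boundary); staleness markers are ignored, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)

LOOKBACK_DELTA = 300.0  # assumed default lookback window of 5 minutes


def select_instant(samples: List[Sample], t: float) -> Optional[Sample]:
    """Latest sample at or before t, provided it lies within the lookback window."""
    candidates = [s for s in samples if t - LOOKBACK_DELTA < s[0] <= t]
    return max(candidates, key=lambda s: s[0]) if candidates else None


def select_range(samples: List[Sample], t: float, range_s: float) -> List[Sample]:
    """Every sample in the window (t - range_s, t]."""
    return [s for s in samples if t - range_s < s[0] <= t]


# Made-up series scraped every 60 s.
series = [(0.0, 1.0), (60.0, 2.0), (120.0, 3.0), (180.0, 4.0), (240.0, 5.0)]

print(select_instant(series, 250.0))       # a single point per series
print(select_range(series, 250.0, 120.0))  # possibly many points per series
```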
Scalar
A scalar is a simple floating-point value. An instant vector that contains exactly one series can be converted to a scalar with scalar().
String
A simple string value; currently unused.
Selectors
Label selector
Example: query the number of Prometheus HTTP requests whose status code is 400.
prometheus_http_requests_total{code="400"}
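Label matching operators:
- = : the label value equals the string
- != : the label value does not equal the string
- =~ : the label value matches the regular expression
- !~ : the label value does not match the regular expression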
Example: query the number of Prometheus HTTP requests whose status code is 4xx or 5xx and whose handler is /api/v1/query.
prometheus_http_requests_total{code=~"4.*|5.*",handler="/api/v1/query"}
The internal label __name__ matches the metric name, so the following expression is equivalent to the previous one:
{code=~"4.*|5.*",handler="/api/v1/query",__name__="prometheus_http_requests_total"}
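The matching rules can be modelled in a few lines of Python. Two details are worth knowing: PromQL regular expressions are fully anchored (so =~"4.*" behaves like ^(?:4.*)$), and a label that is absent from a series is treated as having the empty value. The helper names and sample series below are hypothetical, and Python's re syntax is close to, but not identical to, the RE2 syntax Prometheus uses.

```python
import re
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Matcher = Tuple[str, str, str]  # (label name, operator, value)


def match_one(labels: Labels, name: str, op: str, value: str) -> bool:
    actual = labels.get(name, "")  # a missing label behaves like an empty value
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None  # anchored at both ends
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown operator {op!r}")


def select(series: List[Labels], matchers: List[Matcher]) -> List[Labels]:
    """Keep the series that satisfy every matcher (matchers are ANDed)."""
    return [s for s in series if all(match_one(s, *m) for m in matchers)]


series = [
    {"__name__": "prometheus_http_requests_total", "code": "200", "handler": "/api/v1/query"},
    {"__name__": "prometheus_http_requests_total", "code": "400", "handler": "/api/v1/query"},
    {"__name__": "prometheus_http_requests_total", "code": "503", "handler": "/metrics"},
]
# Same as {code=~"4.*|5.*", handler="/api/v1/query", __name__="prometheus_http_requests_total"}
result = select(series, [("code", "=~", "4.*|5.*"),
                         ("handler", "=", "/api/v1/query"),
                         ("__name__", "=", "prometheus_http_requests_total")])
print(result)  # only the code="400" series
```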
Range selector
Example: query the samples of requests to the Prometheus health-check endpoint /-/healthy with status code 200 collected over the past 5 minutes.
prometheus_http_requests_total{code="200",handler="/-/healthy"}[5m]
Supported units (case-sensitive): ms, s, m, h, d, w, y
Units can be combined, from largest to smallest; for example, [1h5m] means one hour and five minutes.
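A quick way to sanity-check a duration is to convert it to seconds. The sketch below is a simplified, hypothetical parser that follows the unit list above, taking a day as 24h, a week as 7d, and a year as 365d, and accepting each unit at most once in largest-to-smallest order, as PromQL does.

```python
import re

# Milliseconds per unit, in the order PromQL expects them (largest unit first).
UNITS_MS = [("y", 365 * 86_400_000), ("w", 7 * 86_400_000), ("d", 86_400_000),
            ("h", 3_600_000), ("m", 60_000), ("s", 1_000), ("ms", 1)]

# Each unit may appear at most once, e.g. "1h5m" or "2d12h".
PATTERN = re.compile("".join(rf"(?:(\d+){unit})?" for unit, _ in UNITS_MS))


def parse_duration(text: str) -> float:
    """Convert a PromQL-style duration such as "1h5m" to seconds."""
    match = PATTERN.fullmatch(text)
    if not text or match is None:
        raise ValueError(f"invalid duration: {text!r}")
    total_ms = sum(int(g) * ms for g, (_, ms) in zip(match.groups(), UNITS_MS) if g)
    return total_ms / 1000


print(parse_duration("1h5m"))    # 3900.0
print(parse_duration("1500ms"))  # 1.5
```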
Time offset
The offset modifier
Use offset to shift a query back in time. For example, to query the values as they were 5 minutes ago:
prometheus_http_requests_total{code="200"} offset 5m
offset also works with range vectors:
prometheus_http_requests_total{code="200"}[3m] offset 5m
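The @ modifier
The @ modifier evaluates a selector at a fixed Unix timestamp instead of the query's evaluation time. In older Prometheus versions it must be enabled with the startup flag --enable-feature=promql-at-modifier; since Prometheus v2.33 it is enabled by default.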
prometheus_http_requests_total{code="200"} @ 1646089826
Operators
PromQL operators behave much like those in common programming languages.
Arithmetic operators
PromQL supports the following arithmetic operators:
- + (addition)
- - (subtraction)
- * (multiplication)
- / (division)
- % (modulo)
- ^ (power)
Arithmetic between two scalars:
10/3
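Arithmetic between an instant vector and a scalar applies the operation to every sample in the vector. Because the results no longer mean what the original metric name describes, Prometheus drops the metric name from the result. For example, converting response sizes from bytes to KiB: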
prometheus_http_response_size_bytes_sum / 1024
Arithmetic between two instant vectors, for example to compute a node's memory usage as a percentage:
(
1 -
node_memory_MemAvailable_bytes{job="node",instance="localhost:9100"}
/ node_memory_MemTotal_bytes{job="node",instance="localhost:9100"}
)
* 100
When the label sets of two instant vectors do not match exactly, use ignoring to leave the extra labels out of the matching (on does the opposite: it matches only on the labels listed).
Input example:
method_code:http_errors:rate5m{method="get", code="500"} 24
method_code:http_errors:rate5m{method="post", code="500"} 6
method:http_requests:rate5m{method="get"} 600
method:http_requests:rate5m{method="post"} 120
Query example:
method_code:http_errors:rate5m{code="500"} / ignoring(code) method:http_requests:rate5m
Example result:
{method="get"}  0.04  // 24 / 600
{method="post"} 0.05  // 6 / 120
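Conceptually, one-to-one matching builds a signature from each series' labels (minus the metric name and anything listed in ignoring) and pairs up series with equal signatures. A minimal Python sketch of that idea, run on the input above, might look like this; representing series as (labels, value) pairs is an assumption of the sketch, and it does not reproduce every check Prometheus performs.

```python
from typing import Dict, FrozenSet, List, Tuple

Labels = Dict[str, str]
Series = Tuple[Labels, float]
Signature = FrozenSet[Tuple[str, str]]


def signature(labels: Labels, ignoring: FrozenSet[str]) -> Signature:
    """Labels used for matching: all labels except __name__ and the ignored ones."""
    return frozenset((k, v) for k, v in labels.items()
                     if k != "__name__" and k not in ignoring)


def divide_one_to_one(lhs: List[Series], rhs: List[Series],
                      ignoring: FrozenSet[str]) -> Dict[Signature, float]:
    right: Dict[Signature, float] = {}
    for labels, value in rhs:
        sig = signature(labels, ignoring)
        if sig in right:
            raise ValueError("several matches on one side; use group_left/group_right")
        right[sig] = value
    result: Dict[Signature, float] = {}
    for labels, value in lhs:
        sig = signature(labels, ignoring)
        if sig not in right:
            continue  # series without a partner are dropped
        if sig in result:
            raise ValueError("several matches on one side; use group_left/group_right")
        result[sig] = value / right[sig]  # result labels are the matching labels
    return result


errors = [({"__name__": "method_code:http_errors:rate5m", "method": "get", "code": "500"}, 24.0),
          ({"__name__": "method_code:http_errors:rate5m", "method": "post", "code": "500"}, 6.0)]
requests = [({"__name__": "method:http_requests:rate5m", "method": "get"}, 600.0),
            ({"__name__": "method:http_requests:rate5m", "method": "post"}, 120.0)]

for sig, ratio in divide_one_to_one(errors, requests, frozenset({"code"})).items():
    print(dict(sig), ratio)
```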
When one side can have several series per match (many-to-one or one-to-many matching), use group_left or group_right to indicate which side has the higher cardinality.
Input example:
method_code:http_errors:rate5m{method="get", code="500"} 24
method_code:http_errors:rate5m{method="get", code="404"} 30
method_code:http_errors:rate5m{method="put", code="501"} 3
method_code:http_errors:rate5m{method="post", code="500"} 6
method_code:http_errors:rate5m{method="post", code="404"} 21
method:http_requests:rate5m{method="get"} 600
method:http_requests:rate5m{method="del"} 34
method:http_requests:rate5m{method="post"} 120
Query example:
group_left indicates that the left-hand side has the higher cardinality:
method_code:http_errors:rate5m / ignoring(code) group_left method:http_requests:rate5m
Example result:
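{method="get", code="500"}  0.04   // 24 / 600
{method="get", code="404"}  0.05   // 30 / 600
{method="post", code="500"} 0.05   // 6 / 120
{method="post", code="404"} 0.175  // 21 / 120
The method="put" and method="del" series have no partner on the other side, so they do not appear in the result.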
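With group_left, each left-hand series is matched against the single right-hand series that shares its matching labels, and the result keeps the left-hand labels. The following Python sketch (same hypothetical (labels, value) representation; it ignores the optional label list that group_left(...) can take) reproduces the result above from the input data.

```python
from typing import Dict, FrozenSet, List, Tuple

Labels = Dict[str, str]
Series = Tuple[Labels, float]
Key = FrozenSet[Tuple[str, str]]


def match_key(labels: Labels, ignoring: FrozenSet[str]) -> Key:
    return frozenset((k, v) for k, v in labels.items()
                     if k != "__name__" and k not in ignoring)


def divide_group_left(lhs: List[Series], rhs: List[Series],
                      ignoring: FrozenSet[str]) -> List[Series]:
    """Many-to-one division: several left-hand series may share one right-hand series."""
    one_side: Dict[Key, float] = {}
    for labels, value in rhs:
        key = match_key(labels, ignoring)
        if key in one_side:
            raise ValueError("the 'one' side must be unique per match key")
        one_side[key] = value
    result: List[Series] = []
    for labels, value in lhs:
        key = match_key(labels, ignoring)
        if key in one_side:  # unmatched series (put, del) are dropped
            out_labels = {k: v for k, v in labels.items() if k != "__name__"}
            result.append((out_labels, value / one_side[key]))
    return result


errors = [({"method": "get", "code": "500"}, 24.0), ({"method": "get", "code": "404"}, 30.0),
          ({"method": "put", "code": "501"}, 3.0), ({"method": "post", "code": "500"}, 6.0),
          ({"method": "post", "code": "404"}, 21.0)]
requests = [({"method": "get"}, 600.0), ({"method": "del"}, 34.0), ({"method": "post"}, 120.0)]

for labels, ratio in divide_group_left(errors, requests, frozenset({"code"})):
    print(labels, round(ratio, 3))
```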
Comparison operators
PromQL supports the following comparison operators:
- == (equal)
- != (not equal)
- > (greater than)
- < (less than)
- >= (greater than or equal to)
- <= (less than or equal to)
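Comparisons between two scalars must be followed by the bool modifier and return 0 (false) or 1 (true). When a vector is involved, bool makes the comparison return 0 or 1 for every series; without bool, the comparison acts as a filter and drops the series for which it is false.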
10 < bool 5
Comparing an instant vector with a scalar, to check whether node targets are up:
up{job="node"} == bool 1
Comparing two instant vectors, to check whether the notification queue is below its capacity:
prometheus_notifications_queue_length < bool prometheus_notifications_queue_capacity
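The difference between a filtering comparison and a bool comparison is easiest to see side by side. This hypothetical Python helper models a vector-to-scalar comparison both ways, using made-up up samples for two node_exporter instances.

```python
import operator
from typing import Dict, List, Tuple

Series = Tuple[Dict[str, str], float]

OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
       "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def compare(vec: List[Series], op: str, scalar: float, use_bool: bool = False) -> List[Series]:
    test = OPS[op]
    if use_bool:
        # bool: keep every series, drop the metric name, and return 1.0 or 0.0.
        return [({k: v for k, v in labels.items() if k != "__name__"},
                 1.0 if test(value, scalar) else 0.0)
                for labels, value in vec]
    # No bool: act as a filter; matching series keep their original labels and values.
    return [(labels, value) for labels, value in vec if test(value, scalar)]


up = [({"__name__": "up", "job": "node", "instance": "host-a:9100"}, 1.0),
      ({"__name__": "up", "job": "node", "instance": "host-b:9100"}, 0.0)]

print(compare(up, "==", 1))                 # only host-a remains
print(compare(up, "==", 1, use_bool=True))  # both hosts, values 1.0 and 0.0
```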
Logical operators
PromQL supports the following logical (set) operators:
- and (intersection)
- or (union)
- unless (complement)
Logical operators are only defined between instant vectors.
Given four scrape targets, the following logical operations achieve effects similar to label selection:
up{instance!="192.168.1.123:9091"} and up{job!="alertmanager"}
up{instance="192.168.1.123:9091"} or up{job="alertmanager"}
up unless up{job="alertmanager"}
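The three operators behave like set operations on label sets. In the Python sketch below, the four targets are hypothetical stand-ins, and matching uses all labels except the metric name, which is what happens when no on or ignoring clause is given.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Labels = Dict[str, str]
Series = Tuple[Labels, float]


def key(labels: Labels) -> FrozenSet[Tuple[str, str]]:
    # Without on/ignoring, set operators compare all labels except the metric name.
    return frozenset((k, v) for k, v in labels.items() if k != "__name__")


def op_and(lhs: List[Series], rhs: List[Series]) -> List[Series]:
    right = {key(labels) for labels, _ in rhs}
    return [s for s in lhs if key(s[0]) in right]


def op_or(lhs: List[Series], rhs: List[Series]) -> List[Series]:
    left = {key(labels) for labels, _ in lhs}
    return lhs + [s for s in rhs if key(s[0]) not in left]


def op_unless(lhs: List[Series], rhs: List[Series]) -> List[Series]:
    right = {key(labels) for labels, _ in rhs}
    return [s for s in lhs if key(s[0]) not in right]


# Four hypothetical scrape targets.
up = [({"__name__": "up", "instance": "192.168.1.123:9091", "job": "pushgateway"}, 1.0),
      ({"__name__": "up", "instance": "localhost:9090", "job": "prometheus"}, 1.0),
      ({"__name__": "up", "instance": "localhost:9093", "job": "alertmanager"}, 1.0),
      ({"__name__": "up", "instance": "localhost:9100", "job": "node"}, 1.0)]


def where(pred: Callable[[Labels], bool]) -> List[Series]:
    """Stand-in for a label selector on up."""
    return [s for s in up if pred(s[0])]


r_and = op_and(where(lambda lb: lb["instance"] != "192.168.1.123:9091"),
               where(lambda lb: lb["job"] != "alertmanager"))
r_or = op_or(where(lambda lb: lb["instance"] == "192.168.1.123:9091"),
             where(lambda lb: lb["job"] == "alertmanager"))
r_unless = op_unless(up, where(lambda lb: lb["job"] == "alertmanager"))

for name, vec in [("and", r_and), ("or", r_or), ("unless", r_unless)]:
    print(name, [labels["job"] for labels, _ in vec])
```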
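Binary operator precedence in Prometheus, from highest to lowest:
- ^
- *, /, %, atan2
- +, -
- ==, !=, <=, <, >=, >
- and, unless
- or
Operators on the same precedence level are left-associative, except ^, which is right-associative. For example, 2 * 3 % 2 is evaluated as (2 * 3) % 2, while 2 ^ 3 ^ 2 is evaluated as 2 ^ (3 ^ 2).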
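To see the precedence and associativity rules in action, here is a small precedence-climbing evaluator for the arithmetic subset (^, *, /, %, +, -) over non-negative number literals and parentheses. It is a hypothetical teaching sketch, not Prometheus's parser; % uses math.fmod so the sign of the result follows the dividend, as with Go's math.Mod.

```python
import math
import re
from typing import List

# Binding power of each binary operator (higher binds tighter).
PRECEDENCE = {"^": 3, "*": 2, "/": 2, "%": 2, "+": 1, "-": 1}
RIGHT_ASSOC = {"^"}

APPLY = {
    "^": lambda a, b: a ** b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "%": math.fmod,
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
}


def tokenize(expr: str) -> List[str]:
    return re.findall(r"\d+(?:\.\d+)?|[-+*/%^()]", expr)


def evaluate(expr: str) -> float:
    tokens = tokenize(expr)
    pos = 0

    def primary() -> float:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = climb(0)
            pos += 1  # skip the closing ")"
            return value
        return float(tok)

    def climb(min_prec: int) -> float:
        nonlocal pos
        lhs = primary()
        while pos < len(tokens) and tokens[pos] in PRECEDENCE and PRECEDENCE[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            # Left-associative operators require strictly higher precedence on the right.
            next_min = PRECEDENCE[op] if op in RIGHT_ASSOC else PRECEDENCE[op] + 1
            lhs = APPLY[op](lhs, climb(next_min))
        return lhs

    return climb(0)


print(evaluate("2 * 3 % 2"))  # (2 * 3) % 2 = 0.0
print(evaluate("2 ^ 3 ^ 2"))  # 2 ^ (3 ^ 2) = 512.0
```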
Aggregation operators
Prometheus supports the following built-in aggregation operators, which aggregate the elements of a single instant vector into a new vector with fewer elements:
- sum (sum over dimensions)
- min (minimum)
- max (maximum)
- avg (average)
- group (all values in the result are 1)
- stddev (population standard deviation)
- stdvar (population standard variance)
- count (number of elements in the vector)
- count_values (number of elements with the same value)
- bottomk (smallest k elements by sample value)
- topk (largest k elements by sample value)
- quantile (φ-quantile, 0 ≤ φ ≤ 1)
Aggregation operators accept a by or without clause to control which labels appear in the result: by keeps only the listed labels, while without removes them and keeps the rest.
sum, min, max, avg:
These compute the total, minimum, maximum, and average across series. For example, the total number of HTTP requests:
sum(prometheus_http_requests_total)
Aggregate per status code with a by clause:
sum by (code) (prometheus_http_requests_total)
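The grouping logic behind by and without can be sketched in Python as follows. This is a hypothetical model of sum only, run on made-up prometheus_http_requests_total samples; other aggregations differ mainly in how each group's values are combined.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

Labels = Dict[str, str]
Series = Tuple[Labels, float]
Group = FrozenSet[Tuple[str, str]]


def sum_agg(vec: List[Series], by: Optional[Iterable[str]] = None,
            without: Optional[Iterable[str]] = None) -> Dict[Group, float]:
    """Model of sum: group series by their kept labels and add up the values."""
    if by is not None and without is not None:
        raise ValueError("use either by or without, not both")
    by_set = set(by) if by is not None else None
    drop_set = set(without or ()) | {"__name__"}  # the metric name is always dropped
    if by_set is None and without is None:
        by_set = set()  # no clause: everything collapses into one group
    groups: Dict[Group, float] = defaultdict(float)
    for labels, value in vec:
        if by_set is not None:
            kept = {k: v for k, v in labels.items() if k in by_set}
        else:
            kept = {k: v for k, v in labels.items() if k not in drop_set}
        groups[frozenset(kept.items())] += value
    return dict(groups)


name = "prometheus_http_requests_total"
reqs = [({"__name__": name, "code": "200", "handler": "/metrics"}, 120.0),
        ({"__name__": name, "code": "200", "handler": "/api/v1/query"}, 30.0),
        ({"__name__": name, "code": "400", "handler": "/api/v1/query"}, 5.0)]

print(sum_agg(reqs))                        # one group without labels
print(sum_agg(reqs, by=["code"]))           # one group per status code
print(sum_agg(reqs, without=["handler"]))   # same grouping here
```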
group:
Every element of the result has the value 1, similar to uniq; it is useful for listing the distinct label combinations.
stddev, stdvar:
These measure how dispersed a set of values is, that is, how far the values deviate from their arithmetic mean. The standard deviation is the square root of the variance; the smaller it is, the closer the values lie to the mean. For example, the standard deviation of the transmit rates across network interfaces shows how unevenly network traffic is spread:
stddev(rate(node_network_transmit_bytes_total[5m]))
rate() computes the per-second average rate of increase of a counter over the specified time range.
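Note that stddev and stdvar compute population statistics, dividing by the number of series N rather than N - 1. A minimal Python equivalent, applied to made-up per-interface rates at a single evaluation time, makes the formula explicit:

```python
import math
from typing import List


def stdvar(values: List[float]) -> float:
    """Population variance: mean squared deviation from the mean (divides by N)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def stddev(values: List[float]) -> float:
    """Population standard deviation: square root of the population variance."""
    return math.sqrt(stdvar(values))


# Hypothetical per-interface transmit rates in bytes/s.
rates = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(stdvar(rates), stddev(rates))  # 4.0 2.0
```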
count, count_values:
count returns the number of series in the vector:
count(prometheus_http_requests_total)
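count_values counts how many series share each distinct sample value, writing the value into the label named by the first argument (here value):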
count_values("value",prometheus_http_requests_total)
bottomk, topk:
Return the five series with the smallest sample values:
bottomk(5,prometheus_http_requests_total)
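The semantics of count_values, topk, and bottomk map neatly onto Python's collections.Counter and heapq. The sketch below is a hypothetical model on made-up samples; tie ordering and the exact string formatting of values may differ from Prometheus.

```python
import heapq
from collections import Counter
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Series = Tuple[Labels, float]


def format_value(v: float) -> str:
    # Approximates how Prometheus renders a sample value as a label string.
    return str(int(v)) if v.is_integer() else repr(v)


def count_values(label: str, vec: List[Series]) -> List[Series]:
    """One output series per distinct sample value; the value is stored in `label`."""
    counts = Counter(value for _, value in vec)
    return [({label: format_value(v)}, float(n)) for v, n in counts.items()]


def topk(k: int, vec: List[Series]) -> List[Series]:
    """The k series with the largest values; their labels are kept unchanged."""
    return heapq.nlargest(k, vec, key=lambda s: s[1])


def bottomk(k: int, vec: List[Series]) -> List[Series]:
    """The k series with the smallest values; their labels are kept unchanged."""
    return heapq.nsmallest(k, vec, key=lambda s: s[1])


vec = [({"handler": "/a"}, 3.0), ({"handler": "/b"}, 3.0),
       ({"handler": "/c"}, 7.0), ({"handler": "/d"}, 1.0)]
print(count_values("value", vec))
print(topk(2, vec))
print(bottomk(2, vec))
```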
quantile:
Calculates the φ-quantile across series. For example, to see how memory usage is distributed across all nodes in a Kubernetes cluster:
quantile(0.8,
  (
    1 -
    node_memory_MemAvailable_bytes{job="kubernetes-service-endpoints"}
    / node_memory_MemTotal_bytes{job="kubernetes-service-endpoints"}
  ) * 100
)
In the author's cluster the result was 68, meaning that 80% of the nodes used less than 68% of their memory.
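For reference, Prometheus computes quantile by sorting the values and interpolating linearly between the two nearest ranks, at rank phi * (N - 1). A Python sketch of that calculation, applied to hypothetical per-node memory usage percentages:

```python
import math
from typing import List


def quantile(phi: float, values: List[float]) -> float:
    """phi-quantile with linear interpolation between the two closest ranks."""
    if not values:
        return math.nan
    if phi < 0:
        return -math.inf
    if phi > 1:
        return math.inf
    ordered = sorted(values)
    rank = phi * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(len(ordered) - 1, lower + 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


# Hypothetical memory usage (%) of five nodes.
usage = [35.0, 52.0, 61.0, 68.0, 74.0]
print(quantile(0.8, usage))
```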
Functions
Rounding
ceil()
ceil(v instant-vector) rounds the sample values of all elements in v up to the nearest integer.
ceil(node_load1)  # e.g. 1.2 -> 2
floor()
floor(v instant-vector) is the opposite of ceil(): it rounds the sample values of all elements in v down to the nearest integer.
round()
round(v instant-vector, to_nearest=1 scalar) rounds the sample values of all elements in v to the nearest integer multiple of to_nearest. to_nearest is optional and defaults to 1, which rounds to the nearest integer; it may also be a fraction. Ties are rounded up.
Round to the nearest integer:
round(prometheus_engine_query_duration_seconds_sum)
Round to the nearest multiple of 5:
round(prometheus_engine_query_duration_seconds_sum,5)
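The rounding rule can be written as floor(v / to_nearest + 0.5) * to_nearest, which rounds ties up. A minimal Python sketch follows (the function name is hypothetical); note that Python's built-in round() uses banker's rounding, so round(2.5) == 2 and it is not a drop-in substitute.

```python
import math


def promql_round(value: float, to_nearest: float = 1.0) -> float:
    """Round to the nearest multiple of to_nearest; ties go up (towards +Inf)."""
    return math.floor(value / to_nearest + 0.5) * to_nearest


print(promql_round(2.5))      # ties round up
print(promql_round(-2.5))     # "up" means towards +Inf, so this gives -2
print(promql_round(13, 5))    # nearest multiple of 5
```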
Clamping values
clamp()
clamp(v instant-vector, min scalar, max scalar) clamps the sample values of all elements in v to the range [min, max]: values below min become min and values above max become max. If min > max, it returns an empty vector.
Clamp the samples to the range 10 to 20:
clamp(prometheus_http_requests_total, 10, 20)
clamp_max()
clamp_max(v instant-vector, max scalar) works like clamp(), but only applies the upper limit max.
clamp_min()
clamp_min(v instant-vector, min scalar) works like clamp(), but only applies the lower limit min.
Counting value changes
changes()
changes(v range-vector) returns, for each time series in v, the number of times its value changed within the time range.
changes(node_load1[1m])
Counting counter resets
resets()
resets(v range-vector) returns, for each time series in v, the number of counter resets within the time range. It should only be used with counters: any decrease in value between two consecutive samples is treated as a counter reset.
Number of times the context-switch counter was reset in the last 5 minutes:
resets(node_context_switches_total[5m])
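Both functions walk over consecutive pairs of samples in the range. A hypothetical Python sketch on made-up counter values (it leaves out the special handling of NaN values):

```python
from typing import List


def changes(values: List[float]) -> int:
    """Number of times a value differs from the previous sample."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur != prev)


def resets(values: List[float]) -> int:
    """Number of decreases between consecutive samples (treated as counter resets)."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)


counter = [10.0, 12.0, 12.0, 3.0, 5.0, 1.0]  # hypothetical samples within [5m]
print(changes(counter), resets(counter))      # 4 2
```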
Date and time functions
day_of_month()
day_of_month(v=vector(time()) instant-vector) interprets each sample value as a Unix timestamp in UTC and returns the day of the month (1-31).
The argument v defaults to vector(time()), the evaluation time. For example, the day of the month on which each node booted:
day_of_month(node_boot_time_seconds)
day_of_week()
day_of_week(v=vector(time()) instant-vector) returns the day of the week (0-6, where 0 means Sunday) for each UTC timestamp.
days_in_month()
days_in_month(v=vector(time()) instant-vector) returns the number of days in the month (28-31) for each UTC timestamp.
hour()
hour(v=vector(time()) instant-vector) returns the hour of the day (0-23) for each UTC timestamp.
minute()
minute(v=vector(time()) instant-vector) returns the minute of the hour (0-59) for each UTC timestamp.
month()
month(v=vector(time()) instant-vector) returns the month of the year (1-12) for each UTC timestamp.
year()
year(v=vector(time()) instant-vector) returns the year for each UTC timestamp.
time()
time() returns the number of seconds since January 1, 1970 UTC. This is the evaluation time of the expression, not the current system time.
timestamp()
timestamp(v instant-vector) returns the timestamp of each sample in v, as the number of seconds since January 1, 1970 UTC.
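All of these functions interpret a sample value as a Unix timestamp in UTC. Python's datetime can reproduce the fields for a quick cross-check; the helper name is hypothetical, and the example uses the timestamp from the @ modifier example above.

```python
import calendar
from datetime import datetime, timezone
from typing import Dict


def date_parts(ts: float) -> Dict[str, int]:
    """PromQL-style date and time fields for a Unix timestamp, interpreted in UTC."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {
        "year": dt.year,
        "month": dt.month,                                           # 1-12
        "day_of_month": dt.day,                                      # 1-31
        "day_of_week": dt.isoweekday() % 7,                          # 0-6, 0 = Sunday
        "days_in_month": calendar.monthrange(dt.year, dt.month)[1],  # 28-31
        "hour": dt.hour,                                             # 0-23
        "minute": dt.minute,                                         # 0-59
    }


print(date_parts(1646089826))  # the timestamp used in the @ example
```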
Histogram quantile
histogram_quantile()
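histogram_quantile(φ scalar, b instant-vector) estimates the φ-quantile (0 ≤ φ ≤ 1) from the buckets b of a histogram. Each sample in b is the cumulative count of a bucket and must carry an le label holding the bucket's upper bound. The result is interpolated linearly within the bucket that contains the quantile, similar in spirit to the quantile aggregation operator.
Estimate the 80th percentile of HTTP request durations over the past day:
histogram_quantile(0.8, rate(prometheus_http_request_duration_seconds_bucket[1d]))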
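Because histogram_quantile() only sees cumulative bucket counts, it assumes observations are spread evenly within each bucket. The simplified Python sketch below follows that approach for a single series of classic buckets; it omits some of Prometheus's edge-case handling (such as repairing non-monotonic buckets), and the bucket data is made up.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def histogram_quantile(phi: float, buckets: List[Bucket]) -> float:
    """Estimate the phi-quantile, interpolating linearly inside the matching bucket."""
    if phi < 0:
        return -math.inf
    if phi > 1:
        return math.inf
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan  # a +Inf bucket and at least one other bucket are required
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = phi * total
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        return buckets[-2][0]  # rank falls in the +Inf bucket: highest finite bound
    if idx == 0 and buckets[0][0] <= 0:
        return buckets[0][0]
    start, prev_count = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    end, count = buckets[idx]
    if count == prev_count:
        return start  # empty bucket (only possible when phi == 0); simplified here
    return start + (end - start) * ((rank - prev_count) / (count - prev_count))


buckets = [(0.1, 10.0), (0.5, 30.0), (1.0, 40.0), (math.inf, 40.0)]
print(histogram_quantile(0.5, buckets))  # falls inside the (0.1, 0.5] bucket
print(histogram_quantile(0.8, buckets))  # falls inside the (0.5, 1.0] bucket
```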
Differences and growth rates
delta()
delta(v range-vector) calculates the difference between the first and last value of each time series in v, extrapolated to cover the full time range. It should only be used with gauges.
Change in available memory over the past day:
delta(node_memory_MemAvailable_bytes[1d])
idelta()
idelta(v range-vector) calculates the difference between the last two samples of each time series in v. It should only be used with gauges.
idelta(node_memory_MemAvailable_bytes[1m])
increase()
increase(v range-vector) calculates the increase of each counter in v over the time range, extrapolated to the full range. It should only be used with counters. It is syntactic sugar for rate(v) multiplied by the number of seconds in the range, mainly for human readability.
Increase in the number of requests over the last 10 minutes:
increase(prometheus_http_requests_total[10m])
rate()
rate(v range-vector) calculates the per-second average rate of increase of each counter in v, automatically compensating for counter resets. It should only be used with counters.
Average number of requests per second over the past 10 minutes:
rate(prometheus_http_requests_total[10m])
irate()
irate(v range-vector) calculates the per-second instant rate of increase from the last two samples in the range. Because it only looks at two samples, it reacts quickly to changes but is noisier than rate().
irate(prometheus_http_requests_total[10m])
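Prometheus's rate() and increase() also extrapolate towards the boundaries of the range window, which is why increase() can return non-integer values for integer counters. The sketch below leaves extrapolation out and shows only the core ideas: counter-reset compensation behind rate(), and the two-sample calculation behind irate(). Function names and sample data are hypothetical, and at least two samples are assumed.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def reset_adjusted_increase(samples: List[Sample]) -> float:
    """Increase between the first and last sample, compensating for counter resets."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the whole new value is increase.
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples: List[Sample]) -> float:
    """Per-second rate over the sampled interval (no extrapolation to the window edges)."""
    elapsed = samples[-1][0] - samples[0][0]
    return reset_adjusted_increase(samples) / elapsed


def simple_irate(samples: List[Sample]) -> float:
    """Per-second rate from the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    delta = v1 - v0 if v1 >= v0 else v1
    return delta / (t1 - t0)


# Hypothetical counter scraped every 60 s, with a restart between 180 s and 240 s.
samples = [(0.0, 100.0), (60.0, 160.0), (120.0, 220.0), (180.0, 280.0), (240.0, 30.0)]
print(simple_rate(samples))   # (60 + 60 + 60 + 30) / 240 = 0.875 requests/s
print(simple_irate(samples))  # 30 / 60 = 0.5 requests/s
```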
Label management
label_join()
label_join(v instant-vector, dst_label string, separator string, src_label_1 string, src_label_2 string, …) joins the values of all the src_labels using separator and stores the result in the new label dst_label, for each time series in v.
label_join(up{instance="localhost:9100", job="node"},"new_label","-","instance","job")
Results:
up{instance="localhost:9100", job="node", new_label="localhost:9100-node"} 1
label_replace()
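label_replace(v instant-vector, dst_label string, replacement string, src_label string, regex string) matches regex against the value of src_label for each time series in v. If it matches, dst_label is set to replacement, in which $1, $2, and so on are replaced by the capture groups. If it does not match, the series is returned unchanged.
Here $1 refers to the first capture group, whose value is written into a new label hello: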
label_replace(up{instance="localhost:9100", job="node"},"hello","$1","job","(.*)")
Results:
up{hello="node", instance="localhost:9100", job="node"} 1
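Both functions can be mimicked with Python's re module. In this hypothetical sketch, series labels are a plain dict, the regex is matched against the whole label value (PromQL anchors it), only $1-style references are translated, and an empty result removes the label, since Prometheus treats an empty label value the same as a missing label. Python's regex dialect differs slightly from the RE2 syntax Prometheus uses.

```python
import re
from typing import Dict

Labels = Dict[str, str]


def _set_label(labels: Labels, name: str, value: str) -> Labels:
    out = dict(labels)
    if value:
        out[name] = value
    else:
        out.pop(name, None)  # an empty value is treated like a missing label
    return out


def label_join(labels: Labels, dst: str, sep: str, *src: str) -> Labels:
    return _set_label(labels, dst, sep.join(labels.get(name, "") for name in src))


def label_replace(labels: Labels, dst: str, replacement: str, src: str, regex: str) -> Labels:
    match = re.fullmatch(regex, labels.get(src, ""))  # anchored at both ends
    if match is None:
        return dict(labels)  # no match: the series is returned unchanged
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)  # $1 -> \g<1>
    return _set_label(labels, dst, match.expand(template))


up = {"__name__": "up", "instance": "localhost:9100", "job": "node"}
print(label_join(up, "new_label", "-", "instance", "job"))
print(label_replace(up, "hello", "$1", "job", "(.*)"))
```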
Prediction
predict_linear()
predict_linear(v range-vector, t scalar) predicts the value of each time series t seconds from now, using simple linear regression. It should only be used with gauges.
Predict each file system's free space one hour from now, based on the last hour of data:
predict_linear(node_filesystem_free_bytes[1h],3600)
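predict_linear() fits a least-squares line through the samples in the range and extends it t seconds past the evaluation time; the slope of the same kind of fit is what deriv() returns. A hypothetical Python sketch with made-up free-space samples (it needs at least two samples with different timestamps):

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def linear_regression(samples: List[Sample], intercept_time: float) -> Tuple[float, float]:
    """Least-squares fit of v = slope * (t - intercept_time) + intercept."""
    n = len(samples)
    xs = [t - intercept_time for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_linear(samples: List[Sample], eval_time: float, t: float) -> float:
    """Value predicted t seconds after eval_time."""
    slope, intercept = linear_regression(samples, eval_time)
    return intercept + slope * t


# Hypothetical free space: 200 MB at t=0, shrinking by 1 MB per minute for one hour.
free = [(float(ts), 200e6 - (ts / 60) * 1e6) for ts in range(0, 3601, 60)]
print(predict_linear(free, 3600.0, 3600.0))  # roughly 80 MB left one hour from now
```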
Conversion
absent()
absent(v instant-vector) returns an empty vector if v has any elements, and a 1-element vector with the value 1 if v has no elements. This is useful for alerting when a time series is missing.
For example, consider the following alerting expression:
absent(up{job="node"} == 1)
If no up{job="node"} series has the value 1 (including when no such series exists at all), the expression returns 1.
absent_over_time()
absent_over_time(v range-vector) returns an empty vector if the range vector has any elements, and a 1-element vector with the value 1 if it has none.
Return 1 if there were no up{job="node1"} samples at all during the past hour:
absent_over_time(up{job="node1"}[1h])
scalar()
scalar(v instant-vector) returns the sample value of the single element of v as a scalar. If v does not have exactly one element, scalar() returns NaN.
vector()
vector(s scalar) returns the scalar s as a vector with no labels.
sgn()
sgn(v instant-vector) returns a vector with all sample values converted to their sign:
- 1 if v is positive
- -1 if v is negative
- 0 if v is equal to zero
Sorting
sort()
sort(v instant-vector) returns the elements of v sorted by sample value in ascending order. Sorting only affects instant queries; range query results always have a fixed order.
sort_desc()
Like sort(), but sorts in descending order.
<aggregation>_over_time()
The following functions take a range vector and aggregate each series over time, returning an instant vector with one aggregated value per series:
- avg_over_time(range-vector): the average of all sample values in the range.
- min_over_time(range-vector): the minimum of all sample values in the range.
- max_over_time(range-vector): the maximum of all sample values in the range.
- sum_over_time(range-vector): the sum of all sample values in the range.
- count_over_time(range-vector): the number of samples in the range.
- quantile_over_time(scalar, range-vector): the φ-quantile (0 ≤ φ ≤ 1) of the sample values in the range.
- stddev_over_time(range-vector): the population standard deviation of the sample values in the range.
- stdvar_over_time(range-vector): the population standard variance of the sample values in the range.
Mathematical functions
abs()
abs(v instant-vector) returns v with all sample values converted to their absolute value.
sqrt()
sqrt(v instant-vector) calculates the square root of all sample values in v.
deriv()
deriv(v range-vector) calculates the per-second derivative of each time series in v, using simple linear regression. It should only be used with gauges.
exp()
exp(v instant-vector) calculates the exponential function of all sample values in v.
Special cases:
- exp(+Inf) = +Inf
- exp(NaN) = NaN
ln(), log2(), log10()
ln(v instant-vector), log2(v instant-vector), and log10(v instant-vector) calculate the natural, binary, and decimal logarithm of all sample values in v.
Special cases (the same apply to log2 and log10):
- ln(+Inf) = +Inf
- ln(0) = -Inf
- ln(x < 0) = NaN
- ln(NaN) = NaN
holt_winters()
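holt_winters(v range-vector, sf scalar, tf scalar) produces a smoothed value for each time series based on the samples in v. The lower the smoothing factor sf, the more weight is given to old data. The higher the trend factor tf, the more weight is given to trends in the data. Both factors must lie strictly between 0 and 1. It should only be used with gauges. In Prometheus 3.0 the function was renamed double_exponential_smoothing and is only available behind the promql-experimental-functions feature flag.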
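As a sketch of what the smoothing does, here is a Python translation of the double exponential smoothing recurrence, assuming the level is seeded with the first sample and the trend with the difference between the first two samples. Treat it as an approximation for building intuition rather than a reference implementation; the function operates on bare values and ignores timestamps.

```python
from typing import List


def holt_winters(values: List[float], sf: float, tf: float) -> float:
    """Double exponential smoothing; returns the smoothed level at the last sample."""
    if not (0 < sf < 1 and 0 < tf < 1):
        raise ValueError("sf and tf must both lie strictly between 0 and 1")
    if len(values) < 2:
        raise ValueError("at least two samples are required")
    level, trend = values[0], values[1] - values[0]
    prev_level = level
    for i in range(1, len(values)):
        if i > 1:
            # Update the trend from the latest change in level, weighted by tf.
            trend = tf * (level - prev_level) + (1 - tf) * trend
        # Blend the new observation with the previous level plus trend, weighted by sf.
        prev_level, level = level, sf * values[i] + (1 - sf) * (level + trend)
    return level


print(holt_winters([1.0, 2.0, 3.0, 4.0, 5.0], 0.5, 0.5))  # perfectly linear data is tracked exactly
print(holt_winters([0.0, 10.0, 0.0], 0.5, 0.5))           # a sudden drop is smoothed out
```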
Trigonometric functions (angles in radians)
acos(v instant-vector)
acosh(v instant-vector)
asin(v instant-vector)
asinh(v instant-vector)
atan(v instant-vector)
atanh(v instant-vector)
cos(v instant-vector)
cosh(v instant-vector)
sin(v instant-vector)
sinh(v instant-vector)
tan(v instant-vector)
tanh(v instant-vector)
Converting between degrees and radians
deg(v instant-vector): converts radians to degrees
pi(): returns π
rad(v instant-vector): converts degrees to radians
Corrections are welcome. More articles on the author's blog: iqsing.github.io
References
[1] Understanding Prometheus Range Vectors: satyanash.net/software/20…
[2] Prometheus documentation: prometheus.io/docs/promet…