PromQL (Prometheus Query Language) is the query language of the Prometheus TSDB. It is central both to displaying data, for example in Grafana dashboards, and to configuring alerting rules.

This article assumes that you know about the four metric types of Prometheus:

  • Counter
  • Gauge
  • Histogram
  • Summary

So that readers can follow along, most of the sample queries in this article target:

  • Prometheus
  • node_exporter

Expression data types


A PromQL query is an expression that evaluates to one of four data types:

Instant vector

An instant vector is a set of time series in which each series carries a single sample: the most recent one at the evaluation time, a point rather than a line.

Range vector

A range vector is a set of time series in which each series carries the samples from a span of time, so each series may contain multiple points.

Source: Understanding Prometheus Range Vectors [1]

Scalar

A scalar is a single numeric floating-point value. An instant vector that contains exactly one series can be converted to a scalar with scalar().

String

A simple string value; currently unused.

Selectors


Label selector

Example: query the number of Prometheus HTTP requests whose status code is 400.

prometheus_http_requests_total{code="400"}

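Label matching operators:

  • =: the label value equals the given string
  • !=: the label value does not equal the given string
  • =~: the label value matches the given regular expression
  • !~: the label value does not match the given regular expression

Regular expressions are fully anchored, so code=~"4.*" behaves like ^4.*$.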

Query the number of Prometheus HTTP requests whose status code is 4xx or 5xx and whose handler is /api/v1/query:

prometheus_http_requests_total{code=~"4.*|5.*",handler="/api/v1/query"}

The internal label __name__ matches the metric name, so the following expression is equivalent to the previous one:

{code=~"4.*|5.*",handler="/api/v1/query",__name__="prometheus_http_requests_total"}
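To make the four matching operators concrete, here is a minimal Python sketch of how a selector could filter series. It is an illustration only, not how Prometheus is implemented: the matches helper and the sample series are hypothetical, and Python's re module stands in for the RE2 syntax that Prometheus actually uses.

```python
import re

def matches(labels, matchers):
    """Return True if a series' labels satisfy every (name, op, value) matcher."""
    for name, op, value in matchers:
        actual = labels.get(name, "")  # a missing label behaves like an empty value
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None  # fully anchored match
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown operator {op!r}")
        if not ok:
            return False
    return True

# Hypothetical series, each represented only by its label set.
series = [
    {"__name__": "prometheus_http_requests_total", "code": "200", "handler": "/api/v1/query"},
    {"__name__": "prometheus_http_requests_total", "code": "400", "handler": "/api/v1/query"},
    {"__name__": "prometheus_http_requests_total", "code": "503", "handler": "/metrics"},
]
matchers = [("code", "=~", "4.*|5.*"), ("handler", "=", "/api/v1/query")]
selected = [s for s in series if matches(s, matchers)]
```

Because the match is anchored, a value such as 1400 is not selected by code=~"4.*", even though it contains a 4.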
Range selector

Example: query the samples recorded for the Prometheus health-check endpoint over the past 5 minutes.

prometheus_http_requests_total{code="200",handler="/-/healthy"}[5m]

Units: ms, s, m, h, d, w, y

Units can be combined: [1h5m] means one hour and five minutes.

Time shifting


offset modifier

Use offset to shift the evaluation time back. For example, to query the data as it was 5 minutes ago:

prometheus_http_requests_total{code="200"} offset 5m

Range vector queries are also supported:

prometheus_http_requests_total{code="200"}[3m] offset 5m

@ modifier

You can also use @ to evaluate a selector at a specific UNIX timestamp. Older Prometheus releases require the startup flag --enable-feature=promql-at-modifier; newer releases enable the modifier by default.

prometheus_http_requests_total{code="200"} @ 1646089826

Operators


PromQL operators behave much like their counterparts in general-purpose programming languages.

Arithmetic operators

The following arithmetic operators exist in Prometheus:

  • + (addition)
  • - (subtraction)
  • * (multiplication)
  • / (division)
  • % (modulo)
  • ^ (power)

Between two scalars:

10/3

Between an instant vector and a scalar. Because the computed value no longer means the same thing as the original metric, Prometheus drops the metric name from the result:

prometheus_http_response_size_bytes_sum / 1024

Between two instant vectors, for example to compute node memory usage as a percentage:

(
1 -
node_memory_MemAvailable_bytes{job="node",instance="localhost:9100"} 
/ node_memory_MemTotal_bytes{job="node",instance="localhost:9100"}
)
* 100

If the label sets of the two instant vectors do not match, use ignoring to exclude the extra labels from matching.

Input example:

method_code:http_errors:rate5m{method="get", code="500"}  24
method_code:http_errors:rate5m{method="post", code="500"} 6

method:http_requests:rate5m{method="get"}  600
method:http_requests:rate5m{method="post"} 120

Query example:

method_code:http_errors:rate5m{code="500"} / ignoring(code) method:http_requests:rate5m

Example result:

{method="get"}  0.04  // 24 / 600
{method="post"} 0.05  // 6 / 120

If the two sides have different cardinality, that is, several series on one side match a single series on the other, use group_left or group_right to state which side is the "many" side.

Input example:

method_code:http_errors:rate5m{method="get", code="500"}  24
method_code:http_errors:rate5m{method="get", code="404"}  30
method_code:http_errors:rate5m{method="put", code="501"}  3
method_code:http_errors:rate5m{method="post", code="500"} 6
method_code:http_errors:rate5m{method="post", code="404"} 21

method:http_requests:rate5m{method="get"}  600
method:http_requests:rate5m{method="del"}  34
method:http_requests:rate5m{method="post"} 120

Query example:

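group_left indicates that the left-hand side has the higher cardinality (many-to-one matching):

method_code:http_errors:rate5m / ignoring(code) group_left method:http_requests:rate5m

Example result:

{method="get", code="500"}  0.04   // 24 / 600
{method="get", code="404"}  0.05   // 30 / 600
{method="post", code="500"} 0.05   // 6 / 120
{method="post", code="404"} 0.175  // 21 / 120

The series with method="put" and method="del" have no partner on the other side and are dropped from the result.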
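The matching rules for ignoring and group_left can be sketched in a few lines of Python. This is only a model of the semantics under simplifying assumptions: divide_many_to_one is a hypothetical helper, metric names are left out, and series are plain (labels, value) pairs using the input data from the example above.

```python
def divide_many_to_one(many, one, ignoring):
    """Sketch of `many / ignoring(<labels>) group_left one`."""
    def match_key(labels):
        # Series are matched on every label except the ignored ones.
        return frozenset((k, v) for k, v in labels.items() if k not in ignoring)

    one_by_key = {match_key(labels): value for labels, value in one}
    result = []
    for labels, value in many:
        divisor = one_by_key.get(match_key(labels))
        if divisor is not None:  # series without a partner are dropped
            result.append((labels, value / divisor))  # result keeps the left labels
    return result

errors = [
    ({"method": "get", "code": "500"}, 24),
    ({"method": "get", "code": "404"}, 30),
    ({"method": "put", "code": "501"}, 3),
    ({"method": "post", "code": "500"}, 6),
    ({"method": "post", "code": "404"}, 21),
]
requests = [
    ({"method": "get"}, 600),
    ({"method": "del"}, 34),
    ({"method": "post"}, 120),
]
ratios = divide_many_to_one(errors, requests, ignoring={"code"})
for labels, ratio in ratios:
    print(labels, ratio)
```

Each request series is reused for every error series with the same method, which is exactly what "many-to-one" means here.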
Comparison operators

The following comparison operators exist in Prometheus:

  • == (equal)
  • != (not equal)
  • > (greater than)
  • < (less than)
  • >= (greater than or equal to)
  • <= (less than or equal to)

Between two scalars the operator must be followed by the bool modifier; the result is 0 (false) or 1 (true):

10 < bool 5

Between an instant vector and a scalar, for example to check node status:

up{job="node"} == bool 1

Between two instant vectors, for example to check whether the notification queue is below its capacity:

prometheus_notifications_queue_length < bool prometheus_notifications_queue_capacity

Without bool, a comparison that involves a vector acts as a filter: series for which the comparison is false are dropped, and the remaining series keep their original values.
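The difference between a plain comparison and one with bool is easy to miss, so here is a small Python sketch of both behaviours for a vector compared with a scalar. The compare_gt helper and the node names are made up for illustration.

```python
def compare_gt(vector, threshold, use_bool=False):
    """Sketch of `vector > threshold` and `vector > bool threshold`."""
    if use_bool:
        # With bool every series is kept; its value becomes 1.0 (true) or 0.0 (false).
        return {labels: float(value > threshold) for labels, value in vector.items()}
    # Without bool the comparison filters; surviving series keep their original value.
    return {labels: value for labels, value in vector.items() if value > threshold}

load = {"node-a": 0.4, "node-b": 2.5, "node-c": 1.1}  # hypothetical instances
filtered = compare_gt(load, 1.0)
flags = compare_gt(load, 1.0, use_bool=True)
```

The filtering form is what alerting rules usually rely on, since an alert fires for every series that survives the comparison.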

Logical operators

The following logical operators exist in Prometheus:

  • and (intersection)
  • or (union)
  • unless (complement)

Logical operators are defined only between instant vectors.

Given the up series of four scrape targets, the following logical operations select series in much the same way label matchers do.

up{instance!="192.168.1.123:9091"} and up{job!="alertmanager"}

up{instance="192.168.1.123:9091"} or up{job="alertmanager"}

up unless up{job="alertmanager"}
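Since the logical operators work on label sets rather than on values, they can be modelled as set operations. The sketch below assumes four hypothetical targets (the job names and ports are invented) and represents each series by a frozenset of its labels.

```python
def labels(**kw):
    """Build a hashable label set."""
    return frozenset(kw.items())

def v_and(left, right):
    """Keep left series whose label set also appears on the right."""
    return {k: v for k, v in left.items() if k in right}

def v_or(left, right):
    """All left series, plus right series whose label set is not on the left."""
    merged = dict(left)
    for k, v in right.items():
        merged.setdefault(k, v)
    return merged

def v_unless(left, right):
    """Keep left series whose label set does not appear on the right."""
    return {k: v for k, v in left.items() if k not in right}

def select(vector, predicate):
    """Stand-in for a label selector such as up{job="alertmanager"}."""
    return {k: v for k, v in vector.items() if predicate(dict(k))}

up = {
    labels(job="prometheus", instance="192.168.1.123:9090"): 1,
    labels(job="alertmanager", instance="192.168.1.123:9093"): 1,
    labels(job="node", instance="192.168.1.123:9100"): 1,
    labels(job="pushgateway", instance="192.168.1.123:9091"): 0,
}
alertmanager = select(up, lambda l: l["job"] == "alertmanager")
pushgateway = select(up, lambda l: l["instance"] == "192.168.1.123:9091")

both = v_and(
    select(up, lambda l: l["instance"] != "192.168.1.123:9091"),
    select(up, lambda l: l["job"] != "alertmanager"),
)
either = v_or(pushgateway, alertmanager)
rest = v_unless(up, alertmanager)
```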

Binary operator precedence in Prometheus, from highest to lowest:

  1. ^
  2. *, /, %, atan2
  3. +, -
  4. ==, !=, <=, <, >=, >
  5. and, unless
  6. or

Operators at the same precedence level are left-associative, except ^, which is right-associative.

Aggregation operators

Prometheus supports the following built-in aggregation operators, which aggregate the elements of a single instant vector into a new vector:

  • sum (sum of the values)
  • min (minimum)
  • max (maximum)
  • avg (average)
  • group (groups series; every value in the result is 1)
  • stddev (population standard deviation)
  • stdvar (population variance)
  • count (number of elements in the vector)
  • count_values (number of elements that share the same value)
  • bottomk (the k elements with the smallest sample values)
  • topk (the k elements with the largest sample values)
  • quantile (φ-quantile, 0 ≤ φ ≤ 1)

Aggregation can be scoped to label dimensions with the by and without clauses.

sum, min, max, avg:

Compute the total number of HTTP requests across all series. min, max, and avg are used the same way and return the smallest, largest, and average per-series value.

sum(prometheus_http_requests_total)

To aggregate per status code instead, add a by clause, for example sum by (code) (prometheus_http_requests_total).
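Grouping by a label works like a GROUP BY in SQL. As a rough illustration, the following Python sketch sums values per group; sum_by and the request counts are hypothetical, and an empty by list reproduces the plain sum() over everything.

```python
from collections import defaultdict

def sum_by(vector, by):
    """Sketch of `sum by (<labels>) (vector)` over (labels, value) pairs."""
    totals = defaultdict(float)
    for labels, value in vector:
        # Only the labels listed in `by` survive in the result.
        group = tuple((name, labels.get(name, "")) for name in by)
        totals[group] += value
    return dict(totals)

requests = [
    ({"code": "200", "handler": "/metrics"}, 120.0),
    ({"code": "200", "handler": "/api/v1/query"}, 30.0),
    ({"code": "400", "handler": "/api/v1/query"}, 2.0),
]
per_code = sum_by(requests, by=["code"])
total = sum_by(requests, by=[])
```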

group:

Works like uniq: it returns one series per distinct label group, each with the sample value 1.

stddev, stdvar:

Both describe how dispersed a set of values is, that is, how far the values deviate from their arithmetic mean. The standard deviation is the square root of the variance; the smaller it is, the closer the values are to the mean, and vice versa. For example, the standard deviation can be used to gauge network fluctuation:

stddev(rate(node_network_transmit_bytes_total[5m]))

rate() computes the per-second rate of increase over the given time window.

count, count_values:

Count how many series there are:

count(prometheus_http_requests_total)

Count how many series share each distinct sample value:

count_values("value",prometheus_http_requests_total)

bottomk, topk:

Return the five series with the smallest sample values:

bottomk(5,prometheus_http_requests_total)

quantile: compute a quantile across series

Suppose we want to understand the distribution of memory usage across all nodes in a K8s cluster:

quantile(0.8, (1 - node_memory_MemAvailable_bytes{job="kubernetes-service-endpoints"} / node_memory_MemTotal_bytes{job="kubernetes-service-endpoints"}) * 100)

For instance, a result of 68 means that 80% of the nodes have memory usage below 68%.
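To see where such a number comes from, here is a Python sketch of a φ-quantile with linear interpolation between the two nearest ranks, which is my understanding of how the quantile aggregator behaves. The usage percentages are invented for illustration.

```python
import math

def quantile(q, values):
    """phi-quantile with linear interpolation between the two nearest ranks."""
    if not values:
        return math.nan
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)          # fractional position in the sorted list
    lower = int(math.floor(rank))
    upper = min(len(ordered) - 1, lower + 1)
    weight = rank - lower                  # how far we are towards the upper neighbour
    return ordered[lower] * (1 - weight) + ordered[upper] * weight

usage = [35.0, 42.0, 50.0, 61.0, 68.0, 90.0]  # hypothetical per-node memory usage in %
p80 = quantile(0.8, usage)
median = quantile(0.5, [1.0, 2.0, 3.0, 4.0])
```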

Functions


Rounding

ceil()

ceil(v instant-vector) rounds each sample value up to the nearest integer.

ceil(node_load1) # 1.2 -> 2

floor()

floor(v instant-vector) is the opposite of ceil(): it rounds each sample value down to the nearest integer.

round()

round(v instant-vector, to_nearest=1 scalar) rounds each sample value to the nearest integer, with ties rounded up. The optional to_nearest argument defaults to 1 and sets the multiple that values are rounded to; it may be a fraction.

Round to the nearest integer:

round(prometheus_engine_query_duration_seconds_sum)

Round to the nearest multiple of 5:

round(prometheus_engine_query_duration_seconds_sum,5)
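The effect of to_nearest is easiest to see with numbers. The helper below is a small Python sketch of the rounding rule as I understand it (scale, add 0.5, floor, scale back); promql_round is a hypothetical name, not a Prometheus API.

```python
import math

def promql_round(value, to_nearest=1.0):
    """Round to the nearest multiple of to_nearest; ties round up."""
    inverse = 1.0 / to_nearest
    return math.floor(value * inverse + 0.5) / inverse

nearest_int = promql_round(2.5)         # tie, rounds up
nearest_five = promql_round(12.4, 5)    # nearest multiple of 5
nearest_tenth = promql_round(1.26, 0.1) # fractional to_nearest
```

Note that ties round towards positive infinity, so -2.5 becomes -2 rather than -3.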
Clamping values

clamp()

clamp(v instant-vector, min scalar, max scalar) limits the sample values of all elements in v to the range [min, max]. It returns an empty vector if min > max, and NaN if min or max is NaN.

Clamp the sample values to the range 10 to 20:

clamp(prometheus_http_requests_total, 10, 20)

clamp_max()

clamp_max(v instant-vector, max scalar) works like clamp() but only applies the upper limit max.

clamp_min()

clamp_min(v instant-vector, min scalar) works like clamp() but only applies the lower limit min.

Counting value changes

changes()

changes(v range-vector) returns the number of times each series' sample value changed within the given time range.

changes(node_load1[1m])

Counting resets

resets()

resets(v range-vector) returns the number of counter resets within the given time range. Any decrease in value between two consecutive samples is treated as a counter reset. Use it with counters.

View how many times the context-switch counter was reset in the past 5 minutes:

resets(node_context_switches_total[5m])
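Both changes() and resets() only compare neighbouring samples, which a short Python sketch makes clear. The sample values are hypothetical and stand for the points of one series inside the range.

```python
def changes(samples):
    """Number of times the value differs from the previous sample."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur != prev)

def resets(samples):
    """Number of times the value drops below the previous sample."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur < prev)

counter = [10, 15, 15, 3, 8, 8, 1]  # hypothetical counter samples with two restarts
n_changes = changes(counter)
n_resets = resets(counter)
```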
Date and time

day_of_month()

day_of_month(v=vector(time()) instant-vector) interprets each sample value as a UTC timestamp and returns the day of the month (1-31).

v=vector(time()) is the default argument, so calling the function without arguments uses the evaluation time.

day_of_month(node_boot_time_seconds)

day_of_week()

day_of_week(v=vector(time()) instant-vector) returns the day of the week (0-6, where 0 is Sunday) for each UTC timestamp.

days_in_month()

days_in_month(v=vector(time()) instant-vector) returns the number of days in the month (28-31) for each UTC timestamp.

hour()

hour(v=vector(time()) instant-vector) returns the hour of the day (0-23) for each UTC timestamp.

minute()

minute(v=vector(time()) instant-vector) returns the minute of the hour (0-59) for each UTC timestamp.

month()

month(v=vector(time()) instant-vector) returns the month of the year (1-12) for each UTC timestamp.

year()

year(v=vector(time()) instant-vector) returns the year for each UTC timestamp.

time()

time() returns the number of seconds since January 1, 1970 UTC. Note that this is not the current system time but the time at which the expression is evaluated.

timestamp()

timestamp(v instant-vector) returns the timestamp of each sample in v as the number of seconds since January 1, 1970 UTC.

Histogram quantile

histogram_quantile()

histogram_quantile(φ scalar, b instant-vector) estimates the φ-quantile (0 ≤ φ ≤ 1) from the buckets b of a histogram. It serves a similar purpose to the quantile aggregation operator, but works on bucket counters instead of individual sample values.

Estimate the duration within which 80% of requests complete:

histogram_quantile(0.8, rate(prometheus_http_request_duration_seconds_bucket[1d]))
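The estimate comes from linear interpolation inside the bucket that contains the requested rank. The following Python sketch is a simplified model under a few assumptions of mine: buckets are given as cumulative (upper bound, count) pairs sorted by bound, the last bucket is +Inf, there are at least two buckets, and 0 < q ≤ 1. The bucket values are invented.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if index == len(buckets) - 1:
        return buckets[-2][0]  # rank falls into +Inf: report the highest finite bound
    upper, count = buckets[index]
    lower, below = (0.0, 0.0) if index == 0 else buckets[index - 1]
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)

# Hypothetical request-duration buckets in seconds (counts could equally be rates).
buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
p80 = histogram_quantile(0.8, buckets)
p99 = histogram_quantile(0.99, buckets)
```

Because of the even-spread assumption, the accuracy of the result depends on how well the bucket boundaries fit the real distribution.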
Difference and growth rate

delta()

delta(v range-vector) computes the difference between the first and last value of each time series in the range vector. Use it with gauges.

Compute the change in available memory over the past day:

delta(node_memory_MemAvailable_bytes[1d])

idelta()

idelta(v range-vector) computes the difference between the last two samples in the range vector. Use it with gauges.

idelta(node_memory_MemAvailable_bytes[1m])

increase()

increase(v range-vector) computes the increase of each time series over the time range. Use it with counters. It is syntactic sugar for rate(v) multiplied by the number of seconds in the time range, and exists mainly for human readability.

Compute the increase in requests over the past 10 minutes:

increase(prometheus_http_requests_total[10m])

rate()

rate(v range-vector) computes the per-second average rate of increase of each time series in the range vector. Use it with counters.

Average per-second request rate over the past 10 minutes:

rate(prometheus_http_requests_total[10m])

irate()

irate(v range-vector) computes the per-second instant rate of increase, based on the last two samples in the time range.

irate(prometheus_http_requests_total[10m])
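The relationship between increase, rate, and irate can be sketched in Python. This is a deliberately simplified model: it corrects for counter resets but skips the extrapolation to the window boundaries that Prometheus performs, so real results will differ slightly. The function names and samples are hypothetical.

```python
def counter_increase(samples):
    """Total increase over (timestamp, value) samples, correcting for resets."""
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so the whole new value counts.
        increase += cur - prev if cur >= prev else cur
    return increase

def simple_rate(samples):
    """Average per-second increase between the first and last sample."""
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed

def simple_irate(samples):
    """Per-second increase between the last two samples only."""
    last_two = samples[-2:]
    return counter_increase(last_two) / (last_two[1][0] - last_two[0][0])

# Hypothetical counter scraped every 15 s, with a restart before t=45.
samples = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
total = counter_increase(samples)
avg_rate = simple_rate(samples)
instant_rate = simple_irate(samples)
```

rate smooths over the whole window, while irate reacts only to the most recent pair of samples, which makes it more volatile.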
Label management

label_join()

label_join(v instant-vector, dst_label string, separator string, src_label_1 string, src_label_2 string, …) adds the label dst_label to each time series. Its value is the values of the listed source labels joined with separator.

label_join(up{instance="localhost:9100", job="node"},"new_label","-","instance","job")

Result:

up{instance="localhost:9100", job="node", new_label="localhost:9100-node"}   1

label_replace()

label_replace(v instant-vector, dst_label string, replacement string, src_label string, regex string) matches regex against the value of src_label. If it matches, dst_label is set to replacement, in which $1, $2, … refer to the capture groups; if it does not match, the series is returned unchanged.

Here $1 refers to the first capture group, and the matched value is written to the hello label:

label_replace(up{instance="localhost:9100", job="node"},"hello","$1","job","(.*)")

Result:

up{hello="node", instance="localhost:9100", job="node"}       1
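For readers who think in code, both label functions can be approximated in Python on a single label set. This is a sketch under assumptions: the two helpers are hypothetical, Python's re module replaces RE2, and only numeric references such as $1 are translated.

```python
import re

def label_join(labels, dst_label, separator, *src_labels):
    """Join the values of src_labels with separator into dst_label."""
    result = dict(labels)
    result[dst_label] = separator.join(labels.get(name, "") for name in src_labels)
    return result

def label_replace(labels, dst_label, replacement, src_label, regex):
    """Set dst_label from capture groups if regex fully matches src_label's value."""
    match = re.fullmatch(regex, labels.get(src_label, ""))
    if match is None:
        return dict(labels)  # no match: the series is returned unchanged
    # Translate $1, $2, ... into Python's \g<1>, \g<2>, ... group references.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[dst_label] = match.expand(template)
    return result

series = {"instance": "localhost:9100", "job": "node"}
joined = label_join(series, "new_label", "-", "instance", "job")
replaced = label_replace(series, "hello", "$1", "job", "(.*)")
host_only = label_replace(series, "host", "$1", "instance", "(.*):.*")
```

The last call shows a common use: extracting the host part of instance into its own label.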
Prediction

predict_linear()

predict_linear(v range-vector, t scalar) uses simple linear regression to predict the sample value t seconds from now. Use it with gauges.

Based on the free file system space over the past hour, predict the free space one hour from now:

predict_linear(node_filesystem_free_bytes[1h],3600)
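"Simple linear regression" here means fitting a straight line through the samples by least squares and extending it into the future. A minimal Python sketch, with a hypothetical helper and invented disk-space samples:

```python
def predict_linear(samples, now, seconds_ahead):
    """Fit v = intercept + slope * t by least squares and extrapolate."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    return intercept + slope * (now + seconds_ahead)

# Hypothetical free space in bytes, sampled every 10 minutes for one hour.
samples = [(t, 100000 - t * 1000 // 600) for t in range(0, 3601, 600)]
forecast = predict_linear(samples, now=3600, seconds_ahead=3600)
```

An alert such as "disk full within an hour" is then just a comparison of this forecast against 0.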
Conversion

absent()

absent(v instant-vector) returns an empty vector if v has any elements, and a single-element vector with the value 1 if v has none.

For example, as an alerting expression:

absent(up{job="node"} == 1)

If up{job="node"} does not exist or is not 1, the alerting expression evaluates to 1.

absent_over_time()

absent_over_time(v range-vector) returns an empty vector if the range vector has any elements, and a single-element vector with the value 1 if it has none.

Return 1 if up{job="node1"} has no samples in the past hour:

absent_over_time(up{job="node1"}[1h])

scalar()

scalar(v instant-vector) returns the sample value of the single element in v as a scalar. If v does not contain exactly one element, scalar() returns NaN.

vector()

vector(s scalar) returns the scalar s as a vector with no labels.

sgn()

sgn(v instant-vector) returns a vector in which every sample value is converted to its sign:

  • 1 if the value is positive
  • -1 if the value is negative
  • 0 if the value is equal to 0

Sorting

sort()

sort(v instant-vector) returns the vector elements sorted by sample value in ascending order.

sort_desc()

sort_desc(v instant-vector) is the same as sort(), but sorts in descending order.

_over_time()

The following functions take a range vector and return an instant vector, aggregating each series over time:

  • avg_over_time(range-vector): the average value of all samples in the range.
  • min_over_time(range-vector): the minimum value of all samples in the range.
  • max_over_time(range-vector): the maximum value of all samples in the range.
  • sum_over_time(range-vector): the sum of all sample values in the range.
  • count_over_time(range-vector): the number of samples in the range.
  • quantile_over_time(scalar, range-vector): the φ-quantile (0 ≤ φ ≤ 1) of the sample values in the range.
  • stddev_over_time(range-vector): the population standard deviation of the sample values in the range.
  • stdvar_over_time(range-vector): the population variance of the sample values in the range.

Mathematical functions

abs()

abs(v instant-vector) returns the absolute value of each sample.

sqrt()

sqrt(v instant-vector) computes the square root of each sample value.

deriv()

deriv(v range-vector) uses simple linear regression to compute the per-second derivative of each time series in the range vector. Use it with gauges.

exp()

exp(v instant-vector) computes the exponential function of each sample value.

Special cases:

  • exp(+Inf) = +Inf
  • exp(NaN) = NaN

ln(), log2(), log10()

ln(v instant-vector), log2(v instant-vector), and log10(v instant-vector) compute the natural, binary, and decimal logarithm of each sample value.

Special cases (the same apply to log2 and log10):

  • ln(+Inf) = +Inf
  • ln(0) = -Inf
  • ln(x < 0) = NaN
  • ln(NaN) = NaN

holt_winters()

holt_winters(v range-vector, sf scalar, tf scalar) produces a smoothed value for each time series based on the range in v. The lower the smoothing factor sf, the more weight is given to old data. The higher the trend factor tf, the more the trend in the data is taken into account. Both sf and tf must be between 0 and 1. Use it with gauges.

Trigonometric functions (all work in radians)

  • acos(v instant-vector)
  • acosh(v instant-vector)
  • asin(v instant-vector)
  • asinh(v instant-vector)
  • atan(v instant-vector)
  • atanh(v instant-vector)
  • cos(v instant-vector)
  • cosh(v instant-vector)
  • sin(v instant-vector)
  • sinh(v instant-vector)
  • tan(v instant-vector)
  • tanh(v instant-vector)

Converting between degrees and radians

  • deg(v instant-vector): converts radians to degrees
  • pi(): returns pi
  • rad(v instant-vector): converts degrees to radians

Corrections are welcome if anything here is wrong. Read on the blog: iqsing.github.io

References

[1] Understanding Prometheus Range Vectors: satyanash.net/software/20…

[2] Prometheus: prometheus.io/docs/promet…