InfluxDB is an open-source time series database developed by InfluxData. Written in Go, it focuses on high-performance storage and querying of time series data, and it is widely used to store monitoring data.

Many monitoring setups rely on just such an open-source time series database to record metrics and events for later analysis. So what is a time series database?

The simplest definition: a time series database stores data that carries a Timestamp field, such as the ambient temperature or CPU utilization at a given moment. But what data doesn't contain a timestamp? Almost any data can be stamped with a Timestamp field. A more important attribute of time series data is how it is queried, including filtering, aggregation, and so on. Time series data generally has the following two characteristics:

Simple data structure. A metric has only one value at a given point in time, with no complex structures (nesting, hierarchies, etc.) or relationships (associations, primary/foreign keys, etc.).

Large data volume is the other important characteristic, because time series data is continuously generated, collected, and sent by a large number of monitored sources, such as hosts, IoT devices, terminals, or apps.

Introduction to InfluxDB

1. Time-centric: supports functions related to time (such as maximum, minimum, and sum).

2. Measurability: large amounts of data can be computed in real time.

3. Event-based: supports arbitrary event data.

Its main features:

1. Schemaless: a record can have any number of fields.

2. Scalable.

3. Supports min, max, sum, count, mean, median, and a series of other functions for convenient statistics.

4. Native HTTP support with a built-in HTTP API.

5. Powerful SQL-like query syntax.

6. Built-in management interface that is easy to use.
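As a quick illustration of the built-in HTTP API, a query can be issued directly with curl (this assumes a local server running on the default port 8086):

```shell
# Query the server over the HTTP API; the response is JSON
curl -G 'http://localhost:8086/query' --data-urlencode 'q=SHOW DATABASES'
```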

Official documentation: docs.influxdata.com/influxdb/v1…

Installation of InfluxDB

This section uses RedHat/CentOS as an example to describe how to install InfluxDB. RedHat and CentOS users can install the latest version of InfluxDB through the YUM package manager.

cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name = InfluxDB Repository - RHEL \$releasever
baseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key
EOF

Once the repository has been added, run the following commands to install and start the InfluxDB service:

sudo yum install influxdb

At the time of writing, the latest version is 1.7.3-1.

Starting InfluxDB

1. Start the server. If InfluxDB was installed from a package, start it with:

sudo service influxdb start

2. The client binary, influx, is installed under /usr/bin. You can also add that path to your environment variables so that influx can be used anywhere.

Command to start the client, as shown below:
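A minimal invocation, assuming the server is running locally on the default port 8086:

```shell
# Start the interactive InfluxDB shell against a local server
influx -host localhost -port 8086
```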

Earlier versions of InfluxDB provided a web management page, accessible by entering http://<server-ip>:8083 in the browser's address bar.

Starting with version 1.3, however, the web administration interface is no longer included in InfluxDB. Officially, Chronograf replaces it for querying data, writing data, and managing the database.

We'll cover Chronograf in more detail later.

Operations related to InfluxDB

1. Enable authentication

Creating an Administrator
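For example, in the influx shell (the username and password below are placeholders):

```sql
-- Create an admin user; the password must be single-quoted
CREATE USER admin WITH PASSWORD 'yourpassword' WITH ALL PRIVILEGES
```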

Modify the configuration file to enable authentication
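In /etc/influxdb/influxdb.conf, set auth-enabled under the [http] section:

```toml
[http]
  # Reject requests that do not carry valid credentials
  auth-enabled = true
```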

Restart InfluxDB:

systemctl restart influxdb

Log in again

Client tool to connect to database:

Note: the -precision parameter here specifies that timestamps are displayed in rfc3339 format; it can also be omitted.
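For example, connecting with credentials (placeholders) and rfc3339 timestamps:

```shell
# Authenticate as an existing user; -precision rfc3339 makes timestamps human-readable
influx -username admin -password 'yourpassword' -precision rfc3339
```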

2. Important concepts in InfluxDB

Database: the database.

Measurement: the equivalent of a table in the database; it is the container for tags, fields, and time. Fields are required in an InfluxDB measurement, and results cannot be sorted by field;

Tags are optional and can be used as indexes. Tag values are stored as strings.

Point: a row of data in a table.

Concepts unique to influxDB

(1) Point: consists of a timestamp (time), fields, and tags.

Time: the timestamp of each record; it is the primary index and is generated automatically by the database.

Fields: Values of various records;

Tags: Various indexed attributes.

InfluxDB does not require a schema definition, which means you can add new measurements, tags, and fields at any time.

(2) series

If you picture the data in a measurement drawn as a chart, a series corresponds to the data that forms one line on that chart: each combination of tag values defines its own series.

In fact, a series is a single sequence of measurement points, i.e. one curve. The combination of retention policy, measurement, and tag set uniquely identifies a series.

A point is the set of field values of a series at one timestamp; it is a single point on the curve.

Comparing InfluxDB terms with those of a traditional database:

database -> database
measurement -> table
point -> row
tag / field -> column

(3) Retention Policy

Retention policies determine how long data is kept, how many copies are kept (replication, for clusters), and how data is partitioned into shard groups.

3. Common operations

View database:
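In the influx shell:

```sql
SHOW DATABASES
```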

Create database:
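For example (mydb is a placeholder name; USE is an influx shell command that selects the current database):

```sql
CREATE DATABASE mydb
USE mydb
```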

Insert data:

InfluxDB data is stored in the Line Protocol format.

Line Protocol format: the fixed format for points written to the database, as follows:

<measurement>[,<tag-key>=<tag-value>...] <field-key>=<field-value>[,<field2-key>=<field2-value>...] [unix-nano-timestamp]

Such as:

> INSERT cpu,host=serverA,region=us_west value=0.64

Among them:

cpu is the measurement name;

host=serverA,region=us_west are the tags;

value=0.64 is a field.

Query data:

InfluxDB supports SQL-like statements, and the query syntax is close to SQL.

For example: select * from cpu
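A slightly fuller sketch with a tag filter and a time range (the tag names follow the insert example above):

```sql
SELECT * FROM cpu WHERE region = 'us_west' AND time > now() - 1h LIMIT 10
```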

Note: The InfluxDB cluster feature is no longer open source. To use the cluster service, you need to purchase the Enterprise edition.

The main differences are that the Enterprise edition supports clustering and provides advanced backup/restore features, neither of which is available in the open-source edition.

However, InfluxDB's single-node capabilities are strong enough to support small and medium-sized workloads.

4. Data storage policy

InfluxDB can process tens of thousands of points per second. Keeping all of this data consumes a lot of storage space, and sometimes we don't need to store all historical data.

For this reason, InfluxDB provides data retention policies, which let us customize how long data is kept. A retention policy defines how long data is stored in InfluxDB, i.e. how much recent history is retained.

A database can have multiple retention policies, but each policy name must be unique within the database.

Viewing policies

You can view the existing policies of the database by:
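In the influx shell, for example:

```sql
SHOW RETENTION POLICIES ON telegraf
```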

The telegraf database has only one policy; each field has the following meaning:

name: the policy name, here autogen. When you create a database, InfluxDB automatically creates a policy named autogen for it, which keeps data forever. You can rename this policy and disable its automatic creation in the InfluxDB configuration file.

duration: how long data is retained (0 means unlimited).

shardGroupDuration: the time span covered by each shard group; the shard group is a basic storage structure of InfluxDB.

replicaN: the number of replicas.

default: whether this is the default policy.

There are two related concepts:

Shard:

Shard is an important concept in InfluxDB and is tied to the retention policy. Each retention policy has many shards, and each shard stores data for a specified period of time.

Shard time ranges do not overlap. For example, data from 7pm to 8pm falls into shard0, and data from 8pm to 9pm falls into shard1. Each shard corresponds to an underlying TSM storage engine with its own cache, WAL, and TSM files.

When a database is created, a default retention policy that stores data permanently is created automatically. Under this policy, each shard covers a period of 7 days, i.e. the 168 hours shown in the query above.

If a new retention policy is created with a data retention time of 1 day, each shard covers an interval of 1 hour; data beyond that hour is stored in the next shard.

Shard group:

A Shard group is a logical container for shards.

Creating a policy

Syntax:

CREATE RETENTION POLICY <retention_policy_name> ON <database_name> DURATION <duration> REPLICATION <n> [SHARD DURATION <duration>] [DEFAULT]

The SHARD DURATION clause determines the time range covered by each shard group. The clause is optional; by default, the shard group duration is derived from the policy's DURATION. It is not valid for an infinite retention policy.

Example 1: Create a policy for the database Telegraf

CREATE RETENTION POLICY "one_day_only" ON "telegraf" DURATION 1d REPLICATION 1

Example 2: Create a default policy for the database Telegraf

CREATE RETENTION POLICY "one_day_only" ON "telegraf" DURATION 24h REPLICATION 1 DEFAULT

Modifying a policy

Syntax:

ALTER RETENTION POLICY <retention_policy_name> ON <database_name> DURATION <duration> REPLICATION <n> SHARD DURATION <duration> DEFAULT

Deleting a policy

Syntax:

DROP RETENTION POLICY <retention_policy_name> ON <database_name>

Note: if a measurement does not use the default policy, you must specify the policy name explicitly when operating on it; otherwise, errors may occur.

InfluxDB Command set

SHOW MEASUREMENTS — Queries tables contained in the current database

SHOW FIELD KEYS — View the fields of all tables in the current database

SHOW SERIES FROM pay — view the series of a measurement

SHOW TAG KEYS FROM "pay" — view the tag keys of a measurement

SHOW TAG VALUES FROM "pay" WITH KEY = "merId" — view the values of the merId tag key

SHOW TAG VALUES FROM cpu WITH KEY IN ("region", "host") WHERE service = 'redis' — view the values of several tag keys, with a condition

DROP SERIES FROM <measurement_name> WHERE <tag_key>='<tag_value>' — delete the series matching a tag

SHOW CONTINUOUS QUERIES — view continuous queries

SHOW QUERIES — view currently running queries

KILL QUERY <qid> — terminate a running query

SHOW RETENTION POLICIES ON mydb — view the retention policies of a database

Query data

SELECT * FROM /.*/ LIMIT 1 — query the first row of every measurement in the current database

select * from pay order by time desc limit 2

select * from db_name."policy_name"."measurement_name" — query a measurement under a specific retention policy

Delete the data

delete from "query" — delete all data in a measurement (the measurement itself disappears once empty)

drop measurement "query" — delete a measurement

DELETE FROM cpu

DELETE FROM cpu WHERE time < '2000-01-01T00:00:00Z'

DELETE WHERE time < '2000-01-01T00:00:00Z'

DROP DATABASE "testDB" — delete a database

DROP RETENTION POLICY "dbbak" ON mydb — delete the retention policy dbbak

DROP SERIES FROM pay WHERE <tag_key>='<tag_value>' — delete series matching a tag

SHOW SHARDS — view shards

SHOW SHARD GROUPS

SHOW SUBSCRIPTIONS

5.Chronograf

InfluxDB removed the built-in web management interface (formerly on port 8083) in version 1.3, so a separate tool is required for management. That tool is Chronograf.

Chronograf is a component of the TICK stack, a monitoring suite developed by InfluxData. TICK covers the upstream and downstream of the time series database, including metric collection, analysis, and graphing, similar in spirit to the ELK log-analysis suite. TICK consists of:

Telegraf: Data collection

InfluxDB: receives and stores data

Chronograf: data visualization, alerting, etc.

Kapacitor: data processing, such as monitoring/alerting policies

Download Chronograf

Download address: portal.influxdata.com/downloads

Install Chronograf

Start Chronograf

systemctl start chronograf

Access Chronograf

Open http://<ip>:8888 in a browser.

Query:

The rich interface makes querying, browsing, and statistical analysis very convenient.

5. Data backup and recovery

The backup command has the following format:

influxd backup -database [name] [path-to-backup]

Remote backup:

influxd backup -database myinfo -host 0.0.0.0:8088 /home/gooagoo/influxDB/backup

Replace 0.0.0.0 with an IP address

The restore command has the following format:

influxd restore [ -metadir | -datadir ] <path-to-meta-or-data-directory> <path-to-backup>
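For example, restoring the backup taken above (paths are illustrative; the server should be stopped while restoring):

```shell
# Restore metadata first, then the database's data files
influxd restore -metadir /var/lib/influxdb/meta /home/gooagoo/influxDB/backup
influxd restore -database myinfo -datadir /var/lib/influxdb/data /home/gooagoo/influxDB/backup
```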

6. Comprehensive case: Telegraf + InfluxDB + Grafana

1.Telegraf

Telegraf is an agent written in Go that collects statistics on systems and services; it is part of the TICK stack. Its input plugins can fetch metrics directly from the system it runs on, from third-party APIs, or even from Kafka. Its output plugins can send the collected metrics to various datastores, services, and message queues, such as InfluxDB, Graphite, OpenTSDB, Datadog, Librato, Kafka, MQTT, and so on.

Download and install Telegraf

wget https://dl.influxdata.com/telegraf/releases/telegraf-1.9.3-1.x86_64.rpm
sudo yum install telegraf-1.9.3-1.x86_64.rpm
telegraf -version

After Telegraf is installed, its configuration file is located at:

/etc/telegraf/telegraf.conf

Edit the configuration file to specify our InfluxDB instance as the output:

[[outputs.influxdb]]

urls = ["http://localhost:8086"]

Start the service and enable it at boot:

sudo systemctl start telegraf.service

sudo service telegraf status

sudo systemctl enable telegraf.service

Check what data Telegraf collects under the default configuration on InfluxDB:

> show databases
> use telegraf
> show measurements
> SHOW FIELD KEYS

An example of how to configure the input plugins:

# Read metrics about cpu usage
[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## If true, collect raw CPU time metrics.
  collect_cpu_time = false
  ## If true, compute and report the sum of all non-idle CPU states.
  report_active = false

# Read metrics about disk usage by mount point
[[inputs.disk]]
  ## Ignore mount points by filesystem type.
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "overlay", "aufs", "squashfs"]

# Read metrics about disk IO by device
[[inputs.diskio]]

# Get kernel statistics from /proc/stat
[[inputs.kernel]]
  # no configuration

# Read metrics about memory usage
[[inputs.mem]]
  # no configuration

# Get the number of processes and group them by status
[[inputs.processes]]
  # no configuration

# Read metrics about swap memory usage
[[inputs.swap]]
  # no configuration

# Read metrics about system load & uptime
[[inputs.system]]
  # no configuration

How to find indicators and collect data

Telegraf plugins are divided into inputs and outputs, which correspond to the plugins/inputs and plugins/outputs directories in the Telegraf source tree. Refer to the official Telegraf documentation to find the plugins you need, then open the corresponding .md file in that directory and configure them as described.

After the Telegraf service has been enabled, you will find a new telegraf database in InfluxDB containing multiple measurements, which indicates that data collection is working. Once we have the data, the next question is how to aggregate and present it.

2. Grafana display

For download, installation, and basic configuration, refer to the Grafana wiki.

Here’s a quick look at how to use Grafana to present the data collected by Telegraf.

After starting the service, visit http://<ip>:3000 (the default port is 3000 and can be changed in the configuration file). After logging in, configure a data source as prompted and select InfluxDB:

Enter the InfluxDB connection settings based on your configuration.

Then create a dashboard:

Let's first import a template to preview the result, and then learn about Grafana dashboard configuration. Here we use the official Telegraf: system dashboard. Follow its instructions to configure Telegraf, then select Import -> Upload .json File under Dashboards and import the downloaded template:

View the results:

Grafana's features are too rich to cover in detail here; for more information, see the official documentation: docs.grafana.org/

7. InfluxDB Hardware scale Guide

Single node or cluster? InfluxDB single-node instances are fully open source, while InfluxDB clusters require the closed-source commercial product. A single-node instance provides no redundancy: if the server becomes unavailable, writes and queries fail immediately.

Clustering provides high availability and redundancy: multiple copies of the data are distributed across several servers, and the loss of any one server does not significantly affect the cluster. If your performance requirements fall into the low or moderate load ranges, a single-node InfluxDB instance may suffice. If any of your requirements fall into a range that a single node probably cannot handle, you may need a cluster to distribute the load across multiple servers.

1. Single node

Note: The impact of queries on the system varies widely.

Simple query:

There are few functions and no regular expressions

The time range is limited to a few minutes, hours, or a day

Typically executes in milliseconds to tens of milliseconds

Medium query:

There are multiple functions and one or two regular expressions

There may also be complex GROUP BY clauses or sampling over a time frame of several weeks

Typically executes within hundreds to thousands of milliseconds

Complex query:

Have multiple aggregation or conversion functions or multiple regular expressions

May span very large time ranges, sampling over months or years

Typically takes seconds or longer to execute

Low load suggestions:

CPU: 2-4 cores

RAM: 2-4 GB

IOPS: 500

Moderate load recommendations:

CPU: 4-6 cores

RAM: 8-32 GB

IOPS: 500-1000

High load suggestion:

CPU: 8+ cores

RAM: 32+ GB

IOPS: 1000+

2. Clusters

Meta nodes

A cluster must have at least three independent meta nodes to survive the loss of one server. A cluster with 2n + 1 meta nodes can tolerate the loss of n meta nodes. Clusters should have an odd number of meta nodes; there is no reason to use an even number, and doing so can cause configuration problems.

Meta-nodes do not require a lot of computing power. Regardless of cluster load, we recommend using the following for meta-nodes:

General recommendation:

CPU: 1-2 cores

RAM: 512 MB-1 GB

IOPS: 50

Data nodes

A cluster with only one data node is valid but provides no data redundancy. Redundancy is set by the replication factor on the retention policy that data is written to. A cluster can lose n - 1 data nodes and still return complete query results, where n is the replication factor.

For optimal data distribution within the cluster, InfluxData recommends using an even number of data nodes. The hardware recommendation for clustered data nodes is similar to the standalone instance recommendation. Data nodes should always have at least two CPU cores because they must handle regular read and write traffic as well as in-cluster read and write traffic.

Because of the cluster communication overhead, the throughput of data nodes in a cluster is lower than that of standalone instances on the same hardware.

Note: as with single-node instances, the impact of queries on the system varies widely; the simple, medium, and complex query categories defined above apply here as well.

Low load suggestion

CPU: 2 cores

RAM: 2-4 GB

IOPS: 1000

Moderate load recommendations

CPU: 4-6 cores

RAM: 8-32 GB

IOPS: 1000+

High load suggestion

CPU: 8+ cores

RAM: 32+ GB

IOPS: 1000+

Enterprise Web Node

Enterprise Web servers are primarily HTTP servers with similar load requirements. For most applications, it doesn’t have to be very powerful. A cluster can use only one Web server, but for redundancy, multiple Web servers can be connected to a single back-end Postgres database.

Note: Production clusters should not use SQLite databases because it does not allow for redundant Web servers and does not handle high loads as gracefully as Postgres.

General recommendation:

CPU: 1-4 cores

RAM: 1-2 GB

IOPS: 50

8. Single-machine performance test for InfluxDB

The official hardware reference configuration is:

Two SSDs: one for the WAL (influxdb/wal) and one for data (influxdb/data). At least 8 GB of RAM.

A performance test was run against InfluxDB on a virtual machine; the results are recorded below for future reference.

CentOS Linux release 7.5.1804

InfluxDB Version: V1.7.4

CPU: Intel(R) Xeon(R) CPU E5-2403 0 @ 1.80GHz, 4 quad-core CPUs

Memory: 16 GB

Disk: one mechanical hard disk

Batch write official guidance:

A batch of points can be submitted to the database using a single HTTP request to the write endpoint. This makes writing to the HTTP API more efficient by drastically reducing HTTP overhead.

InfluxData recommends batch sizes of 5,000-10,000 points, but different use cases can be better served by significantly smaller or larger batches.
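A sketch of such a batched write (the database name and points are illustrative): multiple points are separated by newlines inside a single request body:

```shell
# Build a three-point batch: newline-separated line protocol in one body
body='cpu,host=serverA value=0.64 1434055562000000000
cpu,host=serverA value=0.66 1434055563000000000
cpu,host=serverB value=0.51 1434055562000000000'

# Submitting the whole batch in a single HTTP request (needs a running server):
# curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary "$body"

# Count the points in the batch
printf '%s\n' "$body" | wc -l
```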

Result before optimization (writes): single-threaded writes failed after exceeding 12 million points due to insufficient memory.

Results after several rounds of optimization (writes):

Full table scan performance

Conditional query performance

Time dimension query

Aggregate query performance

For a single measurement, when a large amount of data is written in batches and then read by multiple threads, reads slow down as the data volume grows.

In the case of different measurement, read and write are not affected by each other.

The server should have a memory limit configured to minimize the risk of an internal process triggering an OOM.

Current behavior:

There are several places in the system where large allocations can occur, posing a risk of OOM.

In general, this happens when:

Queries – overly broad filter conditions often cause too many series to be loaded at once, e.g. select * operations (confirmed)

In-memory indexing – high-cardinality series cause the index to grow until the process OOMs (confirmed)

Writes – many concurrent high-volume writes combined with slow disk I/O cause large internal buffers to build up while waiting for disk writes/syncs (confirmed)

Compaction – in some cases, large series may be loaded for more optimized deduplication and chunking of points (not yet observed)