Data Tiers for Elasticsearch 7.10

A data tier is a collection of nodes with the same data role that typically share the same hardware profile:

Content tier nodes: Handle the indexing and query load for content such as a product catalog.
Hot tier nodes: Handle the indexing load for time series data such as logs or metrics, and hold your most recent, most frequently accessed data.
Warm tier nodes: Hold time series data that is accessed less frequently and rarely needs to be updated.
Cold tier nodes: Hold time series data that is accessed only occasionally and is not normally updated.

When you index documents directly to a specific index, they remain on content tier nodes indefinitely.

When you index documents into a data stream, they initially reside on hot tier nodes. You can configure index lifecycle management (ILM) policies to automatically transition your time series data through the hot, warm, and cold tiers according to your performance, resiliency, and data retention requirements.
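As a rough illustration of such a policy, the Python sketch below builds a request body for the ILM put-policy API (PUT _ilm/policy/<name>). The policy name, the rollover thresholds, and every min_age value are made-up assumptions for the example, not recommendations. The script only prints the JSON and does not contact a cluster.

```python
import json

# Hypothetical policy name, chosen only for illustration.
POLICY_NAME = "timeseries-policy"


def build_policy(warm_after="7d", cold_after="30d", delete_after="90d"):
    """Return an ILM policy body with hot, warm, cold, and delete phases."""
    return {
        "policy": {
            "phases": {
                # Hot phase: roll over to a new backing index by size or age.
                "hot": {"actions": {"rollover": {"max_size": "50gb", "max_age": "7d"}}},
                # Warm and cold phases: entered once the index reaches min_age.
                "warm": {"min_age": warm_after, "actions": {"set_priority": {"priority": 50}}},
                "cold": {"min_age": cold_after, "actions": {"set_priority": {"priority": 0}}},
                # Delete phase: remove the index when retention expires.
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


policy = build_policy()
print(f"PUT _ilm/policy/{POLICY_NAME}")
print(json.dumps(policy, indent=2))
```

The warm and cold phases here carry only a set_priority action. Moving the shards between tiers is left to ILM's default behavior, which is described in the section on automatic data tier migration.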

A node's data role is configured in elasticsearch.yml. For example, the highest-performance nodes in a cluster might be assigned to both the hot and content tiers:

node.roles: ["data_hot", "data_content"]
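If you generate node configuration with a script, a small helper can keep the role names consistent. The sketch below is only an illustration: the function name node_roles_line and the short tier names it accepts are assumptions for this example, while the four role names are the data tier roles that 7.10 provides.

```python
import json

# The four data tier roles available in Elasticsearch 7.10.
TIER_ROLES = {
    "content": "data_content",
    "hot": "data_hot",
    "warm": "data_warm",
    "cold": "data_cold",
}


def node_roles_line(tiers):
    """Render a node.roles line for elasticsearch.yml from short tier names."""
    unknown = [t for t in tiers if t not in TIER_ROLES]
    if unknown:
        raise ValueError(f"unknown tier(s): {unknown}")
    roles = [TIER_ROLES[t] for t in tiers]
    # A JSON array is also a valid YAML flow sequence.
    return "node.roles: " + json.dumps(roles)


print(node_roles_line(["hot", "content"]))  # node.roles: ["data_hot", "data_content"]
print(node_roles_line(["warm"]))            # node.roles: ["data_warm"]
```

Keep in mind that node.roles lists every role the node has, so a real configuration would also name any non-data roles the node needs.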

Content tier

Data stored in the content tier is generally a collection of items, such as a product catalog or an article archive. Unlike time series data, the value of content remains relatively constant over time, so it doesn’t make sense to move it to tiers with different performance characteristics as it ages. Content data typically has long retention requirements, and you want to be able to retrieve items quickly, no matter how old they are.

Content tier nodes are usually optimized for query performance: they prioritize processing power over I/O throughput so they can handle complex searches and aggregations and return results quickly. Although they are also responsible for indexing, content data is generally not ingested at as high a rate as time series data such as logs and metrics. For resiliency, indexes in this tier should be configured to use one or more replicas.

New indexes are automatically allocated to the content tier unless they are part of a data stream.

Hot tier

The hot tier is the Elasticsearch entry point for time series data and holds your most recent, most frequently searched time series data. Nodes in the hot tier need to be fast for both reads and writes, which requires more hardware resources and faster storage (SSDs). For resiliency, indexes in the hot tier should be configured to use one or more replicas.

New indexes that are part of a data stream are automatically allocated to the hot tier.

Warm tier

Time series data can move to the warm tier once it is being queried less frequently than the recently indexed data in the hot tier. The warm tier typically holds data from recent weeks. Updates are still allowed, but are likely to be infrequent. Nodes in the warm tier generally don't need to be as fast as those in the hot tier. For resiliency, indexes in the warm tier should be configured to use one or more replicas.

Cold tier

Once data is no longer being updated, it can move from the warm tier to the cold tier, where it stays for the rest of its life. The cold tier is still a responsive query tier, but data in it is not normally updated. As data transitions into the cold tier, it can be compressed and shrunk. For resiliency, indexes in the cold tier can rely on searchable snapshots, eliminating the need for replicas.

Data tier index allocation

When you create an index, Elasticsearch by default sets index.routing.allocation.include._tier_preference to data_content, so that the index shards are automatically allocated to the content tier.

When Elasticsearch creates an index as part of a data stream, it by default sets index.routing.allocation.include._tier_preference to data_hot, so that the index shards are automatically allocated to the hot tier.

You can override the automatic tier-based allocation by specifying shard allocation filtering settings in the create index request or in an index template that matches the new index.
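A minimal sketch of the index template route, written in Python so the body can be generated and inspected. The template name, the index pattern, and the custom node attribute box_type are all hypothetical. The attribute would have to be defined on the target nodes (for example as node.attr.box_type in elasticsearch.yml) for the filter to match anything.

```python
import json

# Hypothetical template name, used only for this example.
TEMPLATE_NAME = "catalog-template"


def template_with_filter(pattern, attribute, value):
    """Build a composable index template whose settings pin new indexes to
    nodes with a matching custom attribute, instead of the tier default."""
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                f"index.routing.allocation.require.{attribute}": value
            }
        },
    }


# "box_type" and "fast" are assumed attribute name and value.
body = template_with_filter("catalog-*", "box_type", "fast")
print(f"PUT _index_template/{TEMPLATE_NAME}")
print(json.dumps(body, indent=2))
```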

You can also explicitly set index.routing.allocation.include._tier_preference to opt out of the default tier-based allocation. If you set the tier preference to null, Elasticsearch ignores the data tier roles during allocation.
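The two cases can be sketched as update-settings request bodies. In the Python example below, the index name my-index and the helper tier_preference_body are placeholders made up for illustration. The setting name is the one discussed above, and Python's None serializes to JSON null.

```python
import json

TIER_PREFERENCE = "index.routing.allocation.include._tier_preference"


def tier_preference_body(tiers):
    """Build an update-settings body. Pass a list of data roles in order of
    preference, or None to clear the preference (serialized as JSON null)."""
    value = ",".join(tiers) if tiers is not None else None
    return {TIER_PREFERENCE: value}


# Prefer warm nodes, falling back to hot nodes if no warm node is available.
prefer_warm = tier_preference_body(["data_warm", "data_hot"])
# Opt out: Elasticsearch then ignores data tier roles for this index.
opt_out = tier_preference_body(None)

# "my-index" is a placeholder index name.
for body in (prefer_warm, opt_out):
    print("PUT my-index/_settings")
    print(json.dumps(body))
```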

Automatic data tier migration

ILM uses the migrate action to automatically move indexes through the available data tiers. By default, this action is injected automatically in every phase. You can specify the migrate action explicitly to override the default behavior, or use the allocate action to specify allocation rules manually.
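To make the two options concrete, here is a hedged Python sketch that builds a warm phase either with no explicit actions (leaving migration to ILM's default) or with migrate disabled and an allocate action in its place. The helper warm_phase, the 7d age, and the custom node attribute rack_id are assumptions for this example.

```python
import json


def warm_phase(min_age, allocate_rules=None):
    """Build a warm phase. With no rules, nothing is added and ILM's default
    migration applies. With rules, migrate is disabled and the allocate
    action carries the manual allocation rules instead."""
    actions = {}
    if allocate_rules is not None:
        actions["migrate"] = {"enabled": False}
        actions["allocate"] = allocate_rules
    return {"min_age": min_age, "actions": actions}


# Default behavior: no explicit migrate action in the phase.
default_warm = warm_phase("7d")
# Manual rules: "rack_id" is a hypothetical custom node attribute.
manual_warm = warm_phase("7d", {"include": {"rack_id": "one,two"}})

policy = {"policy": {"phases": {"warm": manual_warm}}}
print(json.dumps(policy, indent=2))
```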

See the website: www.elastic.co/guide/en/el…

This translation may be imperfect, and corrections are welcome. Translating takes effort, so please do not copy it without credit; if you reuse it, please indicate the source.
