Since Elastic Stack 7.9, node roles can be configured directly in Elasticsearch's configuration file via the node.roles setting. Prior to 7.9, we used boolean settings such as node.master: true to make a node master-eligible; since 7.9, node.roles serves this purpose instead. You can use only one of these two methods, not both at the same time, and since 7.9 node.roles is the recommended approach. In this article, I'll cover the roles available for node.roles.

 

Node

Whenever you start an instance of Elasticsearch, you start a node. A collection of connected nodes is called a cluster. If you are running a single node of Elasticsearch, you have a cluster of one node.

By default, each node in the cluster can handle both HTTP and transport traffic. The transport layer is used exclusively for communication between nodes; the HTTP layer is used by REST clients.
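Both layers can be configured in elasticsearch.yml. A minimal sketch; the port values shown are the defaults:

```yaml
# elasticsearch.yml
# HTTP layer: used by REST clients such as Kibana and curl
http.port: 9200
# Transport layer: used for node-to-node communication within the cluster
transport.port: 9300
```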

All nodes are aware of all other nodes in the cluster and can forward client requests to the appropriate nodes.

By default, every node has all of the following roles: master-eligible, data, ingest, and (if available) machine learning. All data nodes are also transform nodes.

Note: As the cluster grows, especially if you run a lot of machine learning jobs or continuous transforms, consider separating dedicated master-eligible nodes from dedicated data, machine learning, and transform nodes.

 

Node roles

You can define the roles of a node by setting node.roles. If you do not configure this setting, the node has the following roles by default:

  • master
  • data
  • data_content
  • data_hot
  • data_warm
  • data_cold
  • ingest
  • ml
  • remote_cluster_client

Note: If you set node.roles, the node is assigned only the roles you specify.
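For example, writing the default role set out explicitly in elasticsearch.yml is equivalent to omitting node.roles entirely:

```yaml
node.roles: [ master, data, data_content, data_hot, data_warm, data_cold, ingest, ml, remote_cluster_client ]
```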

Master-eligible node

A node with the master role (the default), which makes it eligible to be elected as the master node that controls the cluster.

Data node

Nodes with the data role (the default). Data nodes hold data and perform data-related operations such as CRUD, search, and aggregations. A node with the generic data role can fill any of the specialized data node roles.

Ingest node

Nodes with the ingest role (the default). Ingest nodes can apply ingest pipelines to documents to transform and enrich them before indexing. Under heavy ingest loads, it makes sense to use dedicated ingest nodes and to remove the ingest role from nodes that have the master or data roles.
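Under that setup, a dedicated ingest node carries only the ingest role in elasticsearch.yml:

```yaml
# Dedicated ingest node: runs ingest pipelines, holds no data, never becomes master
node.roles: [ ingest ]
```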

Remote-eligible node

A node that has the remote_cluster_client role (the default), which makes it eligible to act as a remote cluster client. By default, any node in the cluster can act as a cross-cluster client and connect to a remote cluster.

Machine learning node

A node with xpack.ml.enabled set to true and the ml role, which is the default behavior in the default distribution of Elasticsearch. To use machine learning features, there must be at least one machine learning node in the cluster. For more information about machine learning features, see Machine Learning in the Elastic Stack. You can also find a machine learning chapter in my previous post "Elastic: Beginner's Guide."

Important: If you are using an OSS distribution, do not add the ml role; otherwise the node will not start.

Transform node

A node with the transform role. To use transforms, there must be at least one transform node in the cluster. For more information, see Transform settings and Transforming data.

Coordinating node

Requests such as search requests or bulk index requests may involve data held on different data nodes. For example, a search request is executed in two phases, which are coordinated by the node that receives the client request (the coordinating node).

In the scatter phase, the coordinating node forwards the request to the data nodes that hold the data. Each data node executes the request locally and returns its results to the coordinating node. In the gather phase, the coordinating node reduces each data node's results into a single global result set.

Every node is implicitly a coordinating node; this behavior cannot be disabled. A node given an explicitly empty roles list via node.roles acts only as a coordinating node. As a result, such nodes need enough memory and CPU to handle the gather phase. A coordinating-only node is defined with node.roles: []

 

Master-eligible node

The master node is responsible for lightweight cluster-wide actions, such as creating or deleting indexes, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes. A stable master node is important for cluster health.

Any master-eligible node that is not a voting-only node can be elected master through the master election process.

Important: Master-eligible nodes must have access to the data/ directory (just like data nodes), because this is where the cluster state is persisted between node restarts.
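The data path mentioned in the note above is controlled by path.data. A minimal sketch, assuming the default location used by the Linux packages:

```yaml
# elasticsearch.yml
# Master-eligible and data nodes both persist state under this directory
path.data: /var/lib/elasticsearch
```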

 

Dedicated master-eligible node

Once a master-eligible node is elected as the master node, it needs the resources to perform its duties in order to keep the cluster healthy. If the elected master is overloaded with other heavy tasks, the cluster may not operate well. In particular, indexing and searching data can be very resource-intensive, so in large or high-throughput clusters it is best to avoid using master-eligible nodes for tasks such as indexing and searching. You can do this by configuring three of your nodes as dedicated master-eligible nodes. Dedicated master-eligible nodes have only the master role, allowing them to focus on managing the cluster. Although master nodes can also act as coordinating nodes and route search and index requests from clients to data nodes, it is best not to use dedicated master nodes for this purpose.

To create a dedicated master-eligible node, set:

node.roles: [ master ]

Voting-only master-eligible node

Voting-only master-eligible nodes are master-eligible nodes that participate in master elections but never act as the cluster's master node. In particular, voting-only nodes can serve as tie-breakers in elections.

Using the term "master-eligible" to describe voting-only nodes may seem confusing, since such nodes are not actually eligible to become the master at all. This term is an unfortunate consequence of history: master-eligible nodes are those that participate in elections and perform certain tasks during cluster state publication, and voting-only nodes have the same responsibilities even though they can never become the elected master.

To configure a master-eligible node as a voting-only node, include both master and voting_only in the role list. For example, to create a voting-only data node:

node.roles: [ data, master, voting_only ]

Important: The voting_only role requires the default distribution of Elasticsearch and is not supported in OSS distributions. If you use an OSS distribution and add the voting_only role, the node will not start. Also note that only master-eligible nodes can have the voting_only role.

A high availability (HA) cluster requires at least three master-eligible nodes, at least two of which are not voting-only nodes. Such a cluster can elect a master node even if one of those nodes fails.

Since voting-only nodes never act as the cluster's elected master, they may require a smaller heap and a less powerful CPU than a true master node. However, all master-eligible nodes, including voting-only nodes, require fairly fast persistent storage and a reliable, low-latency network connection to the rest of the cluster, because they are on the critical path for publishing cluster state updates.

Voting-only master-eligible nodes can also take on other roles in the cluster. For example, a node can be both a data node and a voting-only master-eligible node. A dedicated voting-only master-eligible node has no other roles in the cluster. To create a dedicated voting-only master-eligible node in the default distribution, set:

node.roles: [ master, voting_only ]

 

Data node

Data nodes contain shards of documents that you have indexed. Data nodes handle data-related operations, such as CRUD, search, and aggregation. These operations are I/O, memory, and CPU intensive. It is important to monitor these resources and add more data nodes if they become overloaded.

The main benefit of having dedicated data nodes is to separate the master and data roles.

To create a dedicated data node, set:

node.roles: [ data ]

In a multi-tier deployment architecture, you can assign data nodes to specific tiers using the specialized data roles: data_content, data_hot, data_warm, or data_cold. A node can belong to multiple tiers, but a node with one of the specialized data roles cannot also have the generic data role.
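For example, a node can serve two tiers at once, but combining a specialized data role with the generic data role is not allowed:

```yaml
# Valid: this node belongs to both the hot and content tiers
node.roles: [ data_hot, data_content ]

# Invalid: a specialized data role cannot be combined with the generic data role
# node.roles: [ data, data_hot ]
```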

 

Content data node

Content data nodes hold user-created content. They support operations like CRUD, search, and aggregations.

To create a dedicated content data node, set:

node.roles: [ data_content ]

 

Hot data node

Hot data nodes store time series data as it enters Elasticsearch. The hot tier must support fast reads and writes and requires more hardware resources, such as SSD drives.

To create a dedicated hot data node, set:

node.roles: [ data_hot ]

 

Warm data node

Warm data nodes store indexes that are no longer updated regularly but are still being queried. Query volume is usually lower than when the index was in the hot tier. Lower-performance hardware is typically sufficient for nodes in this tier.

To create a dedicated Warm node, set:

node.roles: [ data_warm ]

 

Cold data node

Cold data nodes store read-only indexes that are accessed infrequently. This tier uses lower-performance hardware and may use searchable snapshot indexes to minimize the required resources.

To create a dedicated cold node, set:

node.roles: [ data_cold ]

 

Coordinating only node

If a node does not act as a master node, does not hold data, and does not preprocess documents, it is a coordinating-only node: it can only route requests, handle the search reduce phase, and distribute bulk indexing operations. In essence, coordinating-only nodes act as smart load balancers.

Coordinating-only nodes can benefit large clusters by offloading the coordinating role from data and master-eligible nodes. They join the cluster and receive the full cluster state, like every other node, and they use the cluster state to route requests directly to the appropriate place.

Warning: Adding too many coordinating-only nodes to a cluster can increase the burden on the entire cluster, because the elected master node must wait for acknowledgement of cluster state updates from every node! The benefit of coordinating-only nodes should not be overstated; data nodes can happily serve the same purpose.

To create a dedicated coordinating-only node, set:

node.roles: [ ]

Remote-eligible node

By default, any node in the cluster can act as a cross-cluster client and connect to a remote cluster. Once connected, you can use cross-cluster search to search remote clusters. You can also use cross-cluster replication to synchronize data between clusters.

To create a dedicated remote-eligible node, set:

node.roles: [ remote_cluster_client ]
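Connecting to a remote cluster is configured with the cluster.remote settings. A minimal sketch; the alias cluster_one and the seed address are placeholders for your own remote cluster:

```yaml
node.roles: [ remote_cluster_client ]
# Register a remote cluster under the alias "cluster_one"
cluster.remote:
  cluster_one:
    seeds: [ "10.0.0.1:9300" ]
```

Once the remote cluster is registered, cross-cluster search requests can reference its indexes as cluster_one:index_name.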

 

Machine learning node

Machine learning functionality provides machine learning nodes, which run jobs and handle machine learning API requests. If xpack.ml.enabled is set to true and the node does not have the ml role, the node can handle API requests but cannot run jobs.

To use machine learning in a cluster, machine learning must be enabled on all master-eligible nodes (set xpack.ml.enabled to true). If machine learning is to be used from clients (including Kibana), it must also be enabled on all coordinating nodes. Do not use these settings if you have an OSS distribution.
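For example, a master-eligible node that should answer machine learning API requests without running jobs could be configured like this (a sketch; xpack.ml.enabled defaults to true in the default distribution):

```yaml
# No ml role: the node can handle ML API requests but cannot run ML jobs
node.roles: [ master ]
xpack.ml.enabled: true
```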

For more information about these Settings, see Machine Learning Settings.

To create a dedicated machine learning node in the default distribution, set:

node.roles: [ ml ]
xpack.ml.enabled: true 

By default, xpack.ml.enabled is already set to true.

 

Transform node

Transform nodes run transforms and handle transform API requests. Do not use these settings if you have an OSS distribution. For more information, see Transform settings.

To create a dedicated transform node in the default distribution, set:

node.roles: [ transform ]

 

Change the role of a node

Each data node maintains the following data on disk:

  • The shard data for each shard assigned to the node,
  • Index metadata corresponding to each shard assigned to the node, and
  • Cluster-wide metadata, such as Settings and index templates.

Similarly, each master-eligible node maintains the following data on disk:

  • Index metadata for each index in the cluster, and
  • Cluster-wide metadata, such as Settings and index templates.

Each node checks the contents of its data path at startup. If unexpected data is found, it will refuse to start. This avoids importing unwanted dangling indexes, which can turn the cluster health red. To be more precise, a node without the data role will refuse to start if it finds any shard data on disk at startup, and a node without both the master and data roles will refuse to start if it has any index metadata on disk at startup.

You can change a node's roles by adjusting its elasticsearch.yml file and restarting it. This is known as repurposing a node. To satisfy the checks for unexpected data described above, you must perform some additional steps to prepare a node for repurposing when starting it without the data or master roles.

  • If you want to repurpose a data node by removing the data role, you should first use an allocation filter to safely migrate all shard data to other nodes in the cluster.
  • If you want to repurpose a node so that it has neither the data nor the master role, the easiest way is to start a brand-new node with an empty data path and the desired roles. You may find it safest to first use an allocation filter to migrate the shard data elsewhere in the cluster.
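For example, repurposing a combined master/data node as a dedicated master-eligible node (after migrating its shard data away with an allocation filter) is just an elasticsearch.yml change followed by a restart:

```yaml
# Before repurposing: node.roles: [ master, data ]
# After migrating shard data off this node, remove the data role and restart:
node.roles: [ master ]
```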

If you cannot perform these additional steps, you can use the elasticsearch-node repurpose tool to delete any excess data that prevents the node from starting.