

I. Configuration methods

Elasticsearch ships with good defaults and requires very little configuration. The configuration file should contain node-specific settings as well as the settings a node needs in order to join a cluster:

  • Node settings: node.name, paths
  • Cluster settings: cluster.name, network.host

Elasticsearch has three configuration files:

  • elasticsearch.yml: configures Elasticsearch
  • jvm.options: configures the Elasticsearch JVM settings
  • log4j2.properties: configures Elasticsearch logging

The configuration files are located in the config directory, whose default location depends on the installation method. For example, with a Docker installation the configuration files are located in /usr/share/elasticsearch/config. The configuration file also supports environment variable substitution, as shown below.

node.name:    ${HOSTNAME}
network.host: ${ES_NETWORK_HOST}

The value of an environment variable must be a simple string. To supply a list, provide a comma-delimited string, which Elasticsearch will parse into a list of values.
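To make the substitution rule concrete, here is a minimal Python sketch of how a ${VAR} placeholder could be expanded and a comma-delimited value turned into a list. It only imitates the behavior described above: expand_setting and as_list are hypothetical helpers written for this article, not part of Elasticsearch, and the SEED_HOSTS variable and its addresses are made-up sample values.

```python
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_setting(raw, env):
    """Replace ${VAR} placeholders in raw using the env mapping (hypothetical helper)."""
    def lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    return _PLACEHOLDER.sub(lookup, raw)


def as_list(value):
    """Split a comma-delimited string into a list of trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Sample environment; the names and values are assumptions for illustration.
env = {"HOSTNAME": "es01", "SEED_HOSTS": "10.0.0.1,10.0.0.2"}

node_name = expand_setting("${HOSTNAME}", env)
seed_hosts = as_list(expand_setting("${SEED_HOSTS}", env))
print(node_name, seed_hosts)
```

A missing variable raises an error in this sketch rather than silently producing an empty value, which makes a misconfigured environment easier to spot.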

Cluster and node settings are either dynamic or static. Dynamic settings can be updated on a running cluster and can be applied as transient or persistent settings. Static settings can only be configured in elasticsearch.yml on a node that has not been started or has been shut down.

Elasticsearch applies settings in the following order of precedence:

  • Transient settings: removed after a full cluster restart
  • Persistent settings: survive a full cluster restart
  • Settings in the elasticsearch.yml file
  • Default values
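The precedence order can be pictured as a layered lookup in which the first layer that defines a key wins. The Python sketch below models this with collections.ChainMap; it is an illustration of the ordering only, not how Elasticsearch stores settings, and the key example.limit and all of its values are hypothetical.

```python
from collections import ChainMap

# Sample layers; "example.limit" and its values are hypothetical.
transient = {}
persistent = {"example.limit": "50mb"}
yml_file = {"example.limit": "20mb", "cluster.name": "docker-cluster"}
defaults = {"example.limit": "40mb", "cluster.name": "elasticsearch"}

# ChainMap searches its maps from left to right, so earlier layers win.
effective = ChainMap(transient, persistent, yml_file, defaults)

before = effective["example.limit"]      # taken from the persistent layer
cluster_name = effective["cluster.name"]  # taken from the yml layer

# Adding a transient value overrides every layer below it.
transient["example.limit"] = "10mb"
after = effective["example.limit"]

print(before, cluster_name, after)
```

Clearing the transient layer, which is what a full cluster restart does, makes the persistent value visible again.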

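Transient and persistent settings are both applied through the cluster settings API, and the top-level field of the request body ("transient" or "persistent") determines the type. For example, the request below is followed by its response: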

PUT /_cluster/settings
{
    "persistent" : {
        "discovery.zen.minimum_master_nodes" : 2 
    },
    "transient" : {
        "indices.store.throttle.max_bytes_per_sec" : "50mb" 
    }
}

{
  "acknowledged" : true,
  "persistent" : {
    "discovery" : {
      "zen" : {
        "minimum_master_nodes" : "2"
      }
    }
  },
  "transient" : { }
}
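The request above uses a dotted key, while the response returns the same setting in nested form. A small Python sketch shows that the two forms name the same setting; flatten is a hypothetical helper written for this article, and the response dictionary is copied from the example above.

```python
def flatten(settings, prefix=""):
    """Turn nested settings into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in settings.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


# Response body from the example, written as a Python dictionary.
response = {
    "acknowledged": True,
    "persistent": {"discovery": {"zen": {"minimum_master_nodes": "2"}}},
    "transient": {},
}

persistent = flatten(response["persistent"])
transient = flatten(response["transient"])
print(persistent)
print(transient)
```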

II. Important configuration

1. Set the path

Elasticsearch writes the data you index into indices and data streams to a data directory. It writes its own application logs, which contain information about cluster health and operations, to a logs directory. For example:

path:
  data: /var/data/elasticsearch
  logs: /var/log/elasticsearch

Warning: Do not modify anything in the data directory or run processes that might interfere with its contents. If something other than Elasticsearch changes the contents of the data directory, Elasticsearch may fail, report corruption or other data inconsistencies, or appear to work correctly while having silently lost some of your data. Do not attempt a file system backup of the data directory; there is no supported way to restore such a backup. Use snapshot and restore to take backups safely instead. Do not run virus scanners on the data directory. A virus scanner may prevent Elasticsearch from working properly and may modify the contents of the data directory. The data directory contains no executable files, so a virus scan will only produce false positives.

2. Set the cluster name

A node can join a cluster only when its cluster.name matches that of all other nodes in the cluster. The default name is elasticsearch. Do not reuse the same cluster name in different environments; otherwise a node might join the wrong cluster.

cluster.name: "docker-cluster"

3. Set the node name

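The node name is a human-readable identifier for a particular Elasticsearch instance and is included in the responses of many APIs. It defaults to the hostname of the machine when Elasticsearch starts.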

node.name: "es02"

4. Set the network host

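By default, Elasticsearch binds only to loopback addresses, such as 127.0.0.1 and [::1]. While a node is bound only to loopback addresses, ES runs in development mode and failed bootstrap checks are logged as warnings instead of preventing startup. Once a non-loopback address is configured, ES assumes production mode and enforces the bootstrap checks. To set a non-loopback address: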

network.host: 192.168.1.10
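The distinction between loopback and non-loopback addresses can be checked with Python's standard ipaddress module. The sketch below is a simplified model of the rule described above, assuming the mode depends only on the bind addresses; is_development_mode is a hypothetical helper, not an Elasticsearch function.

```python
import ipaddress


def is_development_mode(bind_addresses):
    """Return True when every bind address is a loopback address (simplified model)."""
    return all(ipaddress.ip_address(addr).is_loopback for addr in bind_addresses)


default_binding = is_development_mode(["127.0.0.1", "::1"])
custom_binding = is_development_mode(["192.168.1.10"])
print(default_binding, custom_binding)
```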

5. Configure cluster discovery

Without any network configuration, Elasticsearch binds to the available loopback addresses and scans local ports 9300 to 9305 to connect to other nodes running on the same server. This behavior lets nodes on one machine form a cluster automatically without any configuration.

To connect to nodes on other servers, list the other discoverable nodes with the discovery.seed_hosts setting. Each entry can be an IPv4 address, an IPv6 address (enclosed in square brackets), or a hostname. For example:

discovery.seed_hosts:
   - 192.168.1.10:9300
   - 192.168.1.11
   - seeds.mydomain.com
   - [0:0:0:0:0:ffff:c0a8:10c]:9301
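Seed host entries come in three shapes: a bare host, host:port, and a bracketed IPv6 address with an optional port. The Python sketch below shows one way to split such entries into a host and a port. parse_seed_host is a hypothetical helper, and falling back to port 9300 when no port is given is an assumption made for this example; the real default depends on the transport port settings of the node.

```python
def parse_seed_host(entry, default_port=9300):
    """Split 'host', 'host:port' or '[ipv6]:port' into a (host, port) tuple."""
    if entry.startswith("["):
        # Bracketed IPv6 address, optionally followed by ':port'.
        host, _, rest = entry[1:].partition("]")
        port = int(rest[1:]) if rest.startswith(":") else default_port
        return host, port
    host, sep, port = entry.rpartition(":")
    if sep:
        return host, int(port)
    return entry, default_port


entries = [
    "192.168.1.10:9300",
    "192.168.1.11",
    "seeds.mydomain.com",
    "[0:0:0:0:0:ffff:c0a8:10c]:9301",
]
parsed = [parse_seed_host(entry) for entry in entries]
for host, port in parsed:
    print(host, port)
```

The sketch expects IPv6 addresses to be bracketed; an unbracketed IPv6 address would be split at its last colon and parsed incorrectly.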

When an Elasticsearch cluster starts for the first time, a cluster bootstrapping step determines the set of master-eligible nodes whose votes are counted in the first election. You can define this set with the cluster.initial_master_nodes setting. In development mode, if discovery.seed_hosts is not configured, the nodes perform this step automatically.

cluster.initial_master_nodes: es01,es02

6. Heap size settings

By default, ES automatically sets the JVM heap size based on the roles of the node and its total memory, and the default size is suitable for most production environments.

To override the default heap size, set the minimum heap size with Xms and the maximum heap size with Xmx. The two values must be the same, and neither should exceed 50% of total memory. The heap size can be set in the jvm.options file.

With docker-compose, the heap size can be passed through the ES_JAVA_OPTS environment variable:

es01:
    environment:
        - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
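The two rules above (Xms equal to Xmx, and no more than 50% of total memory) can be turned into a small calculation. The Python sketch below builds an ES_JAVA_OPTS value at that 50% upper bound; heap_opts is a hypothetical helper, and the sketch deliberately ignores any other limit that may apply to a real deployment.

```python
def heap_opts(total_memory_mb, fraction=0.5):
    """Build an ES_JAVA_OPTS value with Xms == Xmx at a fraction of total memory."""
    heap_mb = int(total_memory_mb * fraction)
    return f"-Xms{heap_mb}m -Xmx{heap_mb}m"


# A container with 1 GB of memory gets the 512 MB heap used in the example above.
opts = heap_opts(1024)
print(opts)  # -Xms512m -Xmx512m
```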

7. Set the JVM heap dump path

By default, Elasticsearch configures the JVM to dump the heap to the default data directory when an out-of-memory exception occurs. To change the location, set the -XX:HeapDumpPath parameter in the jvm.options file.

8. GC log settings

By default, ES enables garbage collection (GC) logging. GC logging is configured in the jvm.options file, and the logs are written to the same default location as the Elasticsearch logs. The default configuration rotates the log every 64 MB and can consume up to 2 GB of disk space. For example:

# disable the previous logging configuration
-Xlog:disable
-Xlog:all=warning:stderr:utctime,level,tags
# configure the log directory and log rotation
-Xlog:gc*,gc+age=trace,safepoint:file=/opt/my-app/gc.log:utctime,pid,tags:filecount=32,filesize=64m
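The 2 GB figure follows from the rotation parameters: 32 files of 64 MB each. The Python sketch below reads filecount and filesize from an -Xlog option string and multiplies them. approx_gc_log_mb is a hypothetical helper that only handles sizes written in megabytes with an m suffix, and the result is an approximation of the disk space used by rotated files.

```python
import re


def approx_gc_log_mb(xlog_option):
    """Return filecount * filesize (in MB) parsed from a -Xlog option string."""
    count = int(re.search(r"filecount=(\d+)", xlog_option).group(1))
    size_mb = int(re.search(r"filesize=(\d+)m", xlog_option).group(1))
    return count * size_mb


option = (
    "-Xlog:gc*,gc+age=trace,safepoint:file=/opt/my-app/gc.log"
    ":utctime,pid,tags:filecount=32,filesize=64m"
)
total_mb = approx_gc_log_mb(option)
print(total_mb, "MB, about", total_mb // 1024, "GB")  # 2048 MB, about 2 GB
```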

9. Temporary directory settings

By default, ES uses a private temporary directory that the startup script creates immediately below the system temporary directory. On some Linux distributions, a system utility cleans files and directories out of /tmp if they have not been accessed recently. As a result, the private temporary directory can be deleted while ES is running if the features that need it go unused for a long time. Losing the directory causes problems when a feature that requires it is used later.

To avoid this, create a dedicated temporary directory in a location that is not cleaned up periodically, set its permissions so that only the user running Elasticsearch can access it, and point the ES_TMPDIR environment variable at it before starting Elasticsearch.
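The permission requirement can be illustrated with a short Python sketch that creates a directory accessible only to the current user and then verifies its mode. This is a demonstration on a POSIX system, not a step Elasticsearch performs: the elasticsearch- prefix is illustrative, and the directory is created under the system temporary location only so the example is runnable.

```python
import os
import stat
import tempfile

# Create a directory that only the current user can read, write, and enter.
# The "elasticsearch-" prefix is illustrative, not a name Elasticsearch requires.
private_tmp = tempfile.mkdtemp(prefix="elasticsearch-")
os.chmod(private_tmp, 0o700)  # mkdtemp already uses 0o700; set explicitly for clarity

# Read back the permission bits to confirm that group and others have no access.
mode = stat.S_IMODE(os.stat(private_tmp).st_mode)
print(private_tmp, oct(mode))
```

For a real node, the directory should live somewhere the periodic cleanup does not touch and be owned by the user that runs Elasticsearch.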

10. JVM fatal error log settings

By default, Elasticsearch configures the JVM to write fatal error logs to the default log directory. To change the file path, set -XX:ErrorFile in the jvm.options file.

11. Cluster backup

In a disaster, snapshots can prevent permanent data loss. Taking a snapshot is the only reliable and supported way to back up a cluster; you cannot back up an Elasticsearch cluster by copying the data directories of its nodes.

III. References

Important Elasticsearch configuration