First, configuration methods
Elasticsearch ships with good defaults and requires very little configuration. The configuration files should contain node-specific settings as well as the settings a node needs in order to join a cluster:
- Node settings: node.name, paths
- Cluster configuration: cluster.name, network.host
Elasticsearch has three configuration files:
- elasticsearch.yml: used to configure Elasticsearch
- jvm.options: used to configure Elasticsearch JVM settings
- log4j2.properties: used to configure Elasticsearch logging
The configuration files are located in the config directory, whose default location depends on the installation method. For example, with a Docker installation the ES configuration files are located in /usr/share/elasticsearch/config. The configuration files also support environment variable substitution, as shown below.
node.name: ${HOSTNAME}
network.host: ${ES_NETWORK_HOST}
The value of an environment variable must be a simple string. To supply a list, provide a comma-separated string, which Elasticsearch parses into a list.
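The substitution and list handling described above can be pictured with a small Python sketch. This is not Elasticsearch code: `substitute` and `as_list` are hypothetical helpers, the `SEED_HOSTS` variable is made up for the demonstration, and failing on an unset variable is an assumption of the sketch.

```python
import os
import re

# Hypothetical helpers, not part of Elasticsearch: they mimic ${NAME}
# substitution followed by comma-splitting for list-valued settings.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(value, env=None):
    """Replace every ${NAME} in value with env[NAME]; raise if NAME is unset."""
    env = os.environ if env is None else env

    def lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(lookup, value)


def as_list(value):
    """Split a comma-separated string into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Made-up environment for the demonstration.
env = {"HOSTNAME": "es01", "SEED_HOSTS": "10.0.0.1, 10.0.0.2:9301"}
print(substitute("${HOSTNAME}", env))
print(as_list(substitute("${SEED_HOSTS}", env)))
```

The point of the sketch is that the variable itself only ever holds one plain string; any list structure comes from splitting on commas afterwards.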
Cluster settings are either dynamic or static. Dynamic settings can be updated on a running cluster, as either transient or persistent settings. Static settings can only be configured in elasticsearch.yml, on a node that has not been started or has been shut down.
Elasticsearch applies settings in the following order of precedence:
- Transient settings: removed after the first full cluster restart
- Persistent settings: survive a full cluster restart
- Settings in the elasticsearch.yml file
- The default configuration
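A minimal sketch of this precedence order, assuming each layer is a plain dictionary. The `example.*` keys are hypothetical and exist only for illustration, and `effective_settings` is not an Elasticsearch API.

```python
from collections import ChainMap


def effective_settings(transient, persistent, yml, defaults):
    """Return a lookup view in which earlier layers shadow later ones."""
    return ChainMap(transient, persistent, yml, defaults)


# Hypothetical layers; the "example.*" keys are not real Elasticsearch settings.
transient = {"example.rate": "50mb"}
persistent = {"example.rate": "20mb", "example.quorum": 2}
yml = {"example.quorum": 1, "cluster.name": "docker-cluster"}
defaults = {"cluster.name": "elasticsearch", "example.rate": "40mb"}

settings = effective_settings(transient, persistent, yml, defaults)
print(settings["example.rate"])    # the transient value wins
print(settings["example.quorum"])  # persistent wins over elasticsearch.yml
print(settings["cluster.name"])    # elasticsearch.yml wins over the default

# A full cluster restart discards the transient layer, so the persistent
# value becomes effective again.
after_restart = effective_settings({}, persistent, yml, defaults)
print(after_restart["example.rate"])
```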
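Transient and persistent settings are applied through the cluster settings API, and the top-level field of the request body ("persistent" or "transient") determines the type. The request below is followed by a sample response. The setting names in this example come from older Elasticsearch releases (discovery.zen.minimum_master_nodes, for instance, is no longer used as of 7.0), so substitute settings that are valid for your version. For example: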
PUT /_cluster/settings
{
"persistent" : {
"discovery.zen.minimum_master_nodes" : 2
},
"transient" : {
"indices.store.throttle.max_bytes_per_sec" : "50mb"
}
}
{
"acknowledged" : true,
"persistent" : {
"discovery" : {
"zen" : {
"minimum_master_nodes" : "2"
}
}
},
"transient" : { }
}
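As a rough illustration, the same request can be assembled with Python's standard library. The base URL `http://localhost:9200` and the helper name `build_settings_request` are assumptions; the setting names are copied from the example above, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_settings_request(base_url, persistent=None, transient=None):
    """Build (but do not send) a PUT /_cluster/settings request."""
    body = {"persistent": persistent or {}, "transient": transient or {}}
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed local endpoint; adjust the host, port and authentication as needed.
req = build_settings_request(
    "http://localhost:9200",
    persistent={"discovery.zen.minimum_master_nodes": 2},
    transient={"indices.store.throttle.max_bytes_per_sec": "50mb"},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# Sending needs a reachable cluster, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```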
Second, important configuration
1. Set the path
Elasticsearch writes the data you index into indices and data streams to a data directory. It writes its own application logs, which contain information about cluster health and operations, to a logs directory. For example:
path:
  data: /var/data/elasticsearch
  logs: /var/log/elasticsearch
Warning: Do not modify anything in the data directory or run processes that might interfere with its contents. If something other than Elasticsearch changes the contents of the data directory, Elasticsearch may fail, report corruption or other data inconsistencies, or appear to work correctly while silently losing some of your data. Do not attempt a file system backup of the data directory; there is no supported way to restore such a backup. Instead, use snapshot and restore to take backups safely. Do not run virus scanners on the data directory. A virus scanner can prevent Elasticsearch from working properly and may modify the contents of the data directory. The data directory contains no executable files, so a virus scan will only find false positives.
2. Set the cluster name
A node can join a cluster only when its cluster.name is the same as that of all other nodes in the cluster. The default name is elasticsearch. Do not reuse the same cluster name in different environments; otherwise nodes might join the wrong cluster.
cluster.name: "docker-cluster"
3. Set the node name
The node name is used to describe the node and is returned in many responses.
node.name: "es02"
4. Set the network host
By default, Elasticsearch binds only to loopback addresses such as 127.0.0.1 and [::1]. While it is bound only to loopback addresses, ES runs in development mode, in which failed bootstrap checks are logged as warnings instead of preventing startup. To bind to a non-loopback address, set network.host:
network.host: 192.168.1.10
5. Configure cluster discovery
Without any network configuration, Elasticsearch binds to the available loopback addresses and scans local ports 9300 to 9305 to connect to other nodes running on the same server. This lets nodes on one machine form a cluster automatically, with no configuration at all.
To connect to nodes on other servers, list their addresses in discovery.seed_hosts. Each address can be an IPv4 address, an IPv6 address enclosed in square brackets, or a hostname, optionally followed by a port. For example:
discovery.seed_hosts:
  - 192.168.1.10:9300
  - 192.168.1.11
  - seeds.mydomain.com
  - [0:0:0:0:0:ffff:c0a8:10c]:9301
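To make the accepted address forms concrete, here is a small Python sketch that splits a seed host entry into host and port. `parse_seed_host` is a hypothetical helper, not Elasticsearch code, and falling back to port 9300 when no port is given is an assumption of the sketch.

```python
def parse_seed_host(entry, default_port=9300):
    """Split 'host', 'host:port', '[ipv6]' or '[ipv6]:port' into (host, port).

    default_port=9300 is an assumption made for this sketch.
    """
    entry = entry.strip()
    if entry.startswith("["):
        # Bracketed IPv6 literal: everything up to ']' is the host.
        host, closing, rest = entry[1:].partition("]")
        if not closing:
            raise ValueError(f"missing ']' in {entry!r}")
        port = rest[1:] if rest.startswith(":") else ""
    else:
        # IPv4 address or hostname, optionally followed by ':port'.
        host, _, port = entry.partition(":")
    return host, int(port) if port else default_port


for entry in [
    "192.168.1.10:9300",
    "192.168.1.11",
    "seeds.mydomain.com",
    "[0:0:0:0:0:ffff:c0a8:10c]:9301",
]:
    print(parse_seed_host(entry))
```

The brackets are what allow an IPv6 address, which itself contains colons, to be separated from the port.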
When an Elasticsearch cluster starts for the first time, a cluster bootstrapping step determines the set of master-eligible nodes whose votes are counted in the first election. You can set this list with cluster.initial_master_nodes. In development mode, if seed_hosts is not configured, this step is performed automatically by the nodes themselves.
cluster.initial_master_nodes: es01,es02
6. Heap size Settings
By default, ES automatically sets the JVM heap size based on the role of the node and total memory, and most production environments can use the default size.
To override the default heap size, set the minimum heap size with Xms and the maximum heap size with Xmx. The two values must be the same, and neither should exceed 50% of total memory. The heap size can be configured in the jvm.options file.
With docker-compose, it can be set through the ES_JAVA_OPTS environment variable:
es01:
  environment:
    - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
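The two sizing rules above (equal Xms and Xmx, at most 50% of total memory) can be sketched as a small calculation. `heap_opts` is a hypothetical helper, and the 1024 MB figure is just an example input.

```python
def heap_opts(total_memory_mb, fraction=0.5):
    """Return JVM flags with Xms equal to Xmx, capped at half of total memory."""
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be greater than 0 and at most 0.5")
    heap_mb = int(total_memory_mb * fraction)
    return f"-Xms{heap_mb}m -Xmx{heap_mb}m"


# A container or host with 1024 MB of memory (example value).
print(heap_opts(1024))  # -Xms512m -Xmx512m
```

For 1024 MB of memory this yields the same value as the docker-compose example above.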
7. Set the JVM heap dump path
By default, Elasticsearch configures the JVM to dump the heap to the default data directory when an out-of-memory exception occurs. To change the path, set the -XX:HeapDumpPath parameter in the jvm.options file.
8. GC log Settings
By default, ES enables garbage collection (GC) logging. GC logging is configured in the jvm.options file, and the logs are written to the same default location as the Elasticsearch logs. With the default configuration, the log rotates every 64 MB and can consume up to 2 GB of disk space. For example:
# Disable any previous logging configuration
-Xlog:disable
-Xlog:all=warning:stderr:utctime,level,tags
# Configure the log file location and log rotation
-Xlog:gc*,gc+age=trace,safepoint:file=/opt/my-app/gc.log:utctime,pid,tags:filecount=32,filesize=64m
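The 2 GB ceiling follows from the rotation parameters in the example: filecount=32 files of filesize=64m each. A short Python sketch of that arithmetic, where `max_gc_log_mb` is a hypothetical helper that only understands sizes given in megabytes:

```python
import re


def max_gc_log_mb(xlog_option):
    """Upper bound on GC log disk usage in MB: filecount * filesize."""
    count = int(re.search(r"filecount=(\d+)", xlog_option).group(1))
    size_mb = int(re.search(r"filesize=(\d+)m", xlog_option).group(1))
    return count * size_mb


option = (
    "-Xlog:gc*,gc+age=trace,safepoint:file=/opt/my-app/gc.log"
    ":utctime,pid,tags:filecount=32,filesize=64m"
)
total_mb = max_gc_log_mb(option)
print(total_mb, "MB =", total_mb / 1024, "GB")  # 2048 MB = 2.0 GB
```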
9. Temporary directory Settings
By default, ES uses a private temporary directory that the startup script creates directly under the system temporary directory. On some Linux distributions, a system utility cleans files and directories out of /tmp if they have not been accessed recently. As a result, the private temporary directory can be deleted while ES is running if the features that need it go unused for a long time. Losing the directory then causes problems when such a feature is used later.
To avoid this, create a dedicated temporary directory for Elasticsearch outside any path that is cleaned periodically, point the ES_TMPDIR environment variable at it before starting Elasticsearch, and set its permissions so that only the user running Elasticsearch can access it.
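A minimal sketch of the permission step, assuming a POSIX system. `make_private_dir` is a hypothetical helper, and the scratch path is used only so the example can run anywhere; the official documentation uses the ES_TMPDIR environment variable to point Elasticsearch at such a directory.

```python
import os
import stat
import tempfile


def make_private_dir(path):
    """Create path if needed and restrict it to the owning user (mode 0700)."""
    os.makedirs(path, exist_ok=True)
    # chmod explicitly: the mode passed to makedirs is subject to the umask
    # and is not applied to a directory that already exists.
    os.chmod(path, stat.S_IRWXU)
    return path


# Demonstration in a scratch location. A real deployment would pick a fixed
# path outside /tmp (hypothetical example: /var/lib/elasticsearch-tmp) and
# run this as the user that Elasticsearch runs as.
base = tempfile.mkdtemp()
es_tmp = make_private_dir(os.path.join(base, "es-tmp"))
print(oct(stat.S_IMODE(os.stat(es_tmp).st_mode)))  # 0o700 on POSIX systems
```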
10. JVM fatal error log Settings
By default, Elasticsearch configures the JVM to write fatal error logs to the default log directory. To change the file path, set -XX:ErrorFile in the jvm.options file.
11. Cluster backup
In a disaster, snapshots can prevent permanent data loss. Taking snapshots is the only reliable and supported way to back up a cluster. You cannot back up an Elasticsearch cluster by copying the data directories of its nodes.
Third, reference articles
Important Elasticsearch configuration