• Elasticsearch Reference [2.2]
  • Original: https://www.elastic.co/guide/en/elasticsearch/reference/2.2/setup-configuration.html
  • Translator: code4j

Configuration

Environment variables

Elasticsearch's startup scripts pass a built-in JAVA_OPTS variable to the JVM. The most important parameters are -Xmx, which sets the maximum heap memory the process can allocate, and -Xms, which sets the minimum heap memory allocated to the process (in general, the more memory the better). It is usually best to leave JAVA_OPTS unchanged and use ES_JAVA_OPTS to set or change JVM parameters.

The ES_HEAP_SIZE environment variable sets the size of the heap allocated to the ES Java process. It assigns the same value to both the minimum and the maximum, although the two can also be set separately with ES_MIN_MEM (default: 256MB) and ES_MAX_MEM (default: 1G).

(Translator's note: Setting the minimum and maximum to the same value is usually recommended. A server in an ES cluster normally dedicates its resources to ES, so the memory is available anyway, and different minimum and maximum values only add the overhead of growing the heap at runtime.)

It is recommended to set the minimum and maximum to the same value and to enable mlockall. (Translator's note: mlockall prevents the process's memory from being swapped out. Swapping is inefficient because it requires disk reads and writes. To enable it, set bootstrap.mlockall to true.)
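
As a rough sketch of the variables described above, the following shell snippet sets a fixed heap size and one extra JVM flag before a node is started. The 4g heap size and the -XX:+PrintGCDetails flag are illustrative assumptions, not values from the original text; pick values that suit your machine.

```shell
# Illustrative values only: 4g and the GC flag are assumptions for this sketch.
export ES_HEAP_SIZE=4g                      # used for both the minimum and maximum heap
export ES_JAVA_OPTS="-XX:+PrintGCDetails"   # extra JVM options; JAVA_OPTS itself stays untouched

echo "heap=$ES_HEAP_SIZE extra_jvm_opts=$ES_JAVA_OPTS"

# Then start the node from ES_HOME:
#   ./bin/elasticsearch
```

Because the variables are exported, they are visible to the startup script launched from the same shell.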

System configuration

File descriptors

Make sure to increase the maximum number of open file descriptors on your machine (or for the user running ES). 32K or even 64K is recommended. To test how many files the process can open, start it with the -Des.max-open-files parameter set to true; this prints the maximum number of open files at startup. Alternatively, you can get the maximum number of open files for each node from the max_file_descriptors field in the result of the following API call:

curl localhost:9200/_nodes/stats/process?pretty
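
To pick the max_file_descriptors field out of that response without extra tools, a small filter along these lines may help. It is a minimal sketch: the function name max_fds is hypothetical, and it assumes the pretty-printed JSON layout that ?pretty produces.

```shell
# Print every max_file_descriptors value found in pretty-printed JSON on stdin.
# max_fds is a hypothetical helper name, not part of Elasticsearch.
max_fds() {
  grep -o '"max_file_descriptors" *: *[0-9]*' | grep -o '[0-9][0-9]*$'
}

# Usage against a running node (assumes localhost:9200 as in the text):
#   curl -s 'localhost:9200/_nodes/stats/process?pretty' | max_fds
# The limit of the current shell can be compared with:
#   ulimit -n
```

One value is printed per node, so a cluster with several nodes yields several lines.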

Virtual memory

Elasticsearch uses a hybrid mmapfs/niofs directory type by default to store its indices. The operating system's default limit on mmap counts is likely to be too low, which may result in out-of-memory exceptions. On Linux, you can raise the limit by running the following command as root:

sysctl -w vm.max_map_count=262144

To change this value permanently, update the vm.max_map_count setting in /etc/sysctl.conf.

Note: If you installed ES from a package (.deb or .rpm), this setting is changed automatically. To verify, run sysctl vm.max_map_count.
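
The check can also be scripted. The sketch below compares the live value with the 262144 used above; the function name map_count_ok is hypothetical, and on systems without this sysctl key the script simply treats the value as 0.

```shell
required=262144   # value used in the text above

# Succeeds when the given mmap count limit is at least the required value.
# map_count_ok is a hypothetical helper name.
map_count_ok() {
  [ "$1" -ge "$required" ]
}

# Read the live value on Linux; fall back to 0 where sysctl or the key is unavailable.
current=$(sysctl -n vm.max_map_count 2>/dev/null || echo 0)

if map_count_ok "$current"; then
  echo "vm.max_map_count=$current is sufficient"
else
  echo "vm.max_map_count=$current is below $required"
  echo "to persist the change, add this line to /etc/sysctl.conf: vm.max_map_count=$required"
fi
```

The script only reads the setting, so it can run as an unprivileged user; changing the value still requires root.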

Most operating systems try to use as much memory as possible for file system caches and eagerly swap out unused application memory. This can cause the Elasticsearch process to be swapped to disk. Swapping is very bad for performance and for node stability, so avoid it at all costs. You have three options:

  • Disable swap: The simplest option is to disable swap completely. Elasticsearch is usually the only service running on a machine, and its memory usage is controlled by the ES_HEAP_SIZE environment variable, so there should be no need for swap. On Linux, you can disable swap temporarily with the command sudo swapoff -a. To disable it permanently, edit /etc/fstab and comment out every line that contains the word swap. On Windows, you can disable the paging file entirely: right-click My Computer, choose Advanced System Settings > Advanced > Performance > Settings > Advanced > Virtual Memory, click Change, and select No paging file.
  • Configure swappiness: The second option is to make sure the sysctl value vm.swappiness is set to 0. This reduces the kernel's tendency to swap: under normal circumstances no swapping occurs, but the system as a whole can still swap in an emergency.

Note: On kernel versions 3.5-rc1 and above, a vm.swappiness of 0 causes the OOM killer to kill the process instead of allowing it to swap. Set vm.swappiness to 1 if swapping should still be possible in an emergency.

  • mlockall: The third option is to use mlockall on Linux/Unix, or VirtualLock on Windows, to lock the process's address space into RAM so that no Elasticsearch memory is swapped out. To do this, add the following line to the configuration file elasticsearch.yml:

bootstrap.mlockall: true

Once started, you can see if the configuration is successful by checking the value of mlockall in the result of this command:

curl http://localhost:9200/_nodes/process?pretty

If the value of mlockall is false, the mlockall request has failed. The most likely reason is that the user who started ES on Linux/Unix does not have permission to lock memory. You can grant it by running ulimit -l unlimited as root before starting ES.

Another possible cause is that the temporary directory (/tmp) is mounted with the noexec option. You can fix this by specifying a new temporary directory when starting ES:

./bin/elasticsearch -Djna.tmpdir=/path/to/new/dir

Note: With mlockall enabled, the JVM or the shell session may exit if it tries to allocate more memory than is available.
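
The mlockall check described above can be reduced to a one-word answer with a small filter. This is a minimal sketch: the function name mlockall_status is hypothetical, and it assumes the pretty-printed JSON layout that ?pretty produces.

```shell
# Print the mlockall value (true or false) for each node in JSON read from stdin.
# mlockall_status is a hypothetical helper name, not part of Elasticsearch.
mlockall_status() {
  grep -o '"mlockall" *: *[a-z]*' | grep -o '[a-z][a-z]*$'
}

# Usage against a running node (URL taken from the text):
#   curl -s 'http://localhost:9200/_nodes/process?pretty' | mlockall_status
# If the answer is false, the locked-memory limit of the starting shell is worth checking:
#   ulimit -l
```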

Elasticsearch configuration

The ES_HOME/config directory contains two files: elasticsearch.yml, which configures the ES modules, and logging.yml, which configures ES logging. The configuration format is YAML. The following example changes the address that all network-based modules bind and publish to:

network.host: 10.0.0.4

The YAML configuration style here is slightly different from the one I use. Below I will use the comments in the default configuration file as a demonstration.

Paths

For production use, you will almost certainly want to change the data and log directories:

path.data: /var/data/elasticsearch
path.logs: /var/log/elasticsearch

(Translator's note: path.data accepts several directories separated by commas, which spreads the data across multiple disks much like a disk array and reduces the overhead of read/write locks. The translator has not tried this in practice.)

Cluster name

Don't forget to name your cluster. The name is used to discover other nodes and join them to the cluster automatically.

cluster.name:

Make sure that you do not reuse the same cluster name across different environments, or nodes could join the wrong cluster. For example, you can use logging-dev, logging-stage, and logging-prod for the development, staging, and production clusters.

Node name

You may also want to change the default node name to something like the host name. By default, ES picks a name at random from a list of about 3,000 Marvel character names when the node starts.

node.name:

(Translator's note: Naming nodes with a numbering scheme is recommended; it makes them easier to tell apart, manage, and maintain.)

The host name of the machine is available in the environment variable HOSTNAME. If your machine runs only a single node of the cluster, you can set the node name to the host name using the ${...} notation:

node.name: ${HOSTNAME}

(Translator's note: This configuration is strongly discouraged; it is hard to manage and risky. Treat it as a trick for testing, and avoid it unless you have a security or permissions plugin in place.)

Internally, ES collapses all of these settings into flat "namespaced" keys, which makes other configuration formats easy to support. You can also use a JSON-style configuration file named elasticsearch.json:

{
    "network" : {
        "host" : "10.0.0.4"
    }
}

This also means settings can easily be supplied externally, either through ES_JAVA_OPTS or as parameters to the elasticsearch command, for example:

./elasticsearch -Des.network.host=10.0.0.4
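
When several settings are passed this way, building the -Des. parameters from plain key=value pairs keeps the command line readable. The helper below is only a sketch: the name es_flags is hypothetical, and node-1 in the usage comment is an assumed node name.

```shell
# Turn "key=value" settings into -Des.key=value startup parameters.
# es_flags is a hypothetical helper name, not part of Elasticsearch.
es_flags() {
  flags=""
  for kv in "$@"; do
    # Append each parameter, separated by a single space.
    flags="${flags:+$flags }-Des.$kv"
  done
  printf '%s\n' "$flags"
}

# Usage (left unquoted on purpose so each parameter becomes its own argument):
#   ./elasticsearch $(es_flags network.host=10.0.0.4 node.name=node-1)
```

The keys are the same flat "namespaced" names used in elasticsearch.yml, so a setting can be moved between the file and the command line without renaming it. Values that contain spaces would need a different approach, since the sketch relies on word splitting.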

In addition, if you do not want to store a setting in the configuration file, you can use a placeholder value and enter the real value at the terminal when starting ES in the foreground, using either ${prompt.text} or ${prompt.

node.name: ${prompt.text}

After starting ES, you will be prompted to enter parameter values as follows:

Enter value for [node.name]:

If ${prompt.text} or ${prompt.

Index configuration

An index created in a cluster can have its own configuration. For example, the following request creates an index with a refresh interval of 5s instead of the default refresh interval (the body can be in either YAML or JSON format):

$ curl -XPUT http://localhost:9200/kimchy/ -d '
index:
    refresh_interval: 5s
'

Index-level settings can also be set at the node level, for example in elasticsearch.yml:

index.refresh_interval: 5s

This means that every index created on that node will use a refresh interval of 5s unless the index sets its own value when it is created. In other words, configuration at the index level overrides configuration at the node level.

All index-level settings are documented in the individual index modules.
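
The precedence can be pictured as a simple fallback chain. The sketch below is not how ES resolves settings internally; effective_refresh_interval is a hypothetical name, and the 1s built-in default is stated here as background knowledge rather than taken from the text above.

```shell
# Resolve the refresh interval: index-level value first, then node-level, then the default.
# $1 = value set on the index (may be empty), $2 = value from elasticsearch.yml (may be empty)
effective_refresh_interval() {
  index_value=$1
  node_value=$2
  printf '%s\n' "${index_value:-${node_value:-1s}}"
}

effective_refresh_interval "10s" "5s"   # index-level setting wins
effective_refresh_interval ""    "5s"   # falls back to the node-level setting
effective_refresh_interval ""    ""     # falls back to the assumed built-in default
```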

Log configuration

Elasticsearch uses an internal logging abstraction and ships with log4j out of the box. It simplifies log4j configuration by using a YAML configuration file, config/logging.yml; JSON and properties formats are also supported. Multiple configuration files can be loaded and will be merged, provided that each file name starts with the logging. prefix and ends with a supported suffix (currently .yml, .yaml, .json, or .properties). The logger section contains Java package names and their corresponding log levels, and you can omit the org.elasticsearch prefix. The appender section contains the log destinations. See the log4j documentation for more on customizing logging and for the supported appender types. Additional appenders and other logging classes provided by log4j-extras are also available out of the box.

Deprecation logging

In addition to regular logging, ES can log the use of deprecated features. This lets you determine early whether you will need to migrate some functionality in the future. Deprecation logging is disabled by default. You can enable it in config/logging.yml by setting the deprecation log level to DEBUG:

deprecation: DEBUG, deprecation_log_file

This creates a daily rolling deprecation log file in your log directory. Check it regularly.
