This is the 13th day of my participation in The August Wenwen Challenge.
Description of the elasticsearch.yml configuration file
Cluster configuration
Parameter | Default value | Description |
---|---|---|
cluster.name | elasticsearch | Cluster name. Only nodes with the same cluster name can form a logical cluster |
Node configuration
Parameter | Default value | Description |
---|---|---|
node.name | System generated | Node name. Each node name in the same cluster must be different. If this parameter is not specified, the system generates the node name by default |
node.attr.rack | | Custom rack attribute of the node (can be used for shard allocation awareness) |
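Taken together, the cluster and node parameters above might look like this in elasticsearch.yml (the node name and rack value are illustrative):

```yaml
# elasticsearch.yml -- cluster and node identity (example values)
cluster.name: es-application   # only nodes with the same cluster name join this cluster
node.name: node-1              # must be unique within the cluster
node.attr.rack: rack1          # custom node attribute, e.g. for allocation awareness
```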
Paths Configuration
Parameter | Default value | Description |
---|---|---|
path.conf | | Path to the configuration files |
path.data | | Data storage location. Multiple locations can be listed, separated by commas (,), for better reliability |
path.work | | Temporary file storage path |
path.logs | | Log file storage path |
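As a sketch, the path settings above might be configured like this (all paths are example values):

```yaml
# elasticsearch.yml -- path settings (example paths)
path.conf: /etc/elasticsearch
path.data: /data/es1,/data/es2     # multiple locations separated by commas
path.work: /tmp/elasticsearch
path.logs: /var/log/elasticsearch
```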
Memory configuration
Parameter | Default value | Description |
---|---|---|
bootstrap.mlockall | false | Elasticsearch performs poorly once the JVM starts writing to swap space, so swapping should be avoided. Set this to true to lock the memory of the Elasticsearch process so it cannot be swapped out |
Network and HTTP configuration
Parameter | Default value | Description |
---|---|---|
network.bind_host | 0.0.0.0 | The IP address to bind to; can be IPv4 or IPv6 |
network.publish_host | | The IP address other nodes use to communicate with this node. If unset, it is determined automatically; the value must be a real IP address |
network.host | | Sets both bind_host and publish_host at once |
transport.tcp.port | 9300 | TCP port used for communication between nodes |
transport.tcp.compress | false | Whether to compress data for TCP transport between nodes |
http.port | 9200 | HTTP port for external services |
http.max_content_length | 100mb | Maximum size of an HTTP request body |
http.enabled | true | Whether to serve external requests over HTTP |
http.cors.enabled | true | Enable CORS, e.g. so the head plugin can monitor cluster information |
http.cors.allow-origin | "*" | Origins allowed to make CORS requests (e.g. the head plugin) |
http.cors.allow-credentials | true | Whether credentials are allowed in CORS requests |
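A combined sketch of the network and HTTP settings above (the host IP is an example value):

```yaml
# elasticsearch.yml -- network and HTTP (example values)
network.host: 192.168.123.64       # sets both bind_host and publish_host
transport.tcp.port: 9300
transport.tcp.compress: false
http.port: 9200
http.max_content_length: 100mb
# CORS settings so the head plugin can reach the cluster
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-credentials: true
```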
Gateway configuration
Parameter | Default value | Description |
---|---|---|
gateway.type | local | Gateway type. The default, local, stores cluster state on the local file system |
gateway.recover_after_nodes | 1 | Recovery starts only after N nodes in the cluster are up |
gateway.recover_after_time | 5m | Timeout for the initial recovery process, counted from the moment the N nodes configured above are up |
gateway.expected_nodes | 2 | Number of nodes expected in the cluster. Local shard recovery starts as soon as these N nodes have joined the cluster (provided recover_after_nodes is also satisfied) |
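For example, for a two-node cluster the gateway settings above could be combined like this (values are illustrative):

```yaml
# elasticsearch.yml -- gateway recovery for a 2-node cluster (example values)
gateway.recover_after_nodes: 1   # start recovery once at least 1 node is up...
gateway.recover_after_time: 5m   # ...after waiting at most 5 minutes
gateway.expected_nodes: 2        # recover immediately once both nodes have joined
```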
Recovery configuration (cluster recovery and high availability)
Parameter | Default value | Description |
---|---|---|
cluster.routing.allocation.node_initial_primaries_recoveries | 4 | Number of concurrent recovery threads during initial data recovery |
cluster.routing.allocation.node_concurrent_recoveries | 2 | Number of concurrent recovery threads when nodes are added or removed, or during rebalancing |
indices.recovery.max_bytes_per_sec | 0 | Throughput limit during recovery (for example 100mb). The default 0 means unlimited; if other services run on the same machine, it is better to set a limit |
indices.recovery.concurrent_streams | 5 | Maximum number of concurrent streams opened while recovering data from other shards |
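The recovery settings above could be tuned like this on a shared machine (the 100mb limit is an example value):

```yaml
# elasticsearch.yml -- recovery throttling (example values)
cluster.routing.allocation.node_initial_primaries_recoveries: 4
cluster.routing.allocation.node_concurrent_recoveries: 2
indices.recovery.max_bytes_per_sec: 100mb   # limit if other services share the host
indices.recovery.concurrent_streams: 5
```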
Discovery configuration (cluster discovery and master election)
Parameter | Default value | Description |
---|---|---|
discovery.zen.minimum_master_nodes | 1 | Minimum number of master-eligible nodes a node must see to take part in master election. The default is 1; for large clusters, set a larger value (2-4) |
discovery.zen.ping.timeout | 3s | Discovery ping timeout, 3 seconds by default. Increase it on a slow or unreliable network to help prevent split-brain |
discovery.zen.ping.multicast.enabled | true | Whether to use multicast to discover nodes. Disable it (and use unicast instead) when multicast is unavailable or the cluster spans network segments |
discovery.zen.ping.unicast.hosts | | Initial list of master-eligible nodes in the cluster that a node (master or data node) probes at startup |
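For a cluster that spans network segments, the discovery settings above are typically combined like this (host addresses are example values):

```yaml
# elasticsearch.yml -- unicast discovery across network segments (example hosts)
discovery.zen.minimum_master_nodes: 2        # (master-eligible nodes / 2) + 1
discovery.zen.ping.timeout: 10s              # raised to tolerate a poor network
discovery.zen.ping.multicast.enabled: false  # multicast off, use unicast below
discovery.zen.ping.unicast.hosts: ["192.168.123.64:9300", "192.168.123.65:9300"]
```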
Cache configuration
Parameter | Default value | Description |
---|---|---|
indices.cache.filter.size | 10% | Filter cache limit, for example 1gb or 20% |
index.cache.field.expire | | Field cache expiry time |
index.cache.field.max_size | | Maximum number of entries in the field cache |
index.cache.field.type | | Field cache type |
Translog configuration (transaction operation log)
Parameter | Default value | Description |
---|---|---|
index.translog.flush_threshold_ops | unlimited | Flush after this many operations have accumulated |
index.translog.flush_threshold_size | 512mb | Flush when the translog reaches this size |
index.translog.flush_threshold_period | 30m | Force a flush if none has occurred within this period |
index.translog.interval | 5s | How often the translog is checked to decide whether to flush; ES performs the flush at a random point between this value and twice this value |
index.gateway.local.sync | 5s | How often the translog is synced to disk |
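The translog flush triggers above work together: whichever threshold is hit first causes a flush. A combined sketch (the operation count is an example value):

```yaml
# elasticsearch.yml -- translog flush triggers (example values)
index.translog.flush_threshold_ops: 20000    # flush after 20k operations...
index.translog.flush_threshold_size: 512mb   # ...or when the translog reaches 512 MB...
index.translog.flush_threshold_period: 30m   # ...or after 30 minutes without a flush
index.translog.interval: 5s                  # how often the thresholds are checked
```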
ES operations (Part 1)
Index related operations (including mapping and setting)
A quick index health check can also be done with the elasticsearch-head plugin
```
# GET /_cluster/health
# e.g. http://192.168.123.64:9200/_cluster/health

# Response:
{
    "cluster_name": "es-application",   # cluster name
    # The most important piece of the response is the status field, which is one of three values:
    # green:  all primary and replica shards are allocated; the cluster is 100% operational.
    # yellow: all primary shards are allocated, but at least one replica is missing.
    #         No data is lost and search results remain intact, but high availability is
    #         weakened; if more shards disappear, you may lose data. Treat yellow as a
    #         warning that needs to be investigated promptly.
    # red:    at least one primary shard (and all its replicas) is missing. Data is missing:
    #         searches return only partial results, and writes assigned to that shard
    #         return an exception.
    "status": "green",
    # The green/yellow/red status is a great way to get an overview of the cluster at a
    # glance. The remaining metrics give an overview of the cluster state:
    "timed_out": false,
    # number_of_nodes and number_of_data_nodes are self-descriptive.
    "number_of_nodes": 1,
    "number_of_data_nodes": 1,
    # Number of primary shards in the cluster, summed over all indexes.
    "active_primary_shards": 0,
    # Total of ALL shards across all indexes, including replica shards.
    "active_shards": 0,
    # Shards currently migrating from one node to another. Normally 0, but the value rises
    # when Elasticsearch finds the cluster unbalanced, e.g. a node was added or taken offline.
    "relocating_shards": 0,
    # Shards just created. For example, when you create your first index its shards stay in
    # the initializing state for a short time; this is usually transient and shards should
    # not remain initializing for long. You may also see initializing shards when a node
    # restarts, while its shards are loaded from disk.
    "initializing_shards": 0,
    # Shards that exist in the cluster state but cannot actually be found in the cluster.
    # A common source is unallocated replicas: an index with 5 shards and 1 replica has
    # 5 unassigned replica shards on a single-node cluster. If the cluster is red, it keeps
    # unassigned shards for a long time (because primary shards are missing).
    "unassigned_shards": 0,
    "delayed_unassigned_shards": 0,
    "number_of_pending_tasks": 0,
    "number_of_in_flight_fetch": 0,
    "task_max_waiting_in_queue_millis": 0,
    "active_shards_percent_as_number": 100.0
}
```
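As a minimal sketch, the health response can also be interpreted programmatically. The field names below match the real `_cluster/health` response; the helper function and the trimmed sample dict are illustrative, not part of any Elasticsearch client:

```python
# Hypothetical helper: summarize a _cluster/health response body.
STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries allocated, but at least one replica is missing",
    "red": "at least one primary shard (and all its replicas) is missing",
}

def summarize_health(health: dict) -> str:
    """Return a one-line summary of a _cluster/health response body."""
    status = health["status"]
    return (
        f"cluster '{health['cluster_name']}' is {status}: "
        f"{STATUS_MEANING[status]} "
        f"({health['active_shards']} active shards, "
        f"{health['unassigned_shards']} unassigned)"
    )

# A trimmed-down sample response, mirroring the fields shown above:
sample = {
    "cluster_name": "es-application",
    "status": "green",
    "active_shards": 0,
    "unassigned_shards": 0,
}
print(summarize_health(sample))
```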
Create an index
```
# PUT /{index name}
# e.g. PUT http://192.168.123.64:9200/index_jacquesh
{
    "settings": {
        "index": {
            "number_of_shards": "2",     # number of primary shards
            "number_of_replicas": "0"    # number of replicas per primary shard
        }
    }
}

# Response:
{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "index_jacquesh"
}
# index created successfully
```
View index
```
# GET _cat/indices?v
```

Returns:
Remove the index
```
# DELETE /{index name}
# e.g. DELETE http://192.168.123.64:9200/index_jacquesh1

# Response:
{
    "acknowledged": true
}
# index deleted successfully
```