Elasticsearch thread pools and queues

1. Online practice related to thread pools

  • Fault 1: When consuming data from Kafka and writing it into Elasticsearch, bulk writes throw rejection exceptions. The ES cluster has four nodes; the bulk thread pools on node1 and node4 show a rejected count of roughly 300,000. The ES bulk thread pool has a size of 8 and a queue of 200, while the Kafka write thread pool is 2*cores + cores/2 with a queue of 3. I want to balance the write speed against the speed at which ES can process requests, but there is no environment available for load testing. Is there any experience data or method I can use as a reference?
  • Fault 2: Multiple systems share one cluster, and the following error log appears:
{"message": "failed to execute pipeline for a bulk request",
"stacktrace": ["org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.ingest.IngestService$4@5b522103 on EsThreadPoolExecutor[name = node-2/write, queue capacity = 1024, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@19bdbd79[Running, pool size = 5, active threads = 5, queued tasks = 1677, ]]"

For fault 2, we first checked the logs: a large volume of writes had filled the write queue, and the cluster rejected further writes outright.

Initial solution to fault 2: increase the queue size from its default, then keep watching queue usage as the business load evolves. A sketch of the change is shown below.
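The snippet below is only a sketch of that change: it enlarges the write thread pool queue in elasticsearch.yml, and the value of 2000 is an assumed example that must be sized against your own heap and traffic rather than copied as-is. It is a static, node-level setting and only takes effect after a node restart.

thread_pool:
    write:
        queue_size: 2000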

Elasticsearch thread pool and queue

2. Overview of thread pools

Elasticsearch uses thread pools to manage requests and to optimize resource usage on each node in the cluster.

3. Thread pool usage

The main thread pools include: search, get, write, etc.

To see the full picture of the thread pool, run the following command:

GET /_cat/thread_pool?v&h=id,name,active,rejected,completed,size,type&pretty&s=type

Among them:

  • name: identifies a thread pool (write, search, refresh, and so on).
  • type: the type of the thread pool.

Running the command above shows that each node has many different thread pools, along with each pool's size and type; it also shows which nodes are rejecting operations.
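To focus on the pools that most often reject requests (write and search), the same _cat API can be narrowed down; the columns below are taken from the _cat/thread_pool documentation:

GET /_cat/thread_pool/write,search?v&h=node_name,name,active,queue,rejected,completed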

Elasticsearch automatically configures thread pool parameters based on the number of processors it detects on each node (more on this later).

4. Thread pool types

4.1 Fixed type

A fixed number of threads with a fixed queue size.

The following is an example of a fixed thread pool:

thread_pool:
    write:
        size: 30
        queue_size: 1000

4.2 Scaling type

A variable number of threads; Elasticsearch automatically adjusts the pool size based on the workload, between the core and max values.

The following is an example of a scaling thread pool:

thread_pool:
    warmer:
        core: 1
        max: 8

4.3 fixed_auto_queue_size type

  • A fixed number of threads with a queue size that changes dynamically to maintain a target response time.
  • This feature is deprecated and removed as of 8.0, so it is not explained in detail here.

The following is an example of a fixed_auto_queue_size thread pool:

Note the variable queue size settings.

thread_pool:
    search:
        size: 30
        queue_size: 500
        min_queue_size: 10
        max_queue_size: 1000
        auto_queue_frame_size: 2000
        target_response_time: 1s

5. Thread pool usage examples

To see which threads are consuming the most CPU or taking the longest, use the following query:

GET /_nodes/hot_threads

This API helps troubleshoot performance problems.
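If the default output is too noisy, hot_threads accepts parameters to control the sample; the values below are just an illustration:

GET /_nodes/hot_threads?type=cpu&threads=5&interval=500ms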

Elasticsearch hot_threads

6. Takeaways on thread pools and queues

Takeaway 1: Set processors explicitly if necessary

It's worth noting that thread pool sizes are derived from the number of processors Elasticsearch detects on the underlying hardware.

If that detection fails or is wrong, the number of processors available on the hardware should be set explicitly in elasticsearch.yml.

In particular, when several Elasticsearch node instances are deployed on the same host and you want to adjust the thread pool or queue sizing of one of them, consider configuring the processors parameter.

Set the following in elasticsearch.yml:

processors: 4

PS: to check the number of logical processors on Linux:

grep 'processor' /proc/cpuinfo | sort -u | wc -l

Takeaway 2: Thread pools have associated queues

Most thread pools also have queues associated with them so that Elasticsearch can store requests in memory while waiting for resources to become available to process them.

However, queues have a limited size, and Elasticsearch rejects requests once that size is exceeded.

Takeaway 3: Blindly changing the queue size is bad practice

Increasing the queue size can sometimes prevent requests from being rejected, but size it according to the resources actually available rather than changing it blindly.

In fact, setting a very large value can even backfire: a larger queue means the node needs more memory to hold the queued requests, which leaves relatively less memory for responding to and handling the actual requests.

In addition, a larger queue increases how long an operation waits in the queue before it is handled, which can lead to client application timeouts.
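As a rough back-of-the-envelope illustration (the per-request size here is an assumption): if each queued bulk request averages 1 MB, a queue_size of 1,000 can pin up to about 1 GB of heap before a single document is indexed, and a request that waits in the queue longer than the client's request timeout has already failed from the caller's point of view.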

This kind of rash change is something to avoid in practice.

Takeaway 4: Strengthen monitoring

In general, the only time it makes sense to increase the queue size is when requests spike faster than the client side can throttle them and node resource utilization has not yet peaked.

You can get a better idea of Elasticsearch cluster performance with the help of Kibana Stack Monitoring visualization metrics.

The general view, node view and index view in the Kibana monitoring panel are as follows:

  • Overview monitoring
  • Node view monitoring
  • Index view monitoring

Above: screenshots from Kibana 7.6; the areas highlighted in red show the period when I was bulk-writing data.

  • Search Rate: the search rate
  • Search Latency: the search latency
  • Indexing Rate: the write (indexing) rate
  • Indexing Latency: the write (indexing) latency

A growing queue indicates that Elasticsearch is having trouble keeping up with requests, and rejections indicate that a queue has grown to the point where Elasticsearch refuses new work.

You need to examine the root cause of the queue increase and try to balance the strain on the cluster thread pool by alleviating related write or retrieve operations on the client side.

7. Online practice with thread pools and points to note

7.1 Thread pool and queue changes require modifying elasticsearch.yml

  • These are static, node-level settings; unlike versions before 5.x, they can no longer be changed dynamically.
  • Changes take effect only after the affected nodes are restarted (the command below can verify the result).
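After the restart, the effective per-node thread pool configuration can be checked with the node info API:

GET /_nodes/thread_pool?pretty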

7.2 A request can be rejected for a variety of reasons

As with fault 2 above, if an Elasticsearch cluster starts rejecting index/write requests, there are a number of possible causes.

Typically, this indicates that one or more nodes cannot keep up with the number of index/delete/update/bulk requests, resulting in queues building up and accumulating on the node.

Once the index queue exceeds its maximum size (the value defined in elasticsearch.yml, or the default), the node starts rejecting index requests.

Troubleshooting method: check the state of the thread pools to see whether rejections always occur on the same node or are spread across all nodes.

GET /_cat/thread_pool?v
  • If rejections occur only on specific data nodes, you may have a load-balancing or shard-distribution problem.
  • If rejections are associated with high CPU utilization, this is usually the result of JVM garbage collection, which in turn is caused by configuration or query-related problems.
  • If the cluster has a very large number of shards, you may have an oversharding problem.
  • If queue rejections are observed on a node but the CPU is not saturated, disk write speed may be the bottleneck (see the checks after this list).
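As a starting point for those checks (not a full diagnosis), the following _cat APIs show per-node CPU, load, and heap usage, plus the shard count and disk usage per node:

GET /_cat/nodes?v&h=name,cpu,load_1m,heap.percent
GET /_cat/allocation?v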

7.3 The bulk write size needs further tuning

Don't simply push for faster writes; a sharp decline in write speed usually means writes are being rejected.

Try indexing 100 documents at a time, then 200, then 400, and so on.

When indexing rates start to level off, you know you’ve reached the optimal size for bulk requests.
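For reference, a minimal bulk request looks like the following; the index name and fields are made up for illustration, and the idea is simply to increase the number of action/document pairs per request until throughput levels off:

POST /_bulk
{ "index": { "_index": "test_index" } }
{ "title": "doc 1" }
{ "index": { "_index": "test_index" } }
{ "title": "doc 2" }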

8. Summary

Write rejections and "429 Too Many Requests" errors are common. They are mostly related to thread pool and queue sizing, and need to be troubleshot against the specific business scenario.

This article has summarized thread pools and queues. Have you run into similar problems in the field? Feel free to discuss in the comments.


