Preface

Memcached is an open source, high-performance, distributed memory object caching system that can be used in any scenario that requires caching, and it is also one of the most common interview topics.

Java OOP, Java collections and containers, Java exceptions, concurrent programming, Java reflection, Java serialization, the JVM, Redis, Spring MVC, MyBatis, MySQL, message queue middleware (MQ), Dubbo, Linux, ZooKeeper, distributed systems, data structures and algorithms, and more: 25 technical topics, all drawn from real interview questions at major companies. Many readers have used this PDF to win offers from big companies, so here it is, summarized and shared with everyone.

The full version of the Java interview questions: the 2021 latest interview question collection.

Serial number, topic, content, and link:

1. Middleware: Java Middleware Interview Questions (2021) - juejin.cn/post/694870…
2. Microservices: Java Microservices Interview Questions (2021) - juejin.cn/post/694906…
3. Concurrent programming: Java Concurrent Programming Interview Questions (2021 latest edition) - juejin.cn/post/695053…
4. Java basics: Java Basics Interview Questions (2021) - juejin.cn/post/695062…
5. Spring Boot: Spring Boot Interview Questions (2021 latest edition) - juejin.cn/post/695137…
6. Redis: Redis Interview Questions (2021 latest edition) - juejin.cn/post/695166…
7. Spring MVC: Spring MVC Interview Questions (2021) - juejin.cn/post/695166…
8. Spring Cloud: Spring Cloud Interview Questions (2021) - juejin.cn/post/695245…
9. MySQL optimization: MySQL Optimization Interview Questions (2021 latest edition) - juejin.cn/post/695246…
10. JVM: JVM Performance Tuning Interview Questions (2021 latest edition) - juejin.cn/post/695246…
11. Linux: Linux Interview Questions (2021 latest edition) - juejin.cn/post/695287…
12. MyBatis: MyBatis Interview Questions (2021 latest edition) - juejin.cn/post/695287…
13. Network programming: TCP, UDP, Socket, HTTP Network Programming Interview Questions (2021 latest edition) - juejin.cn/post/695287…
14. Design patterns: Design Pattern Interview Questions (2021 latest edition) - juejin.cn/post/695544…
15. Big data: 100 Big Data Interview Questions (2021 latest edition) - juejin.cn/post/695544…
16. Tomcat: Tomcat Interview Questions (2021 latest edition) - juejin.cn/post/695570…
17. Multithreading: Multithreading Interview Questions (2021 latest edition) - juejin.cn/editor/draf…
18. Nginx: Nginx_BIO_NIO_AIO Interview Questions (2021 latest edition) - juejin.cn/editor/draf…
19. Memcache: Memcache Interview Questions (2021 latest edition) - juejin.cn/post/695608…
20. Java exceptions: Java Exception Interview Questions (2021 latest edition) - juejin.cn/post/695644…
21. Java virtual machine: Java Virtual Machine Interview Questions (2021 latest edition) - juejin.cn/post/695658…
22. Java collections: Java Collections Interview Questions (2021 latest edition) - juejin.cn/post/695684…
23. Git: Git Commands (2021) - juejin.cn/post/695692…
24. Elasticsearch: Elasticsearch Interview Questions (2021 latest edition) - juejin.cn/post/695840…
25. Dubbo: Dubbo Interview Questions (2021 latest edition) - juejin.cn/post/695842…

1. How many shards do you have in your ES cluster? Describe your ES cluster architecture and index data size.

Interviewer: wants to know the ES application scenarios and scale the candidate has worked with, and whether they have done large-scale index design, planning, and tuning.

Answer: answer truthfully based on your own practical experience.

For example: the ES cluster has 13 nodes; there are 20+ indexes split by channel, with 20+ new indexes created per day by date; each index has 10 shards and grows by 100 million+ documents per day; the daily index size of each channel is kept within 150 GB.

1.1. Optimization in the design stage

(1) Create indexes on a rolling basis using date-based templates plus the rollover API, according to incremental business needs (a sketch follows this list);

(2) Use aliases for index management;

(3) Run force_merge on indexes during off-peak early-morning hours every day to release space;

(4) Adopt a hot/cold separation mechanism: store hot data on SSDs to improve retrieval efficiency, and periodically shrink cold data to reduce storage;

(5) Adopt lifecycle management for indexes;

(6) Configure analyzers only for the fields that actually need full-text analysis;

(7) In the mapping stage, weigh each field's attributes to decide whether it needs to be indexed for retrieval and whether it needs to be stored.
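
As a rough sketch of points (1) and (2), the following uses the legacy index template API, a write alias, and the rollover API. The index pattern, alias names, and rollover thresholds (logs-*, logs_write, 1d / 100 million docs / 150 GB, echoing the example figures above) are illustrative assumptions, not values from the original text.

# index template applied to every index matching logs-* (legacy _template API)
PUT /_template/logs_template
{
  "index_patterns": ["logs-*"],
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 1
  },
  "aliases": {
    "logs_search": {}
  }
}

# first concrete index behind a dedicated write alias
PUT /logs-2021.06.01-000001
{
  "aliases": {
    "logs_write": { "is_write_index": true }
  }
}

# roll over to a new index as soon as any condition is met
POST /logs_write/_rollover
{
  "conditions": {
    "max_age": "1d",
    "max_docs": 100000000,
    "max_size": "150gb"
  }
}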

1.2. Write tuning

(1) Before writing, set the number of replicas to 0;

(2) Before writing, set refresh_interval to -1 to disable the refresh mechanism;

(3) Use bulk writes during the loading process;

(4) Restore the number of replicas and the refresh interval after writing;

(5) Use automatically generated IDs whenever possible (a combined sketch of these settings follows this list).
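
A minimal sketch of steps (1), (2), (4), and the bulk write in (3), assuming a hypothetical index named my_index and made-up document contents:

# before the bulk load: no replicas, refresh disabled
PUT /my_index/_settings
{ "index": { "number_of_replicas": 0, "refresh_interval": "-1" } }

# bulk write with auto-generated document IDs
POST /my_index/_bulk
{ "index": {} }
{ "channel": "c1", "msg": "first doc" }
{ "index": {} }
{ "channel": "c1", "msg": "second doc" }

# after the load: restore replicas and the refresh interval
PUT /my_index/_settings
{ "index": { "number_of_replicas": 1, "refresh_interval": "1s" } }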

1.3. Query tuning

(1) Disable wildcard queries;

(2) Avoid terms queries with hundreds or thousands of values;

(3) Make full use of the inverted index: use keyword-type fields wherever possible;

(4) When the data volume is large, determine which indexes to search based on time before retrieving;

(5) Set up a reasonable routing mechanism (a sketch follows this list).
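
A hypothetical sketch combining (3), (4), and (5): the request targets only the date-bounded index, filters on a keyword field, and passes a routing value. The index, field, and routing names are made up for illustration.

GET /logs-2021.06.01/_search?routing=channel_1
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "channel": "channel_1" } },
        { "range": { "timestamp": { "gte": "2021-06-01T00:00:00", "lt": "2021-06-02T00:00:00" } } }
      ]
    }
  }
}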

1.4. Other tuning

Deployment tuning, business tuning, and so on. Through the above answers, the interviewer can roughly assess your prior practical and operations experience.

2. What is the inverted index in Elasticsearch?

An inverted index maps each term to the list of documents that contain it, which is what makes fast full-text retrieval possible (a small analysis sketch follows this answer). For the term dictionary, the data structure Lucene has used extensively since version 4+ is the FST (finite state transducer). The FST has two advantages:

(1) Small space footprint. By reusing the prefixes and suffixes of words in the dictionary, storage space is reduced.

(2) Fast queries. O(len(str)) query time complexity.
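
To see which terms would end up in the inverted index, the _analyze API can be used; this is only an illustration of tokenization with an arbitrary sample sentence.

POST /_analyze
{
  "analyzer": "standard",
  "text": "Elasticsearch builds an inverted index from terms"
}

# the response lists the tokens (elasticsearch, builds, an, inverted, index, from, terms);
# the inverted index maps each such term to the documents that contain it.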

3. What do you do when the Elasticsearch index data keeps growing? How do you tune and deploy it?

Interviewer: wants to assess the candidate's ability to operate and maintain clusters with large data volumes.

Answer: plan index data well in advance, following the principle of "design first, code later", so that a sudden surge of data does not exceed the cluster's processing capacity and affect online search or other business.

The tuning approach was outlined in Question 1; more details follow:

3.1 Dynamic Index Level

Create indexes on a rolling basis using date-based templates plus the rollover API (as in 1.1), so that no single index grows without bound.

3.2 Storage Layer

  • Hot data (for example, data from the last three days or one week) is stored separately from the rest (cold data).
  • For cold indexes that no longer receive new writes, periodically run force_merge plus shrink to save storage space and improve search efficiency (see the sketch after this list).
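
A hypothetical sketch of the force_merge plus shrink routine for a cold index; the index and node names are made up.

# merge the cold index down to a single segment
POST /logs-2021.05.01/_forcemerge?max_num_segments=1

# before shrinking: block writes and relocate all shards onto one node
PUT /logs-2021.05.01/_settings
{
  "index.blocks.write": true,
  "index.routing.allocation.require._name": "cold_node_1"
}

# shrink the 10 primary shards down to 1
POST /logs-2021.05.01/_shrink/logs-2021.05.01-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}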

3.3 Deployment Layer

If no planning was done in advance, this is the contingency strategy.

Taking advantage of ES's own dynamic scaling, adding machines dynamically can relieve cluster pressure. Note: if the master nodes and the rest of the planning are reasonable, machines can be added dynamically without restarting the cluster.

4. How does Elasticsearch implement master election?

GET /_cat/nodes?v&h=ip,port,heapPercent,heapMax,id,name

ip port heapPercent heapMax id name

5, Describe the process of Elasticsearch indexing documents in detail

6. Describe the Elasticsearch search process in detail.

Interviewer: wants to understand the underlying principles of ES search, not just the business level.

Answer:

The search is decomposed into two phases: “Query then Fetch”.

The purpose of the Query phase is to locate the matching documents (their IDs and sort values) on each shard without fetching the documents themselves; the Fetch phase then retrieves the actual documents from the shards involved.

7. How do you optimize Linux settings when deploying Elasticsearch?

8. What is the internal structure of Lucene?

Interviewer: I want to know the breadth and depth of your knowledge.

Answer:

Lucene is organized around an indexing process and a search process, covering index creation, index storage, and searching. You can expand on each of these.

9. How does Elasticsearch implement master election?

10. In an Elasticsearch cluster of, say, 20 nodes, 10 of them elect one master and the other 10 elect another master. What happens?

11. How do clients select specific nodes to execute requests when connecting to the cluster?

The TransportClient uses the transport module to connect to an Elasticsearch cluster remotely. It does not join the cluster; it simply obtains one or more initialized transport addresses and communicates with them in a round-robin fashion.

Describe the process of indexing documents for Elasticsearch.

Elasticsearch is a distributed RESTful search and data analysis engine.

(1) Queries: Elasticsearch allows you to perform and merge multiple types of searches — structured, unstructured, geographic, metric — in any way you want.

(2) Analysis: It is one thing to find the ten documents that best match the query. But what if you’re dealing with a billion lines of logs? Elasticsearch aggregation allows you to think big and explore trends and patterns in your data.

(3) Speed: Elasticsearch is fast. Really, really fast.

(4) Scalability: it can run on laptop computers. It can also run on hundreds of servers that host petabytes of data.

(5) Elasticity: Elasticsearch runs in a distributed environment and has been designed with this in mind since the beginning.

(6) Flexibility: many use cases. Numbers, text, geo-locations, structured, unstructured: all data types are welcome.

(7) Hadoop & Spark: Elasticsearch can be used together with Hadoop and Spark (via the ES-Hadoop connector).

Elasticsearch is a highly scalable open source full text search and analysis engine. It allows you to store, search, and analyze large amounts of data quickly and in near real time.

Here are some use cases for Elasticsearch:

(1) You run an online store and you allow your customers to search for the products you sell. In this case, you can use Elasticsearch to store the entire product catalog and inventory and provide search and auto-complete suggestions for them.

(2) You want to collect log or transaction data, and you want to analyze and mine that data for trends, statistics, summaries, or anomalies. In this case, you can use Logstash (part of the Elasticsearch/Logstash/Kibana stack) to collect, aggregate, and parse the data, and then have Logstash feed it into Elasticsearch. Once the data is in Elasticsearch, you can run searches and aggregations to mine any information you are interested in.

(3) You run a price alert platform that allows price-savvy customers to specify rules such as: "I am interested in buying a specific electronic device and would like to be notified if any vendor's price drops below $X within the next month." In this case, you can scrape vendor prices, push them into Elasticsearch, and use its reverse-search (Percolator) feature to match price movements against customer queries, finally sending an alert to the customer when a match is found.

(4) You have analytics or business intelligence needs and want to quickly investigate, analyze, visualize, and ask ad hoc questions over large amounts of data (think millions or billions of records). In this case, you can use Elasticsearch to store the data and then use Kibana (part of the Elasticsearch/Logstash/Kibana stack) to build custom dashboards that visualize the aspects of the data that matter to you. In addition, you can use Elasticsearch's aggregation capabilities to run complex business intelligence queries against the data.

Describe how Elasticsearch updates and deletes documents.

(1) Delete and update are also write operations, but documents in Elasticsearch are immutable, so they cannot be deleted or modified in place.

(2) Each segment on disk has a corresponding .del file. When a delete request is sent, the document is not actually deleted but is marked as deleted in the .del file. It will still match queries but is filtered out of the results. When segments are merged, documents marked as deleted in the .del file are not written to the new segment.

(3) When a new document is created, Elasticsearch assigns it a version number. When an update is performed, the old version is marked as deleted in the .del file and the new version is indexed into a new segment. The old version still matches queries but is filtered out of the results (see the sketch below).
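
A small sketch of the versioning behavior described above, with a hypothetical index and document:

# create: the response carries "_version": 1
PUT /website/_doc/1
{ "title": "first version" }

# update: the old version is marked deleted in a .del file, the new one is indexed with "_version": 2
PUT /website/_doc/1
{ "title": "second version" }

# delete: the document is only marked as deleted and is physically dropped when segments merge
DELETE /website/_doc/1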

16. Describe the Elasticsearch search process in detail.

17. In Elasticsearch, how is the corresponding inverted index found for a given word?

(1) Lucene's indexing process writes the inverted index into this file format, following the basic procedure of full-text retrieval.

(2) Lucene's search process reads the indexed information back from this file format and then computes a score for each document.

18, What are the optimizations for Linux Settings when Elasticsearch is deployed?

(1) Machines with 64 GB of RAM are ideal, but 32 GB and 16 GB machines are also common. Less than 8 GB is counterproductive.

(2) If you have to choose between faster CPUs and more cores, more cores is better. The extra concurrency provided by multiple cores far outweighs a slightly faster clock rate.

(3) If you can afford SSDs, they far outperform any rotating media. SSD-based nodes see improved query and indexing performance. SSDs are a worthwhile investment if you can afford them.

(4) Avoid clusters that span multiple data centers, even if the data centers are close to each other. Definitely avoid clusters that span large geographic distances.

(5) Make sure that the JVM running your application is exactly the same as the server’s JVM. In several places in Elasticsearch, Java’s native serialization is used.

(6) Setting gateway.recover_after_nodes, gateway.expected_nodes, and gateway.recover_after_time can avoid excessive shard swapping when the cluster restarts. This can reduce data recovery time from hours to seconds.

(7) Elasticsearch is configured to use unicast discovery by default to prevent nodes from unintentionally joining the cluster. Only nodes running on the same machine automatically form a cluster. It is best to use unicast instead of multicast.

(8) Do not arbitrarily change the size of the garbage collector (CMS) and individual thread pools.

(9) Give the heap (no more than) half of your memory, and no more than 32 GB, set via the ES_HEAP_SIZE environment variable; the rest of the memory is left to Lucene through the operating system's file cache.

(10) Swapping memory to disk is fatal to server performance. If memory is swapped to disk, a 100-microsecond operation can become 10 milliseconds; now imagine that extra latency accumulating across every operation. It is not hard to see how terrible swapping is for performance.

(11) Lucene uses a large number of files, and Elasticsearch also uses many sockets to communicate between nodes and with HTTP clients. All of this requires enough file descriptors. You should increase the file descriptor limit to a large value, such as 64,000.

Supplement: ways to improve indexing-phase performance

(1) Use bulk requests and resize them: 5-15 MB per batch is a good starting point.

(2) Storage: use SSDs.

(3) Segments and merging: Elasticsearch's default merge throttling is 20 MB/s, which should be a good setting for mechanical disks. If you are using SSDs, consider raising it to 100-200 MB/s. If you are doing a bulk import and do not care about search at all, you can turn merge throttling off completely. You can also increase the index.translog.flush_threshold_size setting from the default of 512 MB to a larger value, such as 1 GB, which accumulates larger segments in the translog before a flush is triggered.

(4) If your search results do not require near-real-time accuracy, consider changing the index.refresh_interval for each index to 30s.

(5) If you are doing bulk imports, consider turning off replicas by setting index.number_of_replicas: 0 (a combined settings sketch follows this list).
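
A combined sketch of the dynamic index settings mentioned in (3), (4), and (5); the index name my_index and the exact values are illustrative.

PUT /my_index/_settings
{
  "index.translog.flush_threshold_size": "1gb",
  "index.refresh_interval": "30s",
  "index.number_of_replicas": 0
}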

19. What should you pay attention to with regard to GC when using Elasticsearch?

(1) The inverted index's term dictionary needs to stay resident in memory and cannot be garbage-collected, so the growth trend of segment memory on data nodes needs to be monitored.

(2) All caches (field cache, filter cache, indexing cache, bulk queue, etc.) should be set to reasonable sizes, and the heap should be checked against the worst case, i.e., when all caches are full: is there still heap space left for other tasks? Avoid relying on clear cache to free memory.

(3) Avoid searches and aggregations that return very large result sets. For scenarios that genuinely need to pull large amounts of data, use the scan & scroll API (see the sketch after this list).

(4) Cluster stats reside in memory and do not scale horizontally; a very large cluster can be split into multiple clusters connected through a tribe node.

(5) To know whether the heap is sufficient, combine the actual application scenario with continuous monitoring of the cluster's heap usage.

(6) Use the monitoring data to understand memory requirements and configure the various circuit breakers appropriately to minimize the risk of out-of-memory errors.
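
A minimal scan & scroll sketch for pulling a large result set, as mentioned in (3); the index name and scroll_id are placeholders.

# open a scroll context and fetch the first batch
POST /my_index/_search?scroll=1m
{
  "size": 1000,
  "query": { "match_all": {} }
}

# fetch subsequent batches with the scroll_id returned by the previous response
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<scroll_id from the previous response>"
}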

20. How do you implement Elasticsearch aggregations over large data sets (tens of millions of documents)?

21. How does Elasticsearch ensure read and write consistency under concurrency?

(1) Optimistic concurrency control via version numbers can be used to ensure that a new version is not overwritten by an old one; the application layer handles specific conflicts (see the sketch below);

(2) In addition, for write operations, the consistency level supports quorum/one/all and defaults to quorum, i.e., a write is allowed only when a majority of shards are available. Even when a majority is available, a write to a replica may still fail for network reasons; the replica is then marked as faulty and the shard is rebuilt on a different node;

(3) For read operations, you can set replication to sync (the default), so that the operation returns only after both the primary and replica shards have completed; if replication is set to async, you can also set the search request parameter _preference to primary to query the primary shard and ensure the document is the latest version.
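
A sketch of the optimistic concurrency control in (1) using external version numbers; the index, document ID, and version value are hypothetical.

# succeeds only if the currently stored version is lower than 5; otherwise it fails with a 409 version conflict
PUT /my_index/_doc/1?version=5&version_type=external
{ "price": 99 }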

22. How do I monitor the Elasticsearch cluster status?

Marvel makes it easy to monitor Elasticsearch via Kibana. You can view your cluster health and performance in real time, as well as analyze past cluster, index, and node metrics.
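
Basic cluster status can also be checked directly with the built-in APIs, for example:

GET /_cluster/health
GET /_cat/indices?v
GET /_cat/nodes?v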

23. Describe the overall technical architecture of your e-commerce search.

24. Tell me about your personalized search solution.

25. Do you know the dictionary tree (trie)?

There are several common dictionary data structures; the most typical is the trie (prefix tree).

The core idea of a trie is to use common string prefixes to reduce query time and improve efficiency. It has three basic properties:

1) The root node contains no characters. Each node except the root node contains only one character.

2) The characters along the path from the root to a node, concatenated in order, form the string corresponding to that node.

3) All children of each node contain different characters.

26. How is spelling correction implemented?