You Know, for Search

Elasticsearch is a distributed, high-performance, highly available, scalable search and analytics system. It is an open source search engine built on top of Apache Lucene™, a full-text search engine library. Lucene is arguably the most advanced, high-performance, full-featured search engine library available today, open source or proprietary. But Lucene is just a library. To use it, you have to work in Java and integrate Lucene directly into your application. Worse, Lucene is very complex, and you may need a degree in information retrieval to understand how it works.

Elasticsearch is also written in Java and uses Lucene internally for indexing and searching, but it aims to make full-text search easy by hiding Lucene's complexity behind a simple, consistent RESTful API. Elasticsearch packages all of this functionality into a single service, so you can talk to its RESTful API from a web client written in your favorite programming language, or even from the command line. Elasticsearch is easy to get started with: it ships with sensible defaults and hides complicated search theory, so it works out of the box, and with minimal understanding you can quickly become productive.

Lucene

Lucene is a JAR package that contains the code, including the algorithms, for building inverted indexes and searching them. When developing in Java, we add the Lucene JAR to the project and program against the Lucene API. With Lucene we can index existing data, and Lucene organizes the index data structures for us on the local disk.

Lucene is the most advanced and powerful search library, but developing directly against it is hard: the API is complex (even simple features take a lot of Java code), and it requires a deep understanding of the underlying principles, such as the various index structures.

Elasticsearch is built on Lucene and hides that complexity. It is (1) a distributed document storage engine, (2) a distributed search and analytics engine, and (3) distributed, with support for petabyte-scale data. It works out of the box with good default parameters that need no additional settings, and it is completely open source.

Inverted index

1. Create a document list: every document is stored under a document number.

2. Create an inverted index list: split the data in each document into words (word segmentation) to obtain terms. Each term is numbered, an index is built that maps each term to the numbers of the documents containing it, and the result is saved.

3. Search process:

When the user enters a query, the input is first segmented into words to obtain the terms to search for. Those terms are then matched against the inverted index list, which yields the numbers of all documents that contain them. Finally, those numbers are looked up in the document list to retrieve the documents themselves.
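The three steps can be sketched in a few lines of Python. This is only a toy illustration, not how Lucene implements it: the `tokenize` function is a stand-in for a real analyzer, and the sample documents and function names are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase terms (a stand-in for a real analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the sorted numbers of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, documents, query):
    """Return (number, text) for every document containing any query term."""
    matched = set()
    for term in tokenize(query):
        matched.update(index.get(term, []))
    return [(doc_id, documents[doc_id]) for doc_id in sorted(matched)]


# Step 1: the document list (hypothetical sample data).
documents = {
    1: "Elasticsearch is a search engine",
    2: "Lucene is a search library",
    3: "Java is a programming language",
}

# Step 2: the inverted index list.
index = build_inverted_index(documents)

# Step 3: segment the query, match terms, then fetch the documents.
results = search(index, documents, "search library")
for doc_id, text in results:
    print(doc_id, text)
```

Here the term "search" maps to documents 1 and 2 and "library" maps to document 2, so the query returns documents 1 and 2. A real engine would also rank the matches by relevance, which this sketch leaves out.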

Elasticsearch core concepts

Near real-time (NRT)

Elasticsearch is a near real-time search platform. This means that there is a short delay (usually 1 second) between documents being indexed and actually being searchable.
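The delay can be pictured with a toy model in which newly indexed documents sit in a buffer until a refresh makes them searchable. The `ToyIndex` class below is purely illustrative and is not Elasticsearch's implementation; in a real cluster the refresh happens periodically in the background rather than by an explicit call as in this sketch.

```python
class ToyIndex:
    """Toy model: writes land in a buffer and become searchable on refresh."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet visible to searches
        self._searchable = []  # visible to searches

    def index(self, doc):
        """Accept a document; it is not searchable yet."""
        self._buffer.append(doc)

    def refresh(self):
        """Make everything indexed so far visible to searches."""
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, word):
        """Return the searchable documents that contain the word."""
        return [doc for doc in self._searchable if word in doc]


toy = ToyIndex()
toy.index("hello world")
before = toy.search("hello")   # nothing yet: the document is still buffered
toy.refresh()
after = toy.search("hello")    # now the document is found
print(before, after)
```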

Cluster

A cluster is a collection of one or more nodes (servers) that together hold all of your data and provide indexing and search capabilities across all nodes. Each cluster is identified by a unique name, which defaults to “elasticsearch”. This name matters: a node can belong to only one cluster, and it joins a cluster by that cluster's name.

Node

A node is a single server that is part of a cluster, stores data, and takes part in the cluster's indexing and search capabilities. Like a cluster, each node is identified by a name, which by default is derived from a random UUID assigned to the node at startup. If you do not want the default name, you can set your own. Node names matter for administration, because they tell you which servers in your network correspond to which nodes in the cluster. By default, a node joins the cluster named “elasticsearch”. If you start several nodes that can discover each other on the network, they automatically form a single cluster named “elasticsearch”; a single node on its own also forms a valid cluster.

Index

An index is a collection of documents that have similar characteristics. For example, you could have one index for customer data, one for product information, and one for order data. Each index is identified by a name (which must be all lowercase), and that name is used to refer to the index whenever we index, search, update, or delete documents in it. A cluster can define any number of indexes.

Type

Note: Types are deprecated as of Elasticsearch 6.0. Types were used to create a logical category/partition within an index, which made it possible to store different kinds of documents in one index, for example a user type and a blog type. Since 6.0, however, creating more than one type in a single index is no longer supported, and the concept of types is slated for complete removal in later major releases (type names in the APIs were deprecated in 7.0 and removed in 8.0).

Document

A document is the smallest unit of information that can be indexed by Elasticsearch. For example, you can have a document for a consumer, a document for an item, and a document for an order. Documents are presented in JSON format, which is a popular data format.

We can store as many documents as we want in a single index/type. Note that although a document physically lives in an index, in the current version (6.5) it must be indexed with both an index and a type specified.
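As a rough illustration of what "index/type specified" means in practice, the sketch below builds the REST path and JSON body for indexing one document. The index name `customer`, the type name `_doc`, the ID `1`, and the document fields are all assumptions chosen for the example, and nothing is sent over the network.

```python
import json


def document_path(index, doc_type, doc_id):
    """Build the REST path for a document: /<index>/<type>/<id>."""
    return f"/{index}/{doc_type}/{doc_id}"


# A hypothetical customer document, expressed as JSON.
customer = {"name": "Jane Doe", "age": 30}

path = document_path("customer", "_doc", 1)
body = json.dumps(customer)

# The request a client would send to index this document.
print("PUT", path)
print(body)
```

A client would send this body with an HTTP `PUT` to that path on a running node; the helper `document_path` is only a convenience for this sketch and is not part of any Elasticsearch client library.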

Shards & Replicas

Shard: A single machine cannot store an unlimited amount of data. Elasticsearch can split the data in an index into multiple shards and store them on multiple servers. With shards you can scale horizontally, store more data, distribute search and analysis operations across multiple servers, and improve throughput and performance. Each shard is itself a Lucene index.
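One way to picture how documents are spread over shards is to hash a routing key (by default the document ID) and take the result modulo the number of primary shards. The sketch below follows that idea under loud assumptions: it uses CRC32 as a stand-in hash, whereas Elasticsearch uses its own hash function, and the function name and document IDs are invented for the example.

```python
import zlib


def pick_shard(routing_key, number_of_primary_shards):
    """Choose a shard as hash(routing_key) % number_of_primary_shards.

    CRC32 is only a stand-in for the hash a real cluster would use.
    """
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return digest % number_of_primary_shards


# Distribute a few hypothetical document IDs over 5 primary shards.
number_of_primary_shards = 5
shards = {n: [] for n in range(number_of_primary_shards)}
for doc_id in ["1", "2", "3", "4", "5", "6", "7", "8"]:
    shards[pick_shard(doc_id, number_of_primary_shards)].append(doc_id)

for shard_number, doc_ids in shards.items():
    print(shard_number, doc_ids)
```

Because the shard is computed from the key, the same document ID always lands on the same shard, which is also why a scheme like this cannot change the number of primary shards without moving documents around.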

Replica: Any server may fail or go down at any time, and the shards on it would then be lost. To guard against this, one or more replica copies can be created for each shard. A replica takes over when its primary shard fails, so that no data is lost, and because searches can also be served by replicas, multiple replicas improve search throughput and performance. By default (in 6.x), each index has 5 primary shards and 1 replica of each, for 10 shards in total. A replica is never placed on the same node as its primary shard, so the minimum high-availability configuration is two servers.
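The two-server minimum can be checked with a small allocation sketch. It assumes the rule that a replica is never placed on a node that already holds a copy of the same shard; the round-robin and least-loaded choices, the `allocate` function, and the node names are simplifications invented for the example, not Elasticsearch's actual allocator.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Place primaries round-robin, then put each replica on a node that
    does not already hold a copy of the same shard.

    Returns (assignment, unassigned): shard labels per node, plus the
    replicas that could not be placed anywhere.
    """
    assignment = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        primary_node = nodes[shard % len(nodes)]
        assignment[primary_node].append(f"P{shard}")
        holders = {primary_node}
        for _ in range(num_replicas):
            candidates = [n for n in nodes if n not in holders]
            if not candidates:
                unassigned.append(f"R{shard}")
                continue
            # Prefer the candidate node that currently holds the fewest shards.
            target = min(candidates, key=lambda n: len(assignment[n]))
            assignment[target].append(f"R{shard}")
            holders.add(target)
    return assignment, unassigned


# One node: every replica stays unassigned, so a node failure loses data.
one_node, one_unassigned = allocate(5, 1, ["node-1"])
print(one_node, one_unassigned)

# Two nodes: every shard has a copy on each node.
two_nodes, two_unassigned = allocate(5, 1, ["node-1", "node-2"])
print(two_nodes, two_unassigned)
```

With one node, all 5 replicas are left unassigned; with two nodes, each node ends up holding one copy of every shard, so either server can fail without losing data.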

The AKF Scale Cube is a scalability model described in the book The Art of Scalability (published in Chinese as “Architecture Is the Future”), which treats scalability in terms of product, process, and team. The cube has three axes, each describing one dimension of scaling: X-axis: undifferentiated cloning of services and data, with work distributed evenly across the service instances; Y-axis: splitting the application by responsibility, such as by data type or by type of transaction; Z-axis: partitioning services and data by a distinguishing attribute, such as geographic region.
