In this article, you will learn what Elasticsearch is and why you need it, how to install and run Elasticsearch on your computer, and how to run a cluster of Elasticsearch instances on the same machine.

What is Elasticsearch?

Elasticsearch is a Lucene-based search server that provides a distributed, multi-user full-text search engine through a RESTful web interface. Developed in Java and released as open source under the Apache license, it is a popular enterprise-level search engine. The search features of Wikipedia, Stack Overflow, and GitHub are based on Elasticsearch. It is designed for cloud computing, and it is stable, reliable, fast, and easy to install and use.

Elasticsearch is an open source, near real-time, distributed storage, search, and analysis engine.

Elasticsearch has two main groups of features: search and aggregation (for example, listing the top 10 mask sellers of the last 7 days), and distributed storage with cluster management for when the volume of data grows.
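As a rough illustration of the aggregation example above, the request body for "top 10 mask sellers in the last 7 days" might look like the sketch below. The field names order_date and seller_id, and the assumption that each indexed document is one order, are hypothetical and do not come from this article. The body is built as a plain Python dictionary; it would be sent as JSON to an index's _search endpoint.

```python
import json

def top_sellers_query(days=7, top_n=10):
    """Build a search body: filter by date range, then bucket by seller."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "query": {
            # keep only documents from the last `days` days (hypothetical field)
            "range": {"order_date": {"gte": f"now-{days}d/d"}}
        },
        "aggs": {
            "top_sellers": {
                # one bucket per seller, largest document counts first
                "terms": {"field": "seller_id", "size": top_n}
            }
        },
    }

body = top_sellers_query()
print(json.dumps(body, indent=2))
```

Here "top" means "most documents", because a terms aggregation orders its buckets by document count by default.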

Elasticsearch is built on Lucene, so here’s a quick look at Lucene:

Lucene is a JAR package that contains all the code, including the algorithms, for building inverted indexes and searching them. To use it, we develop in Java, add the Lucene JAR to the project, and program against the Lucene API. When Lucene indexes existing data, it organizes the index data structures on the local disk. We can then use the functions and APIs that Lucene provides to search the index data on disk.
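To make the idea of an inverted index concrete, here is a minimal in-memory sketch. It is not Lucene's implementation: the whitespace tokenizer and the sample documents are simplifying assumptions made for illustration only.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical sample documents
docs = {
    1: "medical surgical mask",
    2: "cotton face mask",
    3: "medical gloves",
}
index = build_inverted_index(docs)
print(index["mask"])     # [1, 2]
print(index["medical"])  # [1, 3]
```

Looking up a term is a single dictionary access, so the cost of a search no longer depends on scanning the full text of every document.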

Lucene also has limitations: it can only be used from Java, its class library has a steep learning curve, and it does not natively support horizontal scaling. Elasticsearch addresses these problems by supporting distributed operation and horizontal scaling and by lowering the learning curve of full-text search. Because it is accessed over a RESTful interface, it can be called from any programming language.

Why Elasticsearch?

A database can also implement search, so why do we need a search engine? Let’s look at what happens if you use a database for search:

Suppose you search for products on an e-commerce platform. Each product is a record in the database, and the text in a given field can be long: a product description field may hold thousands or even tens of thousands of characters. For every search, the database has to scan that text in every record to check whether it contains the specified keywords, so a search for “face mask” will be very slow.

A database also cannot break the search phrase into separate terms to return as many of the expected results as possible. For example, a search for “medical mask” will not find a record in which the words “medical” and “mask” are separated by other text.

By contrast, on GitHub, whose search is based on Elasticsearch, a search for “design mode” also returns “Design mode”.

Therefore, implementing search with a database is not very reliable, and its performance is relatively poor.
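The difference between the two approaches can be sketched in a few lines. The first function imitates a database substring match, and the second imitates a search engine that splits the query into terms. The product description is hypothetical sample data, and the regular-expression tokenizer is a simplification of what a real analyzer does.

```python
import re

def terms(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())

def substring_match(query, text):
    """Database-style substring search: the whole phrase must appear verbatim."""
    return query.lower() in text.lower()

def term_match(query, text):
    """Search-engine style: every query term must appear somewhere in the text."""
    text_terms = set(terms(text))
    return all(term in text_terms for term in terms(query))

description = "Medical surgical mask, pack of 50"  # hypothetical product text

print(substring_match("medical mask", description))  # False
print(term_match("medical mask", description))       # True
```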

Elasticsearch is a distributed search engine, so let’s look at its distributed architecture.

Elasticsearch Distributed architecture

Elasticsearch was built for high availability and scalability. As you can see from the figure, it is easy to scale horizontally, and it can also be set up as a development environment on a PC. When the data size is large, Elasticsearch can scale from a single node to hundreds of nodes. It also supports different node types and a Hot & Warm cluster deployment for logging applications.

Capacity can be increased by buying more powerful servers, called scaling up or vertical scaling, or by adding more servers, called scaling out or horizontal scaling.

Elasticsearch is developed in Java, so it needs a Java runtime environment. Since version 7.0, a Java runtime has been bundled with Elasticsearch.

Let’s install Elasticsearch.

Install and configure Elasticsearch

Official download page: https://www.elastic.co/downloads/elasticsearch

Download elasticsearch-7.1.0-windows-x86_64.zip and unzip it.

Elasticsearch file directory structure

Before running Elasticsearch, let’s take a look at its file directory structure:

File directory structure

The decompressed directory structure is shown in the figure above. The bin directory contains script files. The config directory contains the Elasticsearch configuration files; elasticsearch.yml is the main one. The jdk directory, present since Elasticsearch 7.0, contains the bundled Java runtime environment. The data directory contains the Elasticsearch data files. The lib directory contains Java class libraries. The logs directory contains all Elasticsearch log files. The modules directory contains all Elasticsearch modules. Elasticsearch can be extended by plugins, and the plugins directory contains all installed plugins.

The config directory also contains jvm.options, the configuration file for the JVM. In 7.1, the default Xms and Xmx are both 1GB.

It is recommended to set Xms and Xmx, the minimum and maximum heap size, to the same value. Xmx should not exceed 50% of the machine’s memory, and the heap should not exceed 30GB.
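The sizing rule above can be written as a small calculation. This is only a sketch of the rule of thumb stated in this article: the helper names are made up for illustration, and the result is rounded down to whole gigabytes for simplicity.

```python
def recommended_heap_gb(machine_memory_gb, cap_gb=30):
    """Return a heap size: half of the machine's memory, capped at cap_gb."""
    return min(machine_memory_gb / 2, cap_gb)

def jvm_options_lines(heap_gb):
    """Format matching -Xms/-Xmx lines so minimum and maximum heap are equal."""
    heap = max(1, int(heap_gb))  # whole gigabytes, rounded down, at least 1
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]

print(jvm_options_lines(recommended_heap_gb(16)))   # ['-Xms8g', '-Xmx8g']
print(jvm_options_lines(recommended_heap_gb(128)))  # ['-Xms30g', '-Xmx30g']
```

The two generated lines are the values that would replace the default -Xms1g and -Xmx1g entries in jvm.options.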

Next, let’s start Elasticsearch.

Run a single instance of Elasticsearch

Go to the bin directory, open a command line, and enter elasticsearch -E node.name=node0 -E cluster.name=wupx -E path.data=node0_data to run a single Elasticsearch instance.

Open http://localhost:9200 in a browser. A response like the following shows that Elasticsearch is up and running on your computer:

{
  "name" : "node0",
  "cluster_name" : "wupx",
  "cluster_uuid" : "1TT8NYjcSxmLKeG-1ukqfA",
  "version" : {
    "number" : "7.1.0",
    "build_flavor" : "default",
    "build_type" : "zip",
    "build_hash" : "606a173",
    "build_date" : "2019-05-16T00:43:15.323135Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

name is the node name, cluster_name is the cluster name (the default cluster name is elasticsearch), and version.number, here 7.1.0, is the Elasticsearch version number.
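The same check can be scripted instead of done in a browser. The sketch below uses only the Python standard library. fetch_root_info needs a node listening on http://localhost:9200, so the example runs summarize on an offline sample shaped like the response above; the function names are illustrative, not part of any Elasticsearch client.

```python
import json
from urllib.request import urlopen

def fetch_root_info(url="http://localhost:9200"):
    """GET the root endpoint and decode its JSON body (needs a running node)."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)

def summarize(info):
    """Pick out the node name, cluster name, and version number."""
    return {
        "node": info["name"],
        "cluster": info["cluster_name"],
        "version": info["version"]["number"],
    }

# Offline sample with the same fields as the response shown above
sample = {"name": "node0", "cluster_name": "wupx", "version": {"number": "7.1.0"}}
print(summarize(sample))
```

With a node running, summarize(fetch_root_info()) would return the same three fields for the live instance.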

Next, let’s look at how to install an Elasticsearch plugin locally.

Install and view plugins

Enter elasticsearch-plugin list in CMD to view the installed Elasticsearch plugins.

Enter elasticsearch-plugin install analysis-icu to download and install the analysis-icu plugin.

After the installation succeeds, start Elasticsearch and visit http://localhost:9200/_cat/plugins to confirm that the plugin is installed on the cluster.

How do you run multiple instances of Elasticsearch on a development machine? One feature of Elasticsearch is that it runs in a distributed manner: multiple instances, on one or more machines, form a cluster. To get a feel for how this works, let’s start several instances locally.

Run multiple instances of Elasticsearch

You can run four instances of Elasticsearch in the background by entering the following commands in CMD. Each instance is started with a different node name, the same cluster name, and a different location for storing data.

elasticsearch -E node.name=node0 -E cluster.name=wupx -E path.data=node0_data -d
elasticsearch -E node.name=node1 -E cluster.name=wupx -E path.data=node1_data -d 
elasticsearch -E node.name=node2 -E cluster.name=wupx -E path.data=node2_data -d 
elasticsearch -E node.name=node3 -E cluster.name=wupx -E path.data=node3_data -d

You can visit http://localhost:9200/_cat/nodes to check the nodes in the cluster.
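The four commands differ only in the node number, so they can be generated rather than typed by hand. The sketch below only builds the command strings, following the pattern used above, and does not start anything; the helper name node_commands is made up for this example.

```python
def node_commands(count, cluster_name="wupx"):
    """Build one start command per node: unique name and data path, shared cluster."""
    commands = []
    for i in range(count):
        commands.append(
            f"elasticsearch -E node.name=node{i} "
            f"-E cluster.name={cluster_name} "
            f"-E path.data=node{i}_data -d"
        )
    return commands

for command in node_commands(4):
    print(command)
```

Each generated command has its own node name and data path, while every command carries the same cluster name, which is what lets the instances join one cluster.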

Conclusion

Now that you have a basic understanding of Elasticsearch, you should be able to run an instance locally and install any plugins you need. We’ve also seen how to run a cluster of multiple Elasticsearch instances on a local machine, which will help us understand how distributed Elasticsearch clusters work in the future.
