This article does not cover the underlying principles of ElasticSearch; it describes how to quickly import MySQL data into ElasticSearch for full-text search.
At work I needed to implement a search function over existing database data, and my lead recommended ElasticSearch. The tutorials I found online were all fairly old, so I had to fumble through it myself against the ES documentation before the service finally came together. I'm writing it down here so that friends with the same requirement can skip the detours and quickly build a working ElasticSearch service by following this tutorial.
Setting up ES
ES can be run from a direct ZIP download or as a Docker container. Comparatively speaking, Docker is the better fit for running the ES service: it makes it easy to build a cluster or spin up a test environment. We need a Dockerfile:
```dockerfile
FROM docker.elastic.co/elasticsearch/elasticsearch-oss:6.0.0

# Copy the configuration, including the new elasticsearch.yml and the JKS keystore
COPY --chown=elasticsearch:elasticsearch conf/ /usr/share/elasticsearch/config/

# Install ik
RUN ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.0.0/elasticsearch-analysis-ik-6.0.0.zip

# Install readonlyrest
RUN ./bin/elasticsearch-plugin install https://github.com/HYY-yu/BezierCurveDemo/raw/master/readonlyrest-1.16.14_es6.0.0.zip

USER elasticsearch
CMD ./bin/elasticsearch
```
A few notes on the above:

- First, create a conf folder in the same directory as the Dockerfile to hold elasticsearch.yml (shown later) and the JKS keystore (a self-signed certificate store used for HTTPS; search for how to generate one with keytool).
- ik is a popular Chinese word-segmentation plugin that makes Chinese search work properly.
- readonlyrest is an open-source ES plugin for user management and security verification. If budget allows, ES's own X-Pack provides more complete security features.
Configure elasticsearch.yml
```yaml
cluster.name: "docker-cluster"
network.host: 0.0.0.0

# minimum_master_nodes need to be explicitly set when bound on a public IP
# set to 1 to allow single node clusters
# Details: https://github.com/elastic/elasticsearch/pull/17288
discovery.zen.minimum_master_nodes: 1

# Disallow the system from swapping memory out for ES
bootstrap.memory_lock: true

http.type: ssl_netty4

readonlyrest:
    enable: true
    ssl:
      enable: true
      keystore_file: "server.jks"
      keystore_pass: server
      key_pass: server

    access_control_rules:
    - name: "Block 1 - ROOT"
      type: allow
      groups: ["admin"]

    - name: "User read only - paper"
      groups: ["user"]
      indices: ["paper*"]
      actions: ["indices:data/read/*"]

    users:
    - username: root
      auth_key_sha256: cb7c98bae153065db931980a13bd45ee3a77cb8f27a7dfee68f686377acc33f1
      groups: ["admin"]

    - username: xiaoming
      auth_key: xiaoming:xiaoming
      groups: ["user"]
```
bootstrap.memory_lock: true is a pitfall; it stops the system from swapping ES memory out. As the documentation explains, some operating systems swap temporarily unused memory to a region of the hard disk at runtime, but for ES this behavior can cause resource usage to spike and even make the system unresponsive.
Two users are configured above: root, an administrator in the admin group whose password is stored as a SHA-256 auth key, and xiaoming, an ordinary user who can only read indices whose names start with paper. See the readonlyrest documentation for more configuration details.
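For reference, the auth_key_sha256 value is, per the readonlyrest docs, the SHA-256 hex digest of username:password. A minimal Go sketch for generating your own (the password here is a placeholder):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func main() {
	// readonlyrest expects the SHA-256 hex digest of "username:password".
	sum := sha256.Sum256([]byte("root:my-secret-password"))
	fmt.Println(hex.EncodeToString(sum[:]))
}
```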
Now build the image and start the container (note that Docker image names must be lowercase):

```bash
docker build -t esimage:tag .
docker run -p 9200:9200 esimage:tag
```
Visit https://127.0.0.1:9200/. If it returns something like:

```json
{
    "name": "VaKwrIR",
    "cluster_name": "docker-cluster",
    "cluster_uuid": "YsYdOWKvRh2swz907s2m_w",
    "version": {
        "number": "6.0.0",
        "build_hash": "8f0685b",
        "build_date": "2017-11-10T18:41:22.859Z",
        "build_snapshot": false,
        "lucene_version": "7.0.1",
        "minimum_wire_compatibility_version": "5.6.0",
        "minimum_index_compatibility_version": "5.0.0"
    },
    "tagline": "You Know, for Search"
}
```

then the ES service is up.
Before getting to the main subject of this tutorial, let me share a few commonly used APIs for debugging ES. Replace {{url}} with your local ES address.

- View all plugins: {{url}}/_cat/plugins?v
- View all indices: {{url}}/_cat/indices?v
- Check ES health: {{url}}/_cat/health?v
- Check current disk usage: {{url}}/_cat/allocation?v
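These can be opened in a browser or hit with any HTTP client. Here is a quick Go sketch, assuming the HTTPS setup above; adjust the credentials to your own (the password is whatever you hashed into auth_key_sha256):

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// The certificate is self-signed, so skip TLS verification here.
	tr := &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}
	client := &http.Client{Transport: tr}

	req, err := http.NewRequest("GET", "https://127.0.0.1:9200/_cat/health?v", nil)
	if err != nil {
		panic(err)
	}
	// Use the admin user configured in elasticsearch.yml.
	req.SetBasicAuth("root", "your-password")

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```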
Import MySQL data
Here I use MySQL data; other databases work the same way, the key is how to do the import. Online tutorials recommend Logstash, Beats, or an ES MySQL plugin, and I tested those too: cumbersome configuration, sparse documentation, and if the database structure is at all complex, the import becomes hard labor. So I don't recommend them. In fact, ES has a corresponding API library in every language: read the rows from the database, assemble them into JSON documents at the language level, and send them to ES through the API library. The process is roughly as follows.
I'm using Golang's ES client library, elastic. You can search GitHub for the equivalent in other languages.
Next, I'll illustrate with a simple database:
Paper table

| id | name |
| --- | --- |
| 1 | Beijing No. 1 Primary School simulation paper |
| 2 | Jiangxi-Beijing general college entrance examination paper |

Province table

| id | name |
| --- | --- |
| 1 | Beijing |
| 2 | Jiangxi |

Paper_Province table

| paper_id | province_id |
| --- | --- |
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
As shown above, Paper and Province have a many-to-many relationship. We now want to put the Paper data into ES so that we can fuzzy-search by paper name and filter by province. The JSON document format looks like this:
```json
{
    "id": 1,
    "name": "Beijing No. 1 Primary School simulation paper",
    "provinces": [
        {
            "id": 1,
            "name": "Beijing"
        }
    ]
}
```
Start by preparing a mapping.json file, which is the definition of the data storage structure in ES.
```json
{
    "mappings": {
        "docs": {
            "include_in_all": false,
            "properties": {
                "id": {
                    "type": "long"
                },
                "name": {
                    "type": "text",
                    "analyzer": "ik_max_word"
                },
                "provinces": {
                    "type": "nested",
                    "properties": {
                        "id": {
                            "type": "integer"
                        },
                        "name": {
                            "type": "text",
                            "index": "false"
                        }
                    }
                }
            }
        }
    },
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    }
}
```

Here name uses the ik_max_word analyzer (the finest-grained segmentation), while provinces.name is stored but not indexed.
Note that the _all field is disabled here. By default, the _all field collects the values of all other fields so you can search without naming a field; the downside is that it takes up a lot of extra space.
I set the number of shards to 1 and configured no replicas; after all, this isn't a cluster and there isn't much data to handle. If you have a large amount of data, set the shard and replica counts accordingly.
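In the import code below, the contents of mapping.json are passed to the create-index call as a string named index_default_setting. A minimal sketch of loading it at startup; the conf/mapping.json path is my assumption:

```go
import "io/ioutil"

// index_default_setting holds the raw contents of mapping.json; it is
// used later as the body of the create-index request.
var index_default_setting string

func loadIndexSetting() error {
	data, err := ioutil.ReadFile("conf/mapping.json") // assumed location
	if err != nil {
		return err
	}
	index_default_setting = string(data)
	return nil
}
```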
Now connect to ES from Go. The ca.crt below is the certificate that goes with the self-signed JKS; that said, I also pass InsecureSkipVerify here, which skips certificate validation:
```go
func InitElasticSearch() {
	pool := x509.NewCertPool()
	crt, err0 := ioutil.ReadFile("conf/ca.crt")
	if err0 != nil {
		cannotOpenES(err0, "read crt file err")
		return
	}
	pool.AppendCertsFromPEM(crt)

	tr := &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool, InsecureSkipVerify: true},
	}
	httpClient := &http.Client{Transport: tr}

	// Create elasticClient
	var err error
	elasticClient, err = elastic.NewClient(elastic.SetURL(MyConfig.ElasticUrl),
		elastic.SetErrorLog(GetLogger()),
		elastic.SetGzip(true),
		elastic.SetHttpClient(httpClient),
		elastic.SetSniff(false), // cluster sniffing; remember to turn it off for a single node
		elastic.SetScheme("https"),
		elastic.SetBasicAuth(MyConfig.ElasticUsername, MyConfig.ElasticPassword))
	if err != nil {
		cannotOpenES(err, "search_client_error")
		return
	}
	// elasticClient is ready

	// Check whether the paper index exists
	exist, err := elasticClient.IndexExists(MyConfig.ElasticIndexName).Do(context.Background())
	if err != nil {
		cannotOpenES(err, "exist_paper_index_check")
		return
	}
	// If the index exists and passes the integrity check, don't resend the data
	if exist {
		if !isIndexIntegrity(elasticClient) {
			// Delete the current index, ready to rebuild it
			deleteResponse, err := elasticClient.DeleteIndex(MyConfig.ElasticIndexName).Do(context.Background())
			if err != nil || !deleteResponse.Acknowledged {
				cannotOpenES(err, "delete_index_error")
				return
			}
		} else {
			return
		}
	}
	// Fetch all the data from the DB and send it to ElasticSearch
	go fetchDBGetAllPaperAndSendToES()
}
```
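isIndexIntegrity is referenced above but its body isn't shown; here is one rough sketch of such a check, under my own assumption that "intact" means the index holds as many documents as the source table has rows:

```go
// A sketch of an integrity check: the index is considered intact if
// it holds as many documents as the source table has rows.
func isIndexIntegrity(client *elastic.Client) bool {
	count, err := client.Count(MyConfig.ElasticIndexName).Do(context.Background())
	if err != nil {
		return false
	}
	var dbCount int64
	GetDb().Table("t_papers").Count(&dbCount)
	return count == dbCount
}
```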
```go
type PaperSearch struct {
	PaperId   int64      `gorm:"primary_key; column:F_paper_id; type:BIGINT(20)" json:"id"`
	Name      string     `gorm:"column:F_name; size:80" json:"name"`
	Provinces []Province `gorm:"many2many:t_paper_province;" json:"provinces"` // the provinces where the paper is used
}
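
// The Province model isn't shown above; here is a minimal sketch that is
// consistent with the join below (the column names are my assumption):
type Province struct {
	ProvinceId int64  `gorm:"primary_key; column:F_province_id; type:BIGINT(20)" json:"id"`
	Name       string `gorm:"column:F_name; size:80" json:"name"`
}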

func fetchDBGetAllPaperAndSendToES() {
	// fetch all papers
	var allPaper []PaperSearch
	GetDb().Table("t_papers").Find(&allPaper)

	// fetch the provinces of each paper
	for i := range allPaper {
		var allPro []Province
		GetDb().Table("t_provinces").Joins("INNER JOIN `t_paper_province` ON `t_paper_province`.`province_F_province_id` = `t_provinces`.`F_province_id`").
			Where("t_paper_province.paper_F_paper_id = ?", allPaper[i].PaperId).Find(&allPro)
		allPaper[i].Provinces = allPro
	}

	if len(allPaper) > 0 {
		// send to es - create the index
		createService := GetElasticSearch().CreateIndex(MyConfig.ElasticIndexName)
		// index_default_setting is the mapping.json shown above
		createService.Body(index_default_setting)
		createResult, err := createService.Do(context.Background())
		if err != nil {
			cannotOpenES(err, "create_paper_index")
			return
		}
		if !createResult.Acknowledged || !createResult.ShardsAcknowledged {
			cannotOpenES(err, "create_paper_index_fail")
			return
		}

		// - send all papers
		bulkRequest := GetElasticSearch().Bulk()
		for i := range allPaper {
			indexReq := elastic.NewBulkIndexRequest().OpType("create").Index(MyConfig.ElasticIndexName).Type("docs").
				Id(helper.Int64ToString(allPaper[i].PaperId)).
				Doc(allPaper[i])
			bulkRequest.Add(indexReq)
		}
		// Do sends the bulk requests to Elasticsearch
		bulkResponse, err := bulkRequest.Do(context.Background())
		if err != nil {
			cannotOpenES(err, "insert_docs_error")
			return
		}
		// Bulk request actions get cleared after Do
		if len(bulkResponse.Created()) != len(allPaper) {
			cannotOpenES(err, "insert_docs_nums_error")
			return
		}
		// send success
	}
}
```
Use {{url}}/_cat/indices?v to check that the new index appears in ES, and {{url}}/papers/_search to see how many documents it contains. If that number equals the number of documents you sent, the search service is up and running.
Search
Papers can now be searched by provinceId and q, with results sorted by relevance score by default.
```go
// q: the search string; provinceId: province filter; limit, page: paging parameters
func SearchPaper(q string, provinceId uint, limit int, page int) (list []PaperSearch, totalPage int, currentPage int, pageIsEnd int, returnErr error) {
	// If ES cannot be used, fall back to database search
	if !CanUseElasticSearch && !MyConfig.UseElasticSearch {
		return SearchPaperLocal(q, provinceId, limit, page)
	}
	list = make([]PaperSearch, 0)
	totalPage = 0
	currentPage = page
	pageIsEnd = 0
	returnErr = nil

	client := GetElasticSearch()
	if client == nil {
		return SearchPaperLocal(q, provinceId, limit, page)
	}
	// ElasticSearch has a problem, use database search
	if !isIndexIntegrity(client) {
		return SearchPaperLocal(q, provinceId, limit, page)
	}
	if !client.IsRunning() {
		client.Start()
	}
	defer client.Stop()

	q = html.EscapeString(q)

	boolQuery := elastic.NewBoolQuery()
	// match on Paper.name
	matchQuery := elastic.NewMatchQuery("name", q)
	// filter by province
	if provinceId > 0 && provinceId != DEFAULT_PROVINCE_ALL {
		proBool := elastic.NewBoolQuery()
		tpro := elastic.NewTermQuery("provinces.id", provinceId)
		proNest := elastic.NewNestedQuery("provinces", proBool.Must(tpro))
		boolQuery.Must(proNest)
	}
	boolQuery.Must(matchQuery)

	// highlight the matched keywords in the name field
	highlight := elastic.NewHighlight()
	highlight.Field(ELASTIC_SEARCH_SEARCH_FIELD_NAME)
	highlight.PreTags(ELASTIC_SEARCH_SEARCH_FIELD_TAG_START)
	highlight.PostTags(ELASTIC_SEARCH_SEARCH_FIELD_TAG_END)

	searchResult, err2 := client.Search(MyConfig.ElasticIndexName).
		Highlight(highlight).
		Query(boolQuery).
		From((page - 1) * limit).
		Size(limit).
		Do(context.Background())
	if err2 != nil {
		// Handle error
		GetLogger().LogErr("Error while searching"+err2.Error(), "search_error")
		returnErr = errors.New("Error while searching")
	} else {
		if searchResult.Hits.TotalHits > 0 {
			// Iterate through the results
			for _, hit := range searchResult.Hits.Hits {
				var p PaperSearch
				err := json.Unmarshal(*hit.Source, &p)
				if err != nil {
					// Deserialization failed
					GetLogger().LogErr("Error while searching"+err.Error(), "search_deserialization_error")
					returnErr = errors.New("Error while searching")
					return
				}
				if len(hit.Highlight[ELASTIC_SEARCH_SEARCH_FIELD_NAME]) > 0 {
					p.Name = hit.Highlight[ELASTIC_SEARCH_SEARCH_FIELD_NAME][0]
				}
				list = append(list, p)
			}
			count := searchResult.TotalHits()
			currentPage = page
			if count > 0 {
				totalPage = int(math.Ceil(float64(count) / float64(limit)))
			}
			if currentPage >= totalPage {
				pageIsEnd = 1
			}
		} else {
			// No hits
		}
	}
	return
}
```
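Finally, a hypothetical call site, just to show the parameters (the values here are made up):

```go
// Search page 1 for papers whose names match "Beijing",
// restricted to province 1, ten results per page.
list, totalPage, currentPage, pageIsEnd, err := SearchPaper("Beijing", 1, 10, 1)
if err != nil {
	// fall back to the database search or report the error
}
fmt.Printf("page %d/%d (end=%d): %+v\n", currentPage, totalPage, pageIsEnd, list)
```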
That's all.