This article does not cover the underlying principles of ElasticSearch; it describes how to quickly import MySQL data into ElasticSearch for full-text search.
At work I needed to implement a search function over existing database data, and my lead recommended ElasticSearch. The tutorials I found online were all fairly old, so I had to fumble through it myself against the ES documentation before the service finally came together. I'm writing it down here so that friends with the same requirement can skip the detours and quickly build a working ElasticSearch service by following this tutorial.
Setting up ES
ES can be run from a direct ZIP download or as a Docker container. Comparatively speaking, Docker is the better fit for running the ES service: it makes it easy to build a cluster or spin up a test environment. We need a Dockerfile:
```dockerfile
FROM docker.elastic.co/elasticsearch/elasticsearch-oss:6.0.0

# Copy the configuration, including the new elasticsearch.yml and the JKS keystore
COPY --chown=elasticsearch:elasticsearch conf/ /usr/share/elasticsearch/config/

# Install ik
RUN ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.0.0/elasticsearch-analysis-ik-6.0.0.zip

# Install readonlyrest
RUN ./bin/elasticsearch-plugin install https://github.com/HYY-yu/BezierCurveDemo/raw/master/readonlyrest-1.16.14_es6.0.0.zip

USER elasticsearch
CMD ./bin/elasticsearch
```
A few notes on the above:

- First, create a conf folder in the same directory as the Dockerfile to hold elasticsearch.yml (shown later) and the JKS keystore (a self-signed certificate store used for HTTPS; search for how to generate one with keytool).
- ik is a popular Chinese word-segmentation plugin that makes Chinese search work properly.
- readonlyrest is an open-source ES plugin for user management and security verification. If budget allows, ES's own X-Pack provides more complete security features.
Configure elasticsearch.yml
```yaml
cluster.name: "docker-cluster"
network.host: 0.0.0.0

# minimum_master_nodes need to be explicitly set when bound on a public IP
# set to 1 to allow single node clusters
# Details: https://github.com/elastic/elasticsearch/pull/17288
discovery.zen.minimum_master_nodes: 1

# Disallow the system from swapping memory out for ES
bootstrap.memory_lock: true

http.type: ssl_netty4

readonlyrest:
    enable: true
    ssl:
      enable: true
      keystore_file: "server.jks"
      keystore_pass: server
      key_pass: server

    access_control_rules:
    - name: "Block 1 - ROOT"
      type: allow
      groups: ["admin"]

    - name: "User read only - paper"
      groups: ["user"]
      indices: ["paper*"]
      actions: ["indices:data/read/*"]

    users:
    - username: root
      auth_key_sha256: cb7c98bae153065db931980a13bd45ee3a77cb8f27a7dfee68f686377acc33f1
      groups: ["admin"]

    - username: xiaoming
      auth_key: xiaoming:xiaoming
      groups: ["user"]
```
bootstrap.memory_lock: true is a pitfall; it stops the system from swapping ES memory out. As the documentation explains, some operating systems swap temporarily unused memory to a region of the hard disk at runtime, but for ES this behavior can cause resource usage to spike and even make the system unresponsive.
Two users are configured above: root, an administrator in the admin group whose password is stored as a SHA-256 auth key, and xiaoming, an ordinary user who can only read indices whose names start with paper. See the readonlyrest documentation for more configuration details.
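For reference, the auth_key_sha256 value is, per the readonlyrest docs, the SHA-256 hex digest of username:password. A minimal Go sketch for generating your own (the password here is a placeholder):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func main() {
	// readonlyrest expects the SHA-256 hex digest of "username:password".
	sum := sha256.Sum256([]byte("root:my-secret-password"))
	fmt.Println(hex.EncodeToString(sum[:]))
}
```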
Now build the image and start the container (note that Docker image names must be lowercase):

```bash
docker build -t esimage:tag .
docker run -p 9200:9200 esimage:tag
```
Visit https://127.0.0.1:9200/. If it returns something like:

```json
{
    "name": "VaKwrIR",
    "cluster_name": "docker-cluster",
    "cluster_uuid": "YsYdOWKvRh2swz907s2m_w",
    "version": {
        "number": "6.0.0",
        "build_hash": "8f0685b",
        "build_date": "2017-11-10T18:41:22.859Z",
        "build_snapshot": false,
        "lucene_version": "7.0.1",
        "minimum_wire_compatibility_version": "5.6.0",
        "minimum_index_compatibility_version": "5.0.0"
    },
    "tagline": "You Know, for Search"
}
```

then the ES service is up.
Before getting to the main subject of this tutorial, let me share a few commonly used APIs for debugging ES. Replace {{url}} with your local ES address.

- View all plugins: {{url}}/_cat/plugins?v
- View all indices: {{url}}/_cat/indices?v
- Check ES health: {{url}}/_cat/health?v
- Check current disk usage: {{url}}/_cat/allocation?v
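These can be opened in a browser or hit with any HTTP client. Here is a quick Go sketch, assuming the HTTPS setup above; adjust the credentials to your own (the password is whatever you hashed into auth_key_sha256):

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// The certificate is self-signed, so skip TLS verification here.
	tr := &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}
	client := &http.Client{Transport: tr}

	req, err := http.NewRequest("GET", "https://127.0.0.1:9200/_cat/health?v", nil)
	if err != nil {
		panic(err)
	}
	// Use the admin user configured in elasticsearch.yml.
	req.SetBasicAuth("root", "your-password")

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```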
Import MySQL data
Here I use MySQL data; other databases work the same way, the key is how to do the import. Online tutorials recommend Logstash, Beats, or an ES MySQL plugin, and I tested those too: cumbersome configuration, sparse documentation, and if the database structure is at all complex, the import becomes hard labor. So I don't recommend them. In fact, ES has a corresponding API library in every language: read the rows from the database, assemble them into JSON documents at the language level, and send them to ES through the API library. The process is roughly as follows.
I'm using Golang's ES client library, elastic. You can search GitHub for the equivalent in other languages.
Next, I'll illustrate with a simple database:
Paper table

| id | name |
| --- | --- |
| 1 | Beijing No. 1 Primary School simulation paper |
| 2 | Jiangxi-Beijing general college entrance examination paper |

Province table

| id | name |
| --- | --- |
| 1 | Beijing |
| 2 | Jiangxi |

Paper_Province table

| paper_id | province_id |
| --- | --- |
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
As shown above, Paper and Province have a many-to-many relationship. We now want to put the Paper data into ES so that we can fuzzy-search by paper name and filter by province. The JSON document format looks like this:
```json
{
    "id": 1,
    "name": "Beijing No. 1 Primary School simulation paper",
    "provinces": [
        {
            "id": 1,
            "name": "Beijing"
        }
    ]
}
```
Start by preparing a mapping.json file, which is the definition of the data storage structure in ES.
```json
{
    "mappings": {
        "docs": {
            "include_in_all": false,
            "properties": {
                "id": {
                    "type": "long"
                },
                "name": {
                    "type": "text",
                    "analyzer": "ik_max_word"
                },
                "provinces": {
                    "type": "nested",
                    "properties": {
                        "id": {
                            "type": "integer"
                        },
                        "name": {
                            "type": "text",
                            "index": "false"
                        }
                    }
                }
            }
        }
    },
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    }
}
```

Here name uses the ik_max_word analyzer (the finest-grained segmentation), while provinces.name is stored but not indexed.
Note that the _all field is disabled here. By default, the _all field collects the values of all other fields so you can search without naming a field; the downside is that it takes up a lot of extra space.
I set the number of shards to 1 and configured no replicas; after all, this isn't a cluster and there isn't much data to handle. If you have a large amount of data, set the shard and replica counts accordingly.
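In the import code below, the contents of mapping.json are passed to the create-index call as a string named index_default_setting. A minimal sketch of loading it at startup; the conf/mapping.json path is my assumption:

```go
import "io/ioutil"

// index_default_setting holds the raw contents of mapping.json; it is
// used later as the body of the create-index request.
var index_default_setting string

func loadIndexSetting() error {
	data, err := ioutil.ReadFile("conf/mapping.json") // assumed location
	if err != nil {
		return err
	}
	index_default_setting = string(data)
	return nil
}
```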
Now connect to ES from Go. The ca.crt below is the certificate that goes with the self-signed JKS; that said, I also pass InsecureSkipVerify here, which skips certificate validation:
```go
func InitElasticSearch() {
	pool := x509.NewCertPool()
	crt, err0 := ioutil.ReadFile("conf/ca.crt")
	if err0 != nil {
		cannotOpenES(err0, "read crt file err")
		return
	}
	pool.AppendCertsFromPEM(crt)

	tr := &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool, InsecureSkipVerify: true},
	}
	httpClient := &http.Client{Transport: tr}

	// Create elasticClient
	var err error
	elasticClient, err = elastic.NewClient(elastic.SetURL(MyConfig.ElasticUrl),
		elastic.SetErrorLog(GetLogger()),
		elastic.SetGzip(true),
		elastic.SetHttpClient(httpClient),
		elastic.SetSniff(false), // cluster sniffing; remember to turn it off for a single node
		elastic.SetScheme("https"),
		elastic.SetBasicAuth(MyConfig.ElasticUsername, MyConfig.ElasticPassword))
	if err != nil {
		cannotOpenES(err, "search_client_error")
		return
	}
	// elasticClient is ready

	// Check whether the paper index exists
	exist, err := elasticClient.IndexExists(MyConfig.ElasticIndexName).Do(context.Background())
	if err != nil {
		cannotOpenES(err, "exist_paper_index_check")
		return
	}
	// If the index exists and passes the integrity check, don't resend the data
	if exist {
		if !isIndexIntegrity(elasticClient) {
			// Delete the current index, ready to rebuild it
			deleteResponse, err := elasticClient.DeleteIndex(MyConfig.ElasticIndexName).Do(context.Background())
			if err != nil || !deleteResponse.Acknowledged {
				cannotOpenES(err, "delete_index_error")
				return
			}
		} else {
			return
		}
	}
	// Fetch all the data from the DB and send it to ElasticSearch
	go fetchDBGetAllPaperAndSendToES()
}
```
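isIndexIntegrity is referenced above but its body isn't shown; here is one rough sketch of such a check, under my own assumption that "intact" means the index holds as many documents as the source table has rows:

```go
// A sketch of an integrity check: the index is considered intact if
// it holds as many documents as the source table has rows.
func isIndexIntegrity(client *elastic.Client) bool {
	count, err := client.Count(MyConfig.ElasticIndexName).Do(context.Background())
	if err != nil {
		return false
	}
	var dbCount int64
	GetDb().Table("t_papers").Count(&dbCount)
	return count == dbCount
}
```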
```go
type PaperSearch struct {
	PaperId   int64      `gorm:"primary_key; column:F_paper_id; type:BIGINT(20)" json:"id"`
	Name      string     `gorm:"column:F_name; size:80" json:"name"`
	Provinces []Province `gorm:"many2many:t_paper_province;" json:"provinces"` // the provinces where the paper is used
}
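
// The Province model isn't shown above; here is a minimal sketch that is
// consistent with the join below (the column names are my assumption):
type Province struct {
	ProvinceId int64  `gorm:"primary_key; column:F_province_id; type:BIGINT(20)" json:"id"`
	Name       string `gorm:"column:F_name; size:80" json:"name"`
}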

func fetchDBGetAllPaperAndSendToES() {
	// fetch all papers
	var allPaper []PaperSearch
	GetDb().Table("t_papers").Find(&allPaper)

	// fetch the provinces of each paper
	for i := range allPaper {
		var allPro []Province
		GetDb().Table("t_provinces").Joins("INNER JOIN `t_paper_province` ON `t_paper_province`.`province_F_province_id` = `t_provinces`.`F_province_id`").
			Where("t_paper_province.paper_F_paper_id = ?", allPaper[i].PaperId).Find(&allPro)
		allPaper[i].Provinces = allPro
	}

	if len(allPaper) > 0 {
		// send to es - create the index
		createService := GetElasticSearch().CreateIndex(MyConfig.ElasticIndexName)
		// index_default_setting is the mapping.json shown above
		createService.Body(index_default_setting)
		createResult, err := createService.Do(context.Background())
		if err != nil {
			cannotOpenES(err, "create_paper_index")
			return
		}
		if !createResult.Acknowledged || !createResult.ShardsAcknowledged {
			cannotOpenES(err, "create_paper_index_fail")
			return
		}

		// - send all papers
		bulkRequest := GetElasticSearch().Bulk()
		for i := range allPaper {
			indexReq := elastic.NewBulkIndexRequest().OpType("create").Index(MyConfig.ElasticIndexName).Type("docs").
				Id(helper.Int64ToString(allPaper[i].PaperId)).
				Doc(allPaper[i])
			bulkRequest.Add(indexReq)
		}
		// Do sends the bulk requests to Elasticsearch
		bulkResponse, err := bulkRequest.Do(context.Background())
		if err != nil {
			cannotOpenES(err, "insert_docs_error")
			return
		}
		// Bulk request actions get cleared after Do
		if len(bulkResponse.Created()) != len(allPaper) {
			cannotOpenES(err, "insert_docs_nums_error")
			return
		}
		// send success
	}
}
```
Use {{url}}/_cat/indices?v to check that the new index appears in ES, and {{url}}/papers/_search to see how many documents it contains. If that number equals the number of documents you sent, the search service is up and running.
Search
Papers can now be searched by provinceId and q, with results sorted by relevance score by default.
```go
// q: the search string; provinceId: province filter; limit, page: paging parameters
func SearchPaper(q string, provinceId uint, limit int, page int) (list []PaperSearch, totalPage int, currentPage int, pageIsEnd int, returnErr error) {
	// If ES cannot be used, fall back to database search
	if !CanUseElasticSearch && !MyConfig.UseElasticSearch {
		return SearchPaperLocal(q, provinceId, limit, page)
	}
	list = make([]PaperSearch, 0)
	totalPage = 0
	currentPage = page
	pageIsEnd = 0
	returnErr = nil

	client := GetElasticSearch()
	if client == nil {
		return SearchPaperLocal(q, provinceId, limit, page)
	}
	// ElasticSearch has a problem, use database search
	if !isIndexIntegrity(client) {
		return SearchPaperLocal(q, provinceId, limit, page)
	}
	if !client.IsRunning() {
		client.Start()
	}
	defer client.Stop()

	q = html.EscapeString(q)

	boolQuery := elastic.NewBoolQuery()
	// match on Paper.name
	matchQuery := elastic.NewMatchQuery("name", q)
	// filter by province
	if provinceId > 0 && provinceId != DEFAULT_PROVINCE_ALL {
		proBool := elastic.NewBoolQuery()
		tpro := elastic.NewTermQuery("provinces.id", provinceId)
		proNest := elastic.NewNestedQuery("provinces", proBool.Must(tpro))
		boolQuery.Must(proNest)
	}
	boolQuery.Must(matchQuery)

	// highlight the matched keywords in the name field
	highlight := elastic.NewHighlight()
	highlight.Field(ELASTIC_SEARCH_SEARCH_FIELD_NAME)
	highlight.PreTags(ELASTIC_SEARCH_SEARCH_FIELD_TAG_START)
	highlight.PostTags(ELASTIC_SEARCH_SEARCH_FIELD_TAG_END)

	searchResult, err2 := client.Search(MyConfig.ElasticIndexName).
		Highlight(highlight).
		Query(boolQuery).
		From((page - 1) * limit).
		Size(limit).
		Do(context.Background())
	if err2 != nil {
		// Handle error
		GetLogger().LogErr("Error while searching"+err2.Error(), "search_error")
		returnErr = errors.New("Error while searching")
	} else {
		if searchResult.Hits.TotalHits > 0 {
			// Iterate through the results
			for _, hit := range searchResult.Hits.Hits {
				var p PaperSearch
				err := json.Unmarshal(*hit.Source, &p)
				if err != nil {
					// Deserialization failed
					GetLogger().LogErr("Error while searching"+err.Error(), "search_deserialization_error")
					returnErr = errors.New("Error while searching")
					return
				}
				if len(hit.Highlight[ELASTIC_SEARCH_SEARCH_FIELD_NAME]) > 0 {
					p.Name = hit.Highlight[ELASTIC_SEARCH_SEARCH_FIELD_NAME][0]
				}
				list = append(list, p)
			}
			count := searchResult.TotalHits()
			currentPage = page
			if count > 0 {
				totalPage = int(math.Ceil(float64(count) / float64(limit)))
			}
			if currentPage >= totalPage {
				pageIsEnd = 1
			}
		} else {
			// No hits
		}
	}
	return
}
```
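Finally, a hypothetical call site, just to show the parameters (the values here are made up):

```go
// Search page 1 for papers whose names match "Beijing",
// restricted to province 1, ten results per page.
list, totalPage, currentPage, pageIsEnd, err := SearchPaper("Beijing", 1, 10, 1)
if err != nil {
	// fall back to the database search or report the error
}
fmt.Printf("page %d/%d (end=%d): %+v\n", currentPage, totalPage, pageIsEnd, list)
```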
That's all.