Burning the Midnight Oil over a Q&A Service (1)

The whole process is presented as questions and answers. To keep it easy to follow, I try to describe everything in plain language. Writing is hard work, so please go easy on the criticism.

Contents:

Why build a Q&A service?
What are the difficulties?
MongoDB, MySQL, and Elasticsearch: how do they compare?
Elasticsearch usage scenarios, features, and data types
How do we land Elasticsearch in the project? (To be continued in the next part.)

1. Why build a Q&A service?

The eIP-CM bonus system needed a new feature: Q&A. Titles and answers are edited in the admin backend, and the mini program then searches all related articles by title. I was secretly pleased when I received the requirement: finally a chance to show the team my formidable CRUD skills. I confidently grabbed the task and estimated two hours of work. My head was already generating the CRUD code when the boss had a brainwave: make this Q&A a separate, general-purpose service that every system can call and that supports all kinds of dynamic forms. What? It's a public service now? I bow down in admiration (not really).

2. What are the difficulties?

Without further ado: many services in the project need Q&A (similar to flexible forms), so we want to split it out into a shared microservice. Making it general-purpose brings the following pain points:

Full-text search: the mini program search needs word-segmented matching, while the admin backend needs both fuzzy matching and exact matching.
To be generic, fields must stay flexible, and the field hierarchy is not known in advance.
How to store nested fields, and how to query them.
Whether a plain relational database can meet the requirements.
Not only the Q&A service: similar content services added later should be able to reuse it.
Typing all this up is tiring, so I will take it slowly. To be continued.

3. Technology selection

1) Search is not simple. Let's listen to the inner monologue of several databases.

MySQL:
Search:
- LIKE handles fuzzy matching; MATCH ... AGAINST uses a full-text index.
Field flexibility:
- I admit I'm a relational database. Sorry to bother you.
MongoDB:
Search, field flexibility:
- I have all the search features of MySQL, and I am a NoSQL database: I can store and query flexible fields and nested relationships, and I support aggregation, native distribution, and transactions.
Word segmentation:
- Sorry to bother you.
Elasticsearch:
- Tough and of few words: apart from transactions, whatever you have, I have too. Pick me.
- After hearing Elasticsearch's heartfelt monologue, I decided to take a closer look, partly because I wanted to use the ELK log suite later anyway.
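
To make the comparison above a little more concrete, here is a minimal sketch of how a title search might be expressed for each candidate. It only builds the query text or query body and does not talk to any server. The table, collection, and index name `qa` and the field `title` are hypothetical, and the `%s` placeholder style is an assumption that depends on the MySQL driver in use.

```python
import re
from typing import Any, Dict, Tuple

# All names here ("qa", "title") are hypothetical and only for illustration.


def mysql_like(keyword: str) -> Tuple[str, Tuple[str, ...]]:
    """Fuzzy match with LIKE; returns (sql, params) for a parameterized query."""
    return "SELECT id, title FROM qa WHERE title LIKE %s", (f"%{keyword}%",)


def mysql_fulltext(keyword: str) -> Tuple[str, Tuple[str, ...]]:
    """Full-text match; assumes a FULLTEXT index already exists on qa.title."""
    return "SELECT id, title FROM qa WHERE MATCH(title) AGAINST (%s)", (keyword,)


def mongo_regex(keyword: str) -> Dict[str, Any]:
    """MongoDB filter document: case-insensitive substring match on title."""
    return {"title": {"$regex": re.escape(keyword), "$options": "i"}}


def es_match(keyword: str) -> Dict[str, Any]:
    """Elasticsearch request body: analyzed (word-segmented) match on title."""
    return {"query": {"match": {"title": keyword}}}


if __name__ == "__main__":
    for build in (mysql_like, mysql_fulltext, mongo_regex, es_match):
        print(build.__name__, build("bonus"))
```

Only the last form runs the search text through an analyzer, which is the word segmentation behavior the mini program search needs.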

4. Elasticsearch usage scenarios, features, and data types

What does Elasticsearch do?

A distributed, real-time document store in which every field is indexed and searchable.
A distributed, real-time analytics search engine that answers queries over massive data in near real time, within seconds.
A simple RESTful API, so it works naturally with many programming languages.
Easy to scale, handling petabytes of structured or unstructured data.
Built on Lucene: Elasticsearch wraps Lucene's functionality and hides its complex internals, and adds distribution, a RESTful API, and other features on top.

Where is Elasticsearch used?

Site search (e-commerce, recruitment, portal, etc.)
GitHub searches hundreds of billions of lines of code
BI (business intelligence) systems
Log data analysis
Commodity price monitoring websites

The characteristics of Elasticsearch

It can run as a large distributed cluster (hundreds of servers) that handles petabytes of data for large companies; it can also run on a single machine and serve small companies.
Elasticsearch is not a new technology. It combines full-text search, data analysis, and distributed technology, and that combination is what makes ES unique. Each piece already existed on its own: Lucene for full-text search, and commercial software for data analysis.
For users it works out of the box and is very simple. For small and medium-sized applications, ES can be deployed in a few minutes and used as a production system, provided the data volume is modest and the operations are not too complex.
Traditional databases are strong at transactions and other online transactional operations, but they fall short in areas that need special features such as full-text search, synonym handling, relevance ranking, complex data analysis, and near-real-time processing of massive data. Elasticsearch complements a traditional database by providing many features that the database does not.

What are the main available field data types for Elasticsearch?

String types: text, which supports full-text search, and keyword, which is matched exactly.
Numeric types: byte, short, integer, long, float, double, half_float, and scaled_float.
Date (date), date with nanosecond resolution (date_nanos), boolean, binary (a Base64-encoded string), and so on.
Range types: integer_range, long_range, float_range, double_range, and date_range.
Complex types: object and nested.
Geo types, which represent locations, such as geo_point and geo_shape.
Arrays: there is no dedicated array type; any field can hold multiple values, but all values in an array should have the same data type.
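
As a rough illustration of how these types might fit together, the sketch below defines a Mapping for a Q&A index as a plain Python dict, plus a small helper that lists each field path with its type. Every field name (`title`, `answer`, `source_system`, `form_items`, and so on) is an assumption made up for this example, not the project's actual schema.

```python
from typing import Any, Dict

# Hypothetical Mapping for a Q&A index; all field names are illustrative only.
QA_MAPPING: Dict[str, Any] = {
    "mappings": {
        "properties": {
            # text for full-text search, with a keyword sub-field for exact match
            "title": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
            "answer": {"type": "text"},
            "source_system": {"type": "keyword"},
            "view_count": {"type": "integer"},
            "created_at": {"type": "date"},
            "published": {"type": "boolean"},
            # nested keeps each form item's label and value tied together
            "form_items": {
                "type": "nested",
                "properties": {
                    "label": {"type": "keyword"},
                    "value": {"type": "text"},
                },
            },
        }
    }
}


def field_types(properties: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Flatten a Mapping's properties into {dotted.path: type}."""
    result: Dict[str, str] = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        # A field with sub-properties but no explicit type is an object.
        result[path] = spec.get("type", "object")
        result.update(field_types(spec.get("properties", {}), prefix=f"{path}."))
        for sub_name, sub_spec in spec.get("fields", {}).items():
            result[f"{path}.{sub_name}"] = sub_spec["type"]
    return result


if __name__ == "__main__":
    for path, type_name in field_types(QA_MAPPING["mappings"]["properties"]).items():
        print(f"{path}: {type_name}")
```

The `text` plus `keyword` pairing on `title` is one way to serve both needs mentioned earlier: word-segmented search in the mini program and exact matching in the admin backend.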

Another look at the advantages and disadvantages of these databases:

The defining feature of ES, as the name suggests, is search. Strictly speaking, ES is not a database but a search engine, and every aspect of it is designed around search. ES supports full-text search. A simple explanation: given a piece of data such as “I work in an Internet company in Beijing”, searching for keywords like “Beijing”, “Internet”, or “work” will hit this record. That is full-text search, and it is what Baidu and Google do when you use them every day. It is worth mentioning that ES full-text search also supports Chinese well (there are many Chinese word segmentation plugins alone), which should cover the full-text search needs of most people in China. Beyond search, ES indexes every field by default, which gives complex aggregation queries good performance, so once the data is in ES you no longer have the headache of designing all kinds of complex indexes by hand.
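
The idea behind full-text search can be sketched with a toy inverted index. This is only an illustration of the principle: real Elasticsearch relies on Lucene analyzers and relevance scoring, and the whitespace-style tokenizer and the "all words must match" rule below are simplifying assumptions of this example.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map every token to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return ids of documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


DOCS = {
    1: "I work in an Internet company in Beijing",
    2: "I study at a university in Shanghai",
}
INDEX = build_index(DOCS)

if __name__ == "__main__":
    print(search(INDEX, "Beijing"))        # {1}
    print(search(INDEX, "Internet work"))  # {1}
    print(search(INDEX, "Shanghai"))       # {2}
```

Because lookups go from token to documents, a keyword search does not need to scan every record, which is what a LIKE '%...%' query typically has to do.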
With so many advantages listed, is ES omnipotent?
Unfortunately not. ES has a number of drawbacks, most notably immutable field types, lower write performance, and high hardware resource consumption. As mentioned earlier, ES builds indexes for you automatically. This benefits full-text search and aggregation queries and spares you the work of designing indexes, but it also brings problems. Every field in ES has a Mapping entry, either defined explicitly or inferred by dynamic mapping, that records the field's type, and ES builds the appropriate index structures for the field based on that Mapping. Because of this, the type of a field cannot be changed once the field has been created. For example, what if you want to change the type of a field in an index that already holds a lot of data? Sorry, you have to create a new index with the new Mapping and reindex all the data. (Adding a brand-new field to an existing Mapping is allowed.) So in terms of data structure, ES is much more flexible than MySQL but far less flexible than MongoDB. The disadvantages do not stop there. Write performance is also affected by automatic indexing and is significantly lower than MongoDB's. ES takes up significantly more storage space than MongoDB for the same data, and its hardware consumption is heavy too: with large data volumes, 64 GB of memory plus SSDs is the baseline, which makes ES the aristocrat among databases.
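
One common way to work around a fixed field type is to create a new index with the desired Mapping and copy the data across with the `_reindex` API. The sketch below only builds the sequence of REST requests as data and sends nothing. The index names `qa_v1` and `qa_v2` and the `view_count` field are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# (HTTP method, path, JSON body)
Request = Tuple[str, str, Dict[str, Any]]


def reindex_plan(
    old_index: str, new_index: str, new_properties: Dict[str, Any]
) -> List[Request]:
    """Build the requests needed to move data into an index with a new Mapping."""
    return [
        # 1. Create the new index with the corrected field types.
        ("PUT", f"/{new_index}", {"mappings": {"properties": new_properties}}),
        # 2. Copy every document from the old index into the new one.
        (
            "POST",
            "/_reindex",
            {"source": {"index": old_index}, "dest": {"index": new_index}},
        ),
    ]


# Hypothetical example: view_count was created as integer and must become long.
PLAN = reindex_plan("qa_v1", "qa_v2", {"view_count": {"type": "long"}})

if __name__ == "__main__":
    for method, path, body in PLAN:
        print(method, path, body)
```

On a large index the copy step can take a long time and temporarily needs storage for both indexes, which is part of why field types are worth getting right up front.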
The full-text search capability of ES makes it a great tool for building search engines. ES also supports complex aggregation queries well, which makes it very suitable for data analysis. In fact, ES comes with its own companion ELK suite (Elasticsearch, Logstash, Kibana), which offers a one-stop path from log collection to visual data analysis and is a powerful tool for building an impressive data analysis platform. However, the high cost and lower write performance of ES make it unsuitable for scenarios where the data has low value, high write performance is required, or the data volume is large but the budget is limited.
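
As a small illustration of the aggregation side, the sketch below builds a search request body that counts matching questions per calling system with a `terms` aggregation. The fields `title` and `source_system` are hypothetical, and `source_system` is assumed to be mapped as `keyword`.

```python
from typing import Any, Dict


def questions_per_system(keyword: str, top_n: int = 10) -> Dict[str, Any]:
    """Request body: count questions matching `keyword`, grouped by system."""
    return {
        "size": 0,  # return only the aggregation buckets, no individual hits
        "query": {"match": {"title": keyword}},
        "aggs": {
            "per_system": {
                "terms": {"field": "source_system", "size": top_n},
            }
        },
    }


if __name__ == "__main__":
    print(questions_per_system("bonus", top_n=5))
```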

5. How was Elasticsearch put into practice in the project?

To be continued