Hello everyone, I am Tong Yan Wuji, a programmer who doesn't always stick to his day job. I believe that "true knowledge comes from practice, and life is better kept simple", and I yearn for freedom.
If you found this post helpful, please give it a thumbs-up. Comments are welcome.
Picking up where we left off
Last time we looked at how the metadata works. Next, to store tens of millions of product records and support a wide variety of query scenarios, the design of the storage layer is critical.
Pain points
You can't use MySQL for everything. For example, to find products whose color is red, price is 100, category is clothing, and region is Guangzhou, the SQL looks like this:
select * from T_goods where color = 'red' and price = 100 and category = 'clothing' and region = 'guangzhou';
The more search conditions there are, the more "and" clauses pile up.
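To make the pain point concrete, here is a minimal sketch of how such a query grows with every filter the user picks. It uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere. The column layout of T_goods, the sample rows, and the helper names build_query and run_demo are assumptions for illustration only.

```python
import sqlite3

# Columns a caller may filter on; anything else is rejected so that
# column names are never taken blindly from user input.
ALLOWED_COLUMNS = ("color", "price", "category", "region")


def build_query(filters):
    """Return (sql, params) for a dict of {column: value} filters."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unsupported filter: {column}")
        clauses.append(f"{column} = ?")  # one more "and" per condition
        params.append(value)
    sql = "select * from T_goods"
    if clauses:
        sql += " where " + " and ".join(clauses)
    return sql, params


def run_demo():
    """Create a tiny in-memory table and run the four-condition query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table T_goods (id integer primary key, color text, "
        "price integer, category text, region text)"
    )
    conn.executemany(
        "insert into T_goods values (?, ?, ?, ?, ?)",
        [
            (1, "red", 100, "clothing", "guangzhou"),
            (2, "red", 100, "clothing", "shenzhen"),
            (3, "blue", 80, "shoes", "guangzhou"),
        ],
    )
    sql, params = build_query(
        {"color": "red", "price": 100, "category": "clothing", "region": "guangzhou"}
    )
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return sql, rows
```

Every extra filter adds another predicate, and without an index covering that combination of columns the database has to scan far more rows than it returns.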
Leaving aside whether this SQL statement can be optimized, run as is it will not hold up under load. Some will say: add indexes. But you cannot index every field. Also, don't forget that our premise is product data on the order of 10 million rows. In theory a single MySQL table can hold 10 million rows, but try it in practice and you will find the queries are slow.
Others will suggest splitting the data across multiple databases and tables. For large data volumes and slow queries, that is one approach. However, it inevitably increases code complexity, which is not desirable.
We have taken MySQL as far as it will go, and there is no need to hang ourselves on a single tree. Let's broaden our thinking.
Here's how we do it
The user's query request goes to a microservice. The microservice assembles the query conditions, looks up the list of matching product IDs in ElasticSearch, and then fetches the product details from Cassandra by those IDs.
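The two-step flow can be sketched as follows. This is only an illustration under stated assumptions: SearchIndex and DetailStore are hypothetical in-memory stand-ins for ElasticSearch and Cassandra, not their real client APIs, and query_products plays the role of the microservice.

```python
class SearchIndex:
    """In-memory stand-in for ElasticSearch: product ID -> searchable fields."""

    def __init__(self):
        self._docs = {}

    def index(self, product_id, fields):
        self._docs[product_id] = dict(fields)

    def search_ids(self, conditions):
        """Return the IDs of products whose fields match every condition."""
        return [
            product_id
            for product_id, doc in self._docs.items()
            if all(doc.get(key) == value for key, value in conditions.items())
        ]


class DetailStore:
    """In-memory stand-in for Cassandra: rows fetched by primary key only."""

    def __init__(self):
        self._rows = {}

    def put(self, product_id, row):
        self._rows[product_id] = dict(row)

    def get_many(self, product_ids):
        """Return the detail rows for the given IDs, skipping unknown IDs."""
        return [self._rows[pid] for pid in product_ids if pid in self._rows]


def query_products(index, store, conditions):
    """Step 1: find matching IDs in the index. Step 2: load details by ID."""
    product_ids = index.search_ids(conditions)
    return store.get_many(product_ids)
```

The search index only needs the fields users filter on, while the detail store holds the full record, so each database does the one thing it is good at.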
Here, we use two databases, ElasticSearch and Cassandra. What is the reason for choosing them?
Why ElasticSearch?
Mention ElasticSearch and the first thing that comes to mind is "search engine", something like Baidu search or Jingdong search. But it is more than that.
ElasticSearch can do a lot of things. Its typical applications include real-time log analysis and reporting, search services, and time-series data analysis.
I chose it for its search capability. Recall our scenario: we need full-text search over a large product catalog, similar to product search on e-commerce platforms such as Pinduoduo and Jingdong. It has many advantages; here are the main ones:
- Excellent performance. A single service can reach up to 100,000 QPS. We can't leave users staring at a loading spinner while they search for products.
- Recall, precision, and other relevance metrics can be measured and improved.
- The ecosystem is rich, the community is active, and you can find almost any resource you need.
Querying ElasticSearch is straightforward, and the Java API makes it even more convenient.
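As a rough illustration of what the product filter looks like on the ElasticSearch side, the sketch below builds a search request body in the query DSL: a bool query whose filter clause holds one term query per condition. The helper name build_search_body and the field names are assumptions, and term queries on text values assume those fields are mapped as keyword. Setting "_source" to false asks ElasticSearch to return only document IDs, which is all the next step needs.

```python
def build_search_body(conditions, size=100):
    """Build an ElasticSearch request body that filters on exact values.

    conditions: dict of {field: value}; field names are illustrative.
    size: maximum number of hits (product IDs) to return.
    """
    filters = [{"term": {field: value}} for field, value in conditions.items()]
    return {
        "query": {"bool": {"filter": filters}},
        "_source": False,  # we only need the IDs, not the stored documents
        "size": size,
    }
```

Adding a search condition only appends one more entry to the filter list, and filter clauses skip relevance scoring, so they suit yes/no conditions such as color or region.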
Why Cassandra?
You can find plenty of introductions to Cassandra online. I chose Cassandra mainly because it has some features that MySQL does not:
- Very simple query statements, with support for primary-key lookups. So when I fetch product details, I query by product ID (the primary key).
- Eventual consistency. In the end, no data goes missing.
- It can store a lot of data. I love that part.
- You can add columns at any time. Our business cannot anticipate every field up front; as versions evolve, new fields will have to be added.

  | MySQL | Cassandra |
  | --- | --- |
  | Once a table's columns are defined, every row must fill every column on insert, if only with a null value. | You are free to add any column to any column family at any time. |
  | A relational table defines only columns, and the table is populated with values. | Tables contain columns, or they can be defined as super column families. |

- The Java API is extremely simple.
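The first and fourth points can be illustrated with the CQL statements involved. The sketch below only builds the statement text; it does not talk to a cluster. The function names and the table and column names in the usage note are hypothetical.

```python
def select_by_ids_cql(table, key_column, id_count):
    """CQL for fetching rows by primary key, with one bind marker per ID."""
    if id_count < 1:
        raise ValueError("at least one ID is required")
    markers = ", ".join("?" for _ in range(id_count))
    return f"SELECT * FROM {table} WHERE {key_column} IN ({markers})"


def add_column_cql(table, column, cql_type):
    """CQL for adding a new column to an existing table."""
    return f"ALTER TABLE {table} ADD {column} {cql_type}"
```

For example, select_by_ids_cql("goods_detail", "product_id", 3) yields a lookup for three product IDs, and add_column_cql("goods_detail", "weight", "int") yields the statement that adds a field in a later version.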
With that, Cassandra was a no-brainer.
It’s not always perfect
Some will point out that, compared with a single MySQL database, we now have two stores to write to: ElasticSearch and Cassandra.
Indeed, there are now two data stores, so we need a way to keep their data consistent. Let's go through the possible scenarios and see which ones lead to inconsistency, so we can apply the right remedy.
- The write to ElasticSearch fails and the write to Cassandra fails as well. This scenario causes no data inconsistency.
- The write to Cassandra succeeds but the write to ElasticSearch fails, leaving the two stores inconsistent.
- The write to ElasticSearch succeeds but the write to Cassandra fails, leaving the two stores inconsistent.
Scenarios 2 and 3 are essentially the same problem: the two writes do not go to one logical database, so there is no transactional guarantee.
To make the data eventually consistent, my solution is to record the product information whose write failed, and replay it later to re-execute the write. After all, failures are rare, and as long as the data ends up consistent, the system works.
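The record-and-replay idea can be sketched as below. This is a minimal illustration, not the production implementation: FlakyStore is a hypothetical in-memory stand-in for either database, the failure record lives in memory (a real system would presumably persist it), and it assumes writes are idempotent upserts keyed by product ID, so writing the same product again is harmless.

```python
class FlakyStore:
    """In-memory stand-in for a database whose next write can be made to fail."""

    def __init__(self):
        self.rows = {}
        self.fail_next = False

    def write(self, product_id, product):
        if self.fail_next:
            self.fail_next = False
            raise IOError("simulated write failure")
        self.rows[product_id] = product  # upsert: safe to repeat


class ProductWriter:
    """Writes each product to both stores and remembers failed writes."""

    def __init__(self, search_index, detail_store):
        self.search_index = search_index
        self.detail_store = detail_store
        self.failed = {}  # product_id -> product waiting to be replayed

    def save(self, product_id, product):
        """Write to both stores; on any failure, record the product for replay."""
        try:
            self.detail_store.write(product_id, product)
            self.search_index.write(product_id, product)
        except IOError:
            self.failed[product_id] = product
            return False
        self.failed.pop(product_id, None)
        return True

    def replay(self):
        """Re-execute every recorded write; return how many are still failing."""
        for product_id, product in list(self.failed.items()):
            self.save(product_id, product)
        return len(self.failed)
```

A scheduled job could call replay() periodically. Because a replay rewrites the product to both stores, whichever store missed the write catches up and the two converge.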