First, a recap
Two years ago today, I wrote "The Three Lives of Order Management" and "House of Flying Daggers". Two years on, how has this architecture developed, and what new challenges and gains have we met? Today I mainly review the AKF-style evolution of the Youzan order search architecture.
1.1 What is long divided must unite
Previously, the data scattered across multiple DB shards was aggregated into ES, which brought great benefits: fewer synchronization tasks and lower maintenance costs. Order migration benefited in particular. Because the earlier design was sharded, migrating an order meant inserting the data into a new shard and deleting it from the old shard, a tedious and error-prone process. After consolidation into ES, each order migration is just an update operation, which is very simple. Some background on order migration:
- Buyer order migration: orders are migrated from a mock buyerId to the real assigned buyerId, for example when a new user becomes a follower user.
- Seller order migration: when a store model is upgraded, for example from micro mall to retail chain, the original store's orders need to be migrated independently.
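The difference between the two migration styles can be illustrated with a minimal in-memory sketch. Plain dictionaries stand in for the DB shards and the aggregated index, and `shard_for`, `migrate_sharded`, and `migrate_aggregated` are hypothetical names used only for illustration, not the actual implementation.

```python
def shard_for(buyer_id: int, shard_count: int) -> int:
    """Pick a shard by buyer id (simple modulo rule, assumed for illustration)."""
    return buyer_id % shard_count


def migrate_sharded(shards: list, order_id: int, old_buyer: int, new_buyer: int) -> None:
    """Sharded style: insert into the new shard, then delete from the old one."""
    src = shards[shard_for(old_buyer, len(shards))]
    dst = shards[shard_for(new_buyer, len(shards))]
    order = dict(src[order_id], buyer_id=new_buyer)
    dst[order_id] = order  # step 1: insert into the target shard
    if dst is not src:
        del src[order_id]  # step 2: delete the old copy; a failure here leaves a duplicate


def migrate_aggregated(index: dict, order_id: int, new_buyer: int) -> None:
    """Aggregated style: the document stays where it is and only one field changes."""
    index[order_id]["buyer_id"] = new_buyer
```

The sharded version has two steps that can fail independently, which is where the error-prone part comes from; the aggregated version is a single update.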
Second, new challenges
However, with the continuous development of the business, the aggregated index began to expose various problems.
- The volume of data is growing much faster than expected. The index has reached hundreds of millions of documents and slow searches are starting to appear, like a giant beginning to stir.
- To meet some business-specific search needs, many fields that are rarely queried were still added to the same main index, so the number of fields in the main index keeps growing.
Third, the response
3.1 What is long united must divide
To solve the above challenges, we embarked on the path of splitting into a scalable architecture. First, a brief introduction to the dimensions of Youzan order search:
- B-end merchant single-store search (a merchant manages the orders of a single store)
- B-end chain head-store cross-branch search (a chain head store manages the orders of its branches)
- C-end buyer cross-store search (a buyer manages all of their orders across stores)
Since both ToB and ToC must be served, and B-end retail chain stores add a lot of complexity (a head MU store manages multiple BU units and needs to query across stores), no single sharding dimension can avoid cross-shard search scenarios. The plan therefore prioritized separating hot and cold data. But how should hot and cold data be distinguished and defined? Cut-offs such as the last day, the last month, or some other period all felt arbitrary and lacked data support.
Never forget, there will be echoes.
3.1.1 Hot Status Index
So I looked at our monitoring and found an interesting pattern. Across all search scenarios, common searches by payment method, logistics type, product name, order type and so on each account for a small share, while search by order status accounts for the largest share, about 53%. In other words, more than half of the search traffic comes from order status searches.
Breaking down that 53%: about 3% of searches target final-state orders (completed, closed), and about 50% of all traffic targets hot-state orders (pending payment, pending shipment, pending grouping, pending receipt, shipped). Please ignore the messy enumeration; it merges statistics from several historical versions.
This is exciting. Why? No matter how much the order volume surges, the number of hot-state orders will not keep surging, because every order gradually flows to a final state. For example, an order unpaid after 30 minutes moves from pending payment to closed, and 7 days after shipment an order moves from shipped to completed. According to the statistics, the total number of hot-state orders is in the tens of millions, and it has been relatively stable.
In other words, this small index of tens of millions of documents carries about half of all order search traffic. Statistics, head-store queries, and all kinds of cross-shard queries can be supported, because the index holds the complete set of hot-state order data and therefore covers every sharding scenario. The index has been running smoothly online for nearly a year.
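A minimal sketch of how a hot-state index could be kept complete and used for routing follows. The status codes, the index names `order_hot` and `order_all`, and both function names are assumptions made for illustration; they are not taken from the production system.

```python
# Hypothetical status codes and index names, assumed for this sketch.
HOT_STATUSES = {"WAIT_PAY", "WAIT_GROUP", "WAIT_SHIP", "SHIPPED"}
FINAL_STATUSES = {"COMPLETED", "CLOSED"}
HOT_INDEX = "order_hot"
FULL_INDEX = "order_all"


def pick_index(statuses) -> str:
    """Route a status-filtered search to the hot index only when every
    requested status is a hot state; otherwise fall back to the full index."""
    wanted = set(statuses)
    if wanted and wanted <= HOT_STATUSES:
        return HOT_INDEX
    return FULL_INDEX


def apply_status_change(hot_index: dict, order: dict) -> None:
    """Keep the hot index a complete set of hot-state orders: upsert while
    the order is hot, remove it once it reaches a final state."""
    if order["status"] in HOT_STATUSES:
        hot_index[order["id"]] = order
    else:
        hot_index.pop(order["id"], None)
```

Because orders leave the hot index as soon as they reach a final state, its size follows the number of in-flight orders rather than the total order history.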
3.1.2 Time Slice Index
For final-state orders, the data volume keeps growing over time, so how do we scale? Time slicing is a good choice. Youzan order search originally sliced its indices by order time. When the business volume was small there was one index every six months, later one every three months, and by now even one index per month is a bit large. The exact granularity should be decided together with the search scenarios. In theory, final-state orders are searched relatively rarely, so we can also change our thinking and guide users from the product level, for example by displaying only the last six months of orders by default.
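As a rough illustration of monthly time slicing, the sketch below maps a search time range to the list of monthly indices it has to touch. The `order_final_YYYY_MM` naming scheme and the function name are assumptions for this example.

```python
from datetime import date
from typing import List


def monthly_indices(start: date, end: date, prefix: str = "order_final_") -> List[str]:
    """Return the names of all monthly indices that overlap [start, end]."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}{year:04d}_{month:02d}")
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return names
```

A default window such as "last six months" then bounds every search to at most seven monthly indices, which is one way the product-level guidance limits the search cost.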
3.2 Extension Basis
3.2.1 AKF scale cube
This cube is mentioned repeatedly in "Architecture Is the Future" (the Chinese edition of The Art of Scalability) and "The True Book of Architecture" (the Chinese edition of Scalability Rules). Applied to our actual situation, it has really benefited us and given us a guiding methodology.
X-axis: Focus on horizontal cloning of data and services, such as primary and replica clusters where the data is replicated identically. Multiple clones of the system (adding machines) share requests through load balancing.
- Advantages: lowest cost, simple implementation
- Disadvantages: service response slows down when the product becomes too large
- Scenario: In the early stage of development, the service complexity is low and the system capacity needs to be increased
Y-axis: Focus on the division of responsibilities within the application, such as splitting data by business dimension. For example, splitting into a transaction database, a product database, and a membership database.
- Advantages: Fault isolation, improved response time, better focus
- Disadvantages: Relatively high cost
- Scenario: Complex business, large amount of data, high degree of code coupling, large team size
Z-axis: Focus on prioritizing services and data and splitting data by user dimension, such as the common practice of sharding data by user.
- Advantages: Reduces the risk of failure, and the impact range is controllable, providing greater scalability
- Disadvantages: highest cost
- Scenario: Users grow exponentially
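To make the three axes concrete, here is a generic sketch of a router that resolves one request along all of them: a replica (X), a business dataset (Y), and a user shard (Z). The class, the node names, and the CRC-based shard rule are illustrative assumptions, not a description of the actual deployment.

```python
import itertools
import zlib


class AkfRouter:
    """Resolve a request to (replica, dataset, shard) along the X, Y, Z axes."""

    def __init__(self, replicas, datasets, shard_count: int):
        self._replicas = itertools.cycle(replicas)  # X axis: identical clones, round-robin
        self._datasets = set(datasets)              # Y axis: split by business responsibility
        self._shard_count = shard_count             # Z axis: split by user

    def route(self, dataset: str, user_id: int):
        if dataset not in self._datasets:
            raise KeyError(f"unknown dataset: {dataset}")
        # A stable hash keeps the same user on the same shard across calls.
        shard = zlib.crc32(str(user_id).encode("utf-8")) % self._shard_count
        return next(self._replicas), dataset, shard
```

For example, `AkfRouter(["node-a", "node-b"], ["trade", "goods"], 4).route("trade", 42)` picks the next replica in turn while the dataset and shard stay fixed for that user.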
Of course, the essence of the AKF scale cube is not to keep expanding along a single axis, but to expand in a targeted way according to the business scenario and data scale. In theory, combining the X, Y, and Z axes allows nearly unlimited scaling. At present, the overall index architecture of Youzan order search is as follows, covering all three axes.
3.3 The status quo
Fourth, gains
The above briefly introduced the AKF scaling path of Youzan order search. Below I briefly share some unexpected gains from the process. They benefited us a lot and may serve as a reference for readers with similar businesses.
4.1 Extensible index field design
We originally migrated to ES because of its multi-field retrieval capability. However, meeting ever-changing product requirements by constantly adding fields makes the index bigger and bigger and hard to maintain. Is there an extensible way to handle changing requirements with no or only small changes? The answer is yes: a list<String> field design. For example, we currently open search extension points to Youzan Cloud, so merchants can customize their own search fields, with both K and V controlled by the merchants themselves. How do we make this configurable so that the business code is unaware of it? Fill in the fields to be retrieved in list<k_v> format according to our convention, and that is all. More on this in the configurable order search post of the order management series.
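One possible reading of the list<k_v> convention is sketched below: custom key/value pairs are flattened into `k_v` strings stored in a single multi-valued keyword field, and a search filters on those strings with ordinary ES `term` clauses. The field name `extend_kv`, the `_` separator, and both helper names are assumptions for illustration.

```python
def to_kv_tokens(custom_fields: dict) -> list:
    """Flatten merchant-defined fields into 'k_v' tokens for one list field."""
    return sorted(f"{key}_{value}" for key, value in custom_fields.items())


def kv_filter(field_name: str, conditions: dict) -> dict:
    """Build an ES bool/filter query that requires every k_v token to match."""
    return {
        "bool": {
            "filter": [
                {"term": {field_name: f"{key}_{value}"}}
                for key, value in conditions.items()
            ]
        }
    }
```

With this layout, a new merchant-defined search dimension adds tokens to an existing field instead of adding a new field to the mapping. A plain `_` separator is ambiguous when keys themselves contain underscores, so a real convention would likely need a reserved separator or escaping.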
4.2 Lightweight Statistics
Statistics have always been important to any company, and Youzan is no exception: almost everywhere orders appear, you can also see order statistics. Early statistics scenarios were simple, such as counting orders pending shipment, shipped orders, or refunded orders, each of which could be computed with one SQL statement or one script task. But as Youzan's business developed faster and faster, requests appeared such as counting orders that joined the guaranteed transaction program + are completed + were refunded within 7 days. The usual approach of changing the SQL and re-running it over offline data works, but the cycle tends to be long and inflexible, and once part of the statistics is wrong or fails, troubleshooting is hard and the only option is a full recount. Here we take another perspective and use search to do statistics. Relying on the total returned by an ES search as the statistic, we can seamlessly use existing data to count any combination of any dimensions, on demand and ready to use, which is very lightweight. The details will be covered in the configurable order statistics post of the order management series.
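A minimal sketch of "statistics by search" follows: build a filter-only search request that returns no documents, then read the total from the response. `size` and `track_total_hits` are standard ES search body parameters; the field names in the usage example and the helper names are hypothetical.

```python
def count_request(filters: dict) -> dict:
    """Build a search body that returns only the hit count for the filters."""
    return {
        "size": 0,                  # no documents needed, only the total
        "track_total_hits": True,   # ask for an exact total (ES 7+)
        "query": {
            "bool": {
                "filter": [{"term": {field: value}} for field, value in filters.items()]
            }
        },
    }


def read_total(response: dict) -> int:
    """Read hits.total, which is an object in ES 7+ and a plain number before."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total
```

For example, `count_request({"guarantee": True, "status": "COMPLETED", "refund_within_7d": True})` expresses the combined statistic from the text as one query, and a new combination of dimensions only needs a different filter dictionary.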
Fifth, the outlook
Looking back on four years with order management, the journey has been fruitful. Posts on configurable order search, configurable order statistics, and configurable order synchronization may also follow (the configurable order export post has already been published). I have now graduated from order management and will mainly be responsible for search across Youzan's business lines. We sincerely invite people with a growth mindset, big-data thinking, and business sensitivity to join us and build the Youzan search middle platform together. Send your resume directly to [email protected]