This article is based on the case study presented at the 2016 TOP100summit by Zhao Guoguang, senior architect at Tuniu and development leader of its technical committee. Editor: Cynthia

Introduction: The price center is one of the most important systems at Tuniu, and almost all sales prices are calculated by it. In its early stage the system lacked a sound design, so it could not keep up with rapid business growth: price calculation was slow, the system was unstable, and new functions were hard to add. The system was under great pressure, so we set out to optimize it. After the optimization, stability improved greatly, performance in special scenarios improved by dozens of times, and average performance improved several times over. This article shares the Tuniu price center optimization practice in depth.

I. Characteristics of tourism products

A tourism product integrates many kinds of resources, including air tickets, hotels, attraction tickets, visas, and other services used during a trip. Take a five-day Sanya tour as an example: the traveler flies to Sanya and checks in at the hotel on the first day, enjoys a variety of activities, and flies home on the fifth day. A single product therefore involves many kinds of services.

These scenarios need to be handled during the calculation:

● For each group period of a product, the resources must be combined and the combination with the lowest price selected;
● The price of the same resource may differ between group periods (peak and off-peak seasons, different suppliers quoting different prices, the influence of resource inventory);
● The number of consecutive hotel nights within one trip must be considered, and the hotel with the lowest total price over the whole continuous stay selected;
● The price of some resources is known only after it has been calculated from several conditions (such as individual air tickets);
● Some products are configured with a great many resources, such as thousands of hotels.
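The consecutive-stay rule above can be sketched as follows. This is only an illustration under assumed data: the hotel names, the nightly prices, and the `cheapest_continuous_stay` function are hypothetical and are not taken from the Tuniu system.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input: nightly price per hotel, keyed by night number.
# A missing night means the hotel has no inventory for that night.
NightlyPrices = Dict[str, Dict[int, int]]


def cheapest_continuous_stay(
    prices: NightlyPrices, nights: List[int]
) -> Optional[Tuple[str, int]]:
    """Return (hotel, total) for the hotel with the lowest total over all nights."""
    best: Optional[Tuple[str, int]] = None
    for hotel, by_night in prices.items():
        # The guest stays in one hotel for the whole trip, so a hotel that
        # is unavailable on any night cannot be selected at all.
        if any(night not in by_night for night in nights):
            continue
        total = sum(by_night[night] for night in nights)
        if best is None or total < best[1]:
            best = (hotel, total)
    return best


prices = {
    "hotel_a": {1: 300, 2: 300, 3: 500, 4: 500},  # cheap early, expensive late
    "hotel_b": {1: 350, 2: 350, 3: 350, 4: 350},
    "hotel_c": {1: 200, 2: 200, 3: 200},          # sold out on night 4
}
result = cheapest_continuous_stay(prices, [1, 2, 3, 4])
```

Note that the hotel that is cheapest on the first night is not the one selected: the comparison is made on the total for the whole stay, which is what makes the calculation depend on the trip length.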

Therefore, obtaining the price of a product takes a lot of data and a lot of calculation. In addition, resource prices change frequently. Air ticket prices fluctuate, for example, and when the lower-priced inventory of a resource sells out, that resource can no longer be sold at the lower price, so the price goes up. Because there are so many resources, such changes happen millions of times a day, and each resource is used by multiple products, so every resource change triggers a price recalculation for multiple products. These factors leave the price center facing two difficulties: the calculation is complex, and the volume of calculation is large.
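The fan-out from one resource change to many product recalculations can be pictured with a reverse index from resources to the products that use them. This is a minimal sketch, not the system's actual data structure; `ResourceIndex` and all identifiers in it are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class ResourceIndex:
    """Hypothetical reverse index: resource id -> ids of products that use it."""

    def __init__(self) -> None:
        self._products_by_resource: Dict[str, Set[str]] = defaultdict(set)

    def register(self, product_id: str, resource_ids: Iterable[str]) -> None:
        for resource_id in resource_ids:
            self._products_by_resource[resource_id].add(product_id)

    def affected_products(self, changed_resources: Iterable[str]) -> Set[str]:
        """Collect every product whose price must be recalculated."""
        affected: Set[str] = set()
        for resource_id in changed_resources:
            # .get avoids creating empty entries for unknown resources
            affected |= self._products_by_resource.get(resource_id, set())
        return affected


index = ResourceIndex()
index.register("sanya_5d", ["flight_1", "hotel_9"])
index.register("sanya_7d", ["hotel_9"])
to_recalculate = index.affected_products(["hotel_9"])
```

A single change to `hotel_9` marks both products for recalculation, which is why millions of resource changes a day translate into a much larger number of price calculations.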

II. Refactoring practices

We ran into many problems while optimizing and restructuring the system. The optimization succeeded because these problems were solved well.

2.1 What does the system care about most?

What does the system care about most: response time, throughput, concurrency, functionality, or stability? We care about all of them, but when we cannot have everything, which matters more? The answer shapes our technology selection and reflects how deeply we understand the system.

For example, if a system must respond quickly and every request needs an immediate answer, heavy asynchronous processing is not a good fit. The price center system is more concerned with throughput: every resource price change must be reflected in product prices. As for response time, millisecond-level or even second-level latency is not mandatory; it is acceptable for a change to take effect after a few seconds, because the probability of a customer ordering a product at the exact moment its price changes is very low. In some scenarios, such as overnight batch price refreshes, the response time can be even longer, so we can rely entirely on asynchronous processing.

2.2 Where is the bottleneck of the system?

System performance optimization usually comes down to three areas: CPU, I/O, and memory. So where are our system's bottlenecks? The answer determines the direction of the optimization. Take the price center system as an example: CPU usage was very low, I/O time was very long, and memory usage was very high. The conclusion was that speed could not improve because time was being spent in I/O and the CPU was not being used effectively. The obvious countermeasure would be to raise the number of concurrent threads to push CPU usage up. However, that would increase memory consumption further, so it was not appropriate.

The analysis raised a question: with so much data read through I/O occupying memory while CPU usage stayed low, was redundant data being read? We checked again with this question in mind and found that it was. That was an incidental discovery, but after the analysis we had basically settled the direction: reduce I/O, and reduce memory usage by controlling the scale of each calculation.

2.3 Abstract domain model

What problem does the abstract domain model solve?

● The functional logic of the original system was chaotic and tightly coupled;
● There was no shared understanding of how a given function was implemented;
● Product and development lacked a common language for communication.

A domain model condenses the elements of domain knowledge and is an indispensable part of how the system operates. However, a domain model has no knowledge of the overall operation of the system; it is responsible only for its own internal logic.

The system should have at least a domain layer and an application layer. The domain models complete their logical implementation in the domain layer, and the application layer associates these domain models. We abstracted the system into large domain models such as resource, product, promotion, transportation, foundation, etc.

2.4 Division of microservices

Microservices are divided to solve the following problems:
● Functional coupling;
● Data coupling: data that is neither partitioned nor protected is depended on by multiple functional modules, so a single data change affects a wide range of modules;
● The impact scope of upgrades: scaling out one function forces every instance to be redeployed, and releasing one small function means releasing the entire application.

The specific approach is to divide the system, through abstraction and on the basis of the domain model established earlier, into microservices that match its characteristics, with each service handling one class of the system's complexity. The microservices differ in logic, call frequency, the data they depend on, number of instances, and so on. Our system was split into a resource management service, a product price management service, a calculation control service, a promotion calculation service, and a price query service.

It is recommended to adopt the asynchronous message queue for the cooperation between microservices. In this way, services only care about their own processing logic, and do not care about who processes the data generated by them in which way. Services are not aware of each other.
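As a rough illustration of this decoupling, the sketch below uses Python's standard `queue.Queue` as an in-process stand-in for a message broker. The service names, the message format, and the `STOP` sentinel are assumptions made for the example; the source does not say which messaging middleware is used.

```python
import queue
import threading

# In-process stand-in for a broker topic (hypothetical name).
price_changes = queue.Queue()
STOP = None  # sentinel telling the consumer to exit


def resource_service(changes):
    """Publishes resource price changes; it knows nothing about consumers."""
    for change in changes:
        price_changes.put(change)
    price_changes.put(STOP)


def product_price_service(out):
    """Consumes changes at its own pace; it knows nothing about the producer."""
    while True:
        message = price_changes.get()
        if message is STOP:
            break
        out.append(f"recalculate products using {message['resource_id']}")


recalculated = []
consumer = threading.Thread(target=product_price_service, args=(recalculated,))
consumer.start()
resource_service([
    {"resource_id": "hotel_9", "price": 350},
    {"resource_id": "flight_1", "price": 900},
])
consumer.join()
```

The only thing the two functions share is the queue and the message shape, so either side can be replaced or scaled without the other being changed.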

Of course, this works because of the nature of our system, in which response time can be sacrificed slightly. Invoking services through RESTful interfaces is also a good choice. In that case the service provider must offer a general-purpose service, that is, its logic must contain no judgments of the form "how to handle the request when the caller is XX".

2.5 Control the granularity of each calculation

Our system also faces a special problem: a tourism product is composed of multiple resources, such as hotels/tickets/air tickets, etc. As mentioned above, the price calculation of a product may have multiple resources and choices.

For example, a five-day Sanya tour can depart from Beijing, Nanjing, Shanghai, Xi'an, and other cities. Each departure city has different flights, and there are many flight routes to choose from, so the number of flights grows in proportion to the number of cities from which travelers can depart for Sanya. The program cannot know the number of such cities in advance; it is determined entirely by how the business staff design the product.

The problem with this situation is that when calculating the price of a product, it is not easy to know the amount of calculation in advance. Sometimes it is small if there are few cities, and sometimes it is large if there are many cities to travel to.

This brings at least two problems. First, it is hard to evaluate the system's capacity, because the granularity of each calculation is unknown and there is no unified unit of measurement. Second, a great deal of memory is wasted.

After analysis, we concluded that the product price for departures from Beijing and the product price for departures from Shanghai do not need to be generated at the same time, so they can be calculated separately. This reduces each calculation to a relatively atomic size, which greatly reduces memory usage and speeds up computation.
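A minimal sketch of this splitting, assuming a simplified product whose price is a fixed land cost plus the cheapest flight from the departure city. The product structure and the function names are hypothetical and only illustrate the idea of one calculation task per departure city.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical product: flight prices by departure city plus a fixed land cost.
product = {
    "id": "sanya_5d",
    "land_cost": 2000,
    "flights": {
        "Beijing": [1800, 1500, 1650],
        "Shanghai": [1200, 1350],
        "Nanjing": [1400],
    },
}


def calculation_tasks(product: dict) -> Iterator[Tuple[str, str]]:
    """Yield one small task per (product, departure city) instead of one big task."""
    for city in product["flights"]:
        yield product["id"], city


def calculate(product: dict, city: str) -> int:
    """Price one departure city; only that city's flights need to be in memory."""
    return product["land_cost"] + min(product["flights"][city])


prices: Dict[str, int] = {
    city: calculate(product, city) for _, city in calculation_tasks(product)
}
```

Because every task has roughly the same bounded size, the number of tasks becomes a usable unit for estimating capacity, and the tasks can be spread across workers independently.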

2.6 Seamless Upgrade

With each upgrade, consider how to design the change so that it is seamless and can be rolled back, to minimize risk. Generally, if the database structure changes, you can use double-writing and retire the old database once the new structure has proven stable.

But we ran into a problem. We wanted to change the input rules, moving the configuration style from "blue" to "green", because green is more flexible and better matches the needs of the business. However, a large amount of data configured the blue way was already online, and deleting it all and reconfiguring it the green way was impossible.

So how do you switch smoothly? It would take an algorithm that maps the blue configuration to the green one; unfortunately, no such algorithm exists.

What we did was map both the blue and the green data configurations to a fixed model and treat them all as rules, so that blue and green are both simply rules. This allows old and new data to coexist and makes a smooth upgrade possible.
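The idea of mapping two configuration styles onto one fixed model can be sketched as below. The source does not describe what the blue and green configurations contain, so the fields here (a fixed markup for blue, a percentage markup for green) are purely assumed for illustration, as are the `Rule` class and the adapter functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Rule:
    """Fixed model that the calculation code works with, whatever the source style."""
    source: str
    apply: Callable[[int], int]  # cost -> price


def from_blue(config: Dict) -> Rule:
    # ASSUMPTION: a blue configuration stores a fixed markup amount.
    amount = config["markup"]
    return Rule("blue", lambda cost: cost + amount)


def from_green(config: Dict) -> Rule:
    # ASSUMPTION: a green configuration stores a percentage markup.
    percent = config["percent"]
    return Rule("green", lambda cost: cost + cost * percent // 100)


ADAPTERS = {"blue": from_blue, "green": from_green}


def load_rule(config: Dict) -> Rule:
    """Old and new records coexist; each is adapted to the same Rule model."""
    return ADAPTERS[config["style"]](config)


stored = [
    {"style": "blue", "markup": 100},   # legacy record, left untouched
    {"style": "green", "percent": 15},  # record entered the new way
]
prices = [load_rule(config).apply(1000) for config in stored]
```

With this shape no blue record has to be converted to green: the calculation code depends only on `Rule`, and a new configuration style needs nothing more than another adapter.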

III. Technical implementation

3.1 Scale Cube

Demand for computing and storage resources keeps growing while Moore's Law is running out, so businesses can no longer find a single computer with huge resources at a reasonable price. Instead, they turn to large numbers of cheap small machines working together to do the job of a supercomputer. As computing requirements increase, system performance grows almost linearly simply by adding more machines. Scalability is therefore an important consideration for distributed systems.

In the scale cube shown in the figure:
● The X-axis represents homogeneous nodes (same type, same function); the processing capacity of the system increases as long as more nodes are added;
● The Y-axis represents expansion by splitting along different businesses;
● The Z-axis represents expansion by different user types (priority, geography, etc.).

Currently our system is mainly based on X-axis and Y-axis scaling.

We scale storage horizontally by splitting tables and databases (sharding), but splitting databases easily introduces the problem of cross-database joins. We use the product ID as the sharding key, and there is basically no scenario in which prices of different products need to be joined, so the cross-database join problem is largely avoided.
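Routing by product ID can be sketched with simple modulo arithmetic. The database count, table count, and naming scheme below are assumptions made for the example; the source does not give the actual sharding layout.

```python
from typing import Tuple

# ASSUMPTION: 4 databases with 8 price tables each (hypothetical layout).
DB_COUNT = 4
TABLES_PER_DB = 8


def locate(product_id: int) -> Tuple[str, str]:
    """Map a product ID to the (database, table) that holds all of its prices."""
    slot = product_id % (DB_COUNT * TABLES_PER_DB)
    db_index = slot // TABLES_PER_DB
    table_index = slot % TABLES_PER_DB
    return f"price_db_{db_index}", f"product_price_{table_index}"


location = locate(1001)
```

Every price row of a given product lands in the same table, so a query for one product never has to touch a second database.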

3.2 RESTful Component Invocation

For service governance, we developed a RESTful invocation component to handle service discovery and invocation management. Service providers register themselves with a registry, and service consumers obtain the list of available services from that registry.
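The register-then-look-up flow can be sketched with a small in-memory registry. This is not the component described in the article, whose interface is not given; the `Registry` class, the round-robin selection, and the addresses are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class Registry:
    """Hypothetical in-memory registry: service name -> instance addresses."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[str]] = defaultdict(list)
        self._cursors: Dict[str, int] = {}

    def register(self, service: str, address: str) -> None:
        if address not in self._instances[service]:
            self._instances[service].append(address)

    def deregister(self, service: str, address: str) -> None:
        if address in self._instances.get(service, []):
            self._instances[service].remove(address)

    def lookup(self, service: str) -> List[str]:
        """Return a copy of the currently registered instances."""
        return list(self._instances.get(service, []))

    def pick(self, service: str) -> str:
        """Choose one instance round-robin (an assumed selection policy)."""
        instances = self.lookup(service)
        if not instances:
            raise LookupError(f"no instance registered for {service}")
        cursor = self._cursors.get(service, 0)
        self._cursors[service] = cursor + 1
        return instances[cursor % len(instances)]


registry = Registry()
registry.register("price-query", "10.0.0.1:8080")
registry.register("price-query", "10.0.0.2:8080")
picked = [registry.pick("price-query") for _ in range(3)]
```

A consumer only needs the service name; which instance answers is decided at call time from whatever is registered at that moment.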

3.3 Message Queue

We have used message queues in many scenarios. One of the benefits of message queues is decoupling. At the sending end, you just send a message to the queue, send and forget, and you don’t have to worry about who processes the message and how.

In addition, load balancing can be done on the consumer side without the sender being aware of it. This allows you to dynamically increase or decrease the number of consumers. Of course, doing so also basically requires the design of the business to be stateless and asynchronous.
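Consumer-side load balancing can be sketched with several stateless workers pulling from one shared queue. The standard-library `queue.Queue` stands in for the messaging middleware here, and `run_consumers` is a hypothetical helper written for the example.

```python
import queue
import threading
from typing import List, Tuple


def run_consumers(messages: List[int], consumer_count: int) -> List[Tuple[str, int]]:
    """Let consumer_count stateless workers compete for the queued messages."""
    tasks = queue.Queue()
    for message in messages:
        tasks.put(message)

    handled: List[Tuple[str, int]] = []
    lock = threading.Lock()

    def consume(name: str) -> None:
        while True:
            try:
                message = tasks.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            with lock:
                handled.append((name, message))

    workers = [
        threading.Thread(target=consume, args=(f"consumer-{i}",))
        for i in range(consumer_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return handled


handled = run_consumers(list(range(20)), consumer_count=3)
```

The sending side is identical whether there is one consumer or ten, and each message is handled exactly once. Which consumer handles which message is not deterministic, which is acceptable only because the processing is stateless.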

IV. Case Summary

4.1 Architecture is for business.

Architecture is judged by how well it supports the business, not by which technology it uses. Only the architecture and technology selection that best fit the characteristics of the business are good. Good architecture therefore requires a deep understanding of the business itself.

The emergence of many open source middleware projects has reduced the number of functions that business developers need to implement themselves (unless the open source project does not meet the requirements). The architect's work has shifted from designing implementations to judging business requirements, and from there to making architectural decisions, selecting technology, and applying patterns.

This case is the optimization of a system performing 10 billion computations. In fact, the first question to ask is whether 10 billion computations are necessary at all. Through optimization we did reduce both the number of computations and the size of each one, and that was possible only because we understood the business. So architecture always starts with the business.

4.2 The boundary between services should be clear and the coupling should be as small as possible.

The idea of DDD can help us find service boundaries.

4.3 Make the design as asynchronous as possible, so that the coupling is small and the logic is clearer.

Asynchronous systems are much more stable, and the messaging middleware acts as a buffer when the system hits a bottleneck. Synchronous calls block threads while waiting, which indirectly reduces concurrency and throughput. In addition, asynchronous systems are easier to upgrade: there are fewer constraints between systems, and work can accumulate in the queues during the upgrade.

4.4 Stateless saves a lot of load balancing troubles and does not need to do stickiness.

You can easily scale horizontally. Stateful systems are painful to scale horizontally.

4.5 Small iteration, low risk rollback.

We made sure that every release could be rolled back, and the database was designed to stay backward compatible for a period of time after going live, so that old and new data could coexist.
