Design patterns are the distilled experience and best practices that our predecessors summarized through a great deal of practice. After years of software development, if you look back at the 23 classic design patterns, you will find that many of the coding routines you habitually use match the OO principles and design patterns; it shows that with enough practice, people tend to converge on the same best practices. Architecture design is no different. Here, combining my own understanding, I analyze some of the patterns given in Microsoft's cloud architecture guidance. Microsoft is really good at this, and the Microsoft Application Architecture Guide I translated before is also very good. The advantage of patterns is that technical people can communicate with each other easily through a few keywords: if a conversation mentions the chain of responsibility pattern and both sides understand what it means, those few words can replace half an hour of explanation. Without further ado, let's look at some patterns that should already feel familiar.

Management and monitoring

1. Ambassador pattern: create helper services that send network requests on behalf of a consumer service or application



This is an out-of-process proxy service (as mentioned earlier in the middleware article, many framework-level concerns can either be hosted in-process as a software framework or run as a separate proxy acting as network middleware). The ambassador pattern here refers to a network proxy process that communicates with the remote service on the application's behalf and handles things such as:

· Service routing

· Service circuit breaking

· Service tracing

· Service monitoring

· Service authorization

· Data encryption

· Logging

Since the ambassador is a network service running in a separate process, this pattern suits situations where multiple languages and frameworks all need to do the same things, so much of the client-side work of our frameworks can be moved into the ambassador service. Of course, there is the extra overhead of a network hop, and the deployment of the ambassador service has to take performance into account, so it may not necessarily be deployed centrally. These are all trade-offs to consider.

2. Anti-corruption layer pattern: implement a facade or adapter layer between modern applications and legacy systems

Use an anti-corruption layer as the intermediary for communication between the old and new systems. In this way, the new system can fully adopt its new communication style and architecture, while the old system can be kept as-is for the time being without special modification, and the anti-corruption layer can be discarded once the old system is retired. This pattern is a transition scheme for migrating from old to new systems, not a permanent architectural design pattern.
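As a hedged illustration, here is a minimal Java sketch of an anti-corruption layer: the new system depends only on its own clean interface, while the legacy API (the `LegacySoapCustomerService` and `LegacyCustomerDto` names are hypothetical stand-ins, not from this article) is wrapped behind an adapter that translates models and protocols.

```java
// --- Legacy side (hypothetical stand-ins for an old API) ---
record LegacyCustomerDto(String custNo, String surname, String givenName) {}

interface LegacySoapCustomerService {
    LegacyCustomerDto getCustomerRecord(String id);
}

// --- New system's clean model and port ---
record Customer(String id, String displayName) {}

interface CustomerDirectory {
    Customer findById(String id);
}

// --- Anti-corruption layer: translates the legacy model so it never leaks into
// the new system; it can be discarded once the legacy system is retired.
class LegacyCustomerAdapter implements CustomerDirectory {
    private final LegacySoapCustomerService legacy;

    LegacyCustomerAdapter(LegacySoapCustomerService legacy) {
        this.legacy = legacy;
    }

    @Override
    public Customer findById(String id) {
        LegacyCustomerDto dto = legacy.getCustomerRecord(id);          // legacy call
        String name = (dto.surname() + " " + dto.givenName()).trim();  // model translation
        return new Customer(dto.custNo(), name);                       // clean model out
    }
}
```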

3. External configuration store: move configuration information out of the application deployment package to a centralized location

This pattern says that configuration information can be held in an external configuration service, which I explained in detail in the fifth article on middleware. Whether from the perspective of operations management or of convenience and security, an independent configuration service with shared, externally stored configuration is essential for a large website. There are many open source projects that provide configuration services, as described in my previous article.

4. Gateway aggregation pattern: use a gateway to aggregate multiple individual requests into a single request

If an application needs to interact with multiple services, build an aggregation gateway layer in between: the gateway sends requests to the downstream services concurrently and then aggregates the data back for the application. This pattern has several benefits:

· Concurrent calls to multiple services improve performance, and it is possible to return only partial data

· Flexible resilience measures can be implemented in the gateway (circuit breaking, retry, rate limiting)

· Caching schemes can be implemented in the gateway

· The gateway can serve as an intermediate network layer for external communication

Of course, using this pattern requires consideration of gateway load, high availability, high performance (asynchronous IO), and so on.

In fact, this pattern is not only used for communication between back-end services; many front-end API setups build an aggregation layer so that the front end can send a single request to the back end and get the results of multiple APIs at once, reducing the number of network round trips and improving performance.

The simplest implementation can be based on OpenResty or Nginx.
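For the fan-out-and-aggregate idea itself, here is a minimal Java sketch using CompletableFuture; the product, price, and inventory clients are hypothetical placeholders for real downstream HTTP/RPC calls, not something from the article.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class AggregationGateway {

    // Hypothetical downstream clients; in practice these would be HTTP/RPC calls.
    interface ProductClient   { String product(String id); }
    interface PriceClient     { String price(String id); }
    interface InventoryClient { String stock(String id); }

    private final ProductClient products;
    private final PriceClient prices;
    private final InventoryClient inventory;

    AggregationGateway(ProductClient p, PriceClient pr, InventoryClient inv) {
        this.products = p; this.prices = pr; this.inventory = inv;
    }

    /** One client request fans out to three services concurrently and returns one merged payload. */
    public Map<String, String> productPage(String id) {
        CompletableFuture<String> p  = CompletableFuture.supplyAsync(() -> products.product(id));
        CompletableFuture<String> pr = CompletableFuture.supplyAsync(() -> prices.price(id));
        // An optional call that fails can degrade to partial data instead of failing the whole request.
        CompletableFuture<String> st = CompletableFuture.supplyAsync(() -> inventory.stock(id))
                .exceptionally(ex -> "unknown");

        return CompletableFuture.allOf(p, pr, st)
                .thenApply(v -> Map.of("product", p.join(), "price", pr.join(), "stock", st.join()))
                .join();
    }
}
```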

5. Gateway offloading pattern: offload shared or specialized service functionality to a gateway proxy

The name is a little hard to understand, but it is a pattern we probably use all the time: use a proxy gateway layer to handle concerns that are troublesome but unrelated to the business, such as SSL termination, which is very simple to implement with Nginx. We often expose an HTTPS service externally while the internal services actually provide plain HTTP interfaces, and the gateway performs the protocol translation.

6. Gateway routing pattern: Use a single endpoint to route requests to multiple services

This is also a common practice. Our external API paths might be /cart, /order, /search and so on, each backed by a different service, with the gateway layer doing the forwarding. This not only provides load balancing and failover for the back-end services, but also allows flexible routing when back-end services change or the external API paths switch (for example, for a version upgrade), keeping the external interface consistent. This can be implemented with Nginx; I believe most companies expose themselves externally through a gateway such as Nginx rather than resolving domain names directly to the underlying services.

7. Health endpoint monitoring pattern: implement functional checks in the application that external tools can access periodically through exposed endpoints

This pattern is actually quite important, and there are a few things to note:

· What information should be exposed? Not only whether the service or framework itself started successfully, but also whether the external storage and systems the service depends on are reachable. Network communication is complex: a service that looks reachable from one vantage point does not mean our website can actually connect to it, and if the underlying database cannot be reached, the service should be considered unhealthy even if it started successfully. It is also possible that node A can connect to the external storage while node B cannot, whether because of network problems, permissions, or load; sometimes long-lived connections established earlier keep working on node A, while a newly added node B cannot connect because the maximum connection limit has been exceeded. If possible, also expose the state of the thread pools, connection pools, and queues inside the service (object counts, queue lengths, and so on). These are key metrics that are hard to observe from outside the application precisely because they live inside it; having them exposed makes it much easier to troubleshoot performance problems.

· Not only websites but also services should expose health information, which we can collect externally for monitoring and aggregation; our load balancer or deployment system needs a way to determine whether a service is available, and to restart it or fail over if it is not.

· For externally facing services, pay attention to authorization on the health endpoint; it may contain sensitive information that anonymous users should not be able to see.

In implementation, the health endpoint should be integrated into the system as a plug-in that can be enabled by configuration, rather than being developed separately by each system. If Spring Boot is used, the Actuator module can be used.
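With Spring Boot Actuator (mentioned above), a custom check can be plugged into the health endpoint by implementing HealthIndicator. The sketch below is a minimal example under the assumption that a DataSource bean and an internal work queue exist; both names are illustrative.

```java
import javax.sql.DataSource;
import java.sql.Connection;
import java.util.concurrent.BlockingQueue;

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class DependencyHealthIndicator implements HealthIndicator {

    private final DataSource dataSource;       // external dependency we actually rely on
    private final BlockingQueue<?> workQueue;  // internal queue whose depth is worth exposing

    public DependencyHealthIndicator(DataSource dataSource, BlockingQueue<?> workQueue) {
        this.dataSource = dataSource;
        this.workQueue = workQueue;
    }

    @Override
    public Health health() {
        // The service is only healthy if its critical dependency is reachable from *this* node.
        try (Connection c = dataSource.getConnection()) {
            return Health.up()
                    .withDetail("database", "reachable")
                    .withDetail("workQueueDepth", workQueue.size())  // key internal metric
                    .build();
        } catch (Exception e) {
            return Health.down(e).withDetail("database", "unreachable").build();
        }
    }
}
```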

8. Strangler pattern: progressively migrate a legacy system by gradually replacing specific pieces of functionality with new applications and services

It's a scary name, but this pattern is simply about how to do migration. Create a facade that routes between old and new services at the back end, slowly replace the old services with new ones, and finally remove the facade once everything runs on the new services. In this way, consumers do not perceive the migration at all. In the last article we mentioned this engine-swap approach: keep the original facade and replace the underlying engine behind it. I think this pattern is entirely reasonable as a way to reduce the external impact; the really difficult parts are the data migration and the implementation of the underlying services mentioned earlier.

Performance and scalability

9. Cache-aside pattern: load data on demand from the data store into the cache

This pattern is not about cache usage in the broad sense, but about one specific usage. There are several ways to use a cache:

· Check the cache first; if the data is not there, query the database and then update the cache

· Directly maintain a large block of “full” data in memory and try to keep it in sync with the database

This pattern refers to the latter. It works best for data that does not change much, achieving a hit ratio of almost 100%, and if the data is small enough to fit inside the process it does not even require cross-process communication. Thinking about it more carefully, there is a further performance optimization point: because we maintain a complete, complex data structure in memory, references between in-memory objects are just pointer references and lookups are extremely fast. For data that is not large but has complex relationships, lookups can be hundreds of times faster than in the database. Typically the data is fully loaded into memory when the application starts, and is then updated with one of several strategies:

· Refresh the data periodically; different data can be refreshed by background threads at different frequencies

· Give data different expiration times; after expiration, the data is refreshed either actively or passively via a callback triggered by a request

· Update both the cache and the database whenever the data is modified
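As a hedged sketch of the “full data in memory, refreshed in the background” approach described above: the loader below stands in for a hypothetical load-everything database query, and the refresh interval is only an example.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Keeps a full copy of slowly-changing reference data in memory and refreshes it periodically. */
public class FullDataCache<K, V> {

    private volatile Map<K, V> snapshot = Map.of();            // swapped atomically on refresh
    private final Supplier<Map<K, V>> loader;                  // e.g. "load all rows from the DB"
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public FullDataCache(Supplier<Map<K, V>> loader, long refreshSeconds) {
        this.loader = loader;
        refresh();                                              // load everything on startup
        scheduler.scheduleAtFixedRate(this::refresh, refreshSeconds, refreshSeconds, TimeUnit.SECONDS);
    }

    private void refresh() {
        try {
            snapshot = Map.copyOf(loader.get());                // rebuild, then swap the reference
        } catch (Exception e) {
            // keep serving the previous snapshot if the data source is temporarily unavailable
        }
    }

    public V get(K key) {                                       // pure in-memory lookup, no remote call
        return snapshot.get(key);
    }
}
```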

10. Command and query responsibility segregation: separate the reading and updating of data through separate interfaces

The acronym is CQRS, a keyword you may be familiar with. What CQRS originally said was that we could have two data models, one for reading and one for writing. The advantage is that reads and writes can have completely different data structures, reducing mutual interference and the complexity of permission control. It does not necessarily mean we have to do this architecturally; internally we can simply have two sets of command and query models that handle writing and reading, each optimized and customized for its job.

It is now common to go a step further: configure two separate data sources for reads and writes and combine this with the event sourcing approach (see the next pattern). Beyond separating the read and write models, we also separate their storage, which is what the architecture diagram in the earlier article on the “five complementary storage pieces” actually meant. For reads, instead of hitting the same data source, we can build a dedicated materialized view, optimize that view for queries, avoid doing lots of joins at read time, and achieve the best performance (the materialized view pattern is introduced later). Event sourcing + CQRS + materialized views are usually used together.
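A minimal, hedged Java sketch of the interface-level separation: commands and queries live behind different interfaces with different models, so each side can be optimized independently. The order-related names are purely illustrative.

```java
import java.math.BigDecimal;
import java.util.List;

// Write side: commands expressed in terms of the domain, with no query concerns.
interface OrderCommandService {
    void placeOrder(String customerId, List<String> skuIds);
    void cancelOrder(String orderId);
}

// Read side: a flat, query-optimized model (e.g. backed by a materialized view),
// which need not share its schema with the write side.
record OrderSummary(String orderId, String customerName, BigDecimal total, String status) {}

interface OrderQueryService {
    List<OrderSummary> recentOrdersFor(String customerId);
    OrderSummary byId(String orderId);
}
```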

11. Event sourcing pattern: use an append-only store to record the complete series of events that describe the actions taken on the data in a domain

Event sourcing (ES) is an interesting pattern: instead of recording the current state of the data, we record the accumulated sequence of changes to it. The traditional CRUD approach has performance and concurrency limitations because of update operations, and it needs additional logs for auditing, otherwise information is lost. The event sourcing pattern records events rather than current state, so it has the following characteristics:

· Events are immutable and only appended, so there are no conflicts and performance is high

· External processing is event-driven with low coupling

· First-hand original information is retained, with no loss of information

There are some business scenarios where this pattern is better than CRUD storage:

· The business values the intent and purpose of the data more than the current state, focusing on auditing, rollback, and history functions

· You want to avoid data update conflicts and produce data with high performance, while accepting eventual consistency of the data state

· The whole system is itself event-driven (think of the real world: objects interact with each other through events, and each object reacts to the events it observes from other objects; this is the most natural model, rather than one object watching another object's attributes change and adjusting its own)

On the other hand, systems with very simple business logic, systems that require strong consistency, and systems that rarely update data are not suitable for this pattern. What business scenarios do you know of that adopt the ES pattern? Feel free to share.
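A hedged Java sketch of the append-only idea: events are only ever appended, and the current state is rebuilt by replaying them. The account/balance domain is used purely as an illustration, not something from the article.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Immutable events describing what happened, never what the state currently is.
sealed interface AccountEvent permits Deposited, Withdrawn {}
record Deposited(BigDecimal amount) implements AccountEvent {}
record Withdrawn(BigDecimal amount) implements AccountEvent {}

/** Append-only store: no updates, no deletes, so writes never conflict. */
class EventStore {
    private final List<AccountEvent> log = new ArrayList<>();

    public synchronized void append(AccountEvent event) {
        log.add(event);                         // the only write operation allowed
    }

    public synchronized List<AccountEvent> readAll() {
        return List.copyOf(log);
    }
}

class Account {
    /** Current state is derived by replaying the full event history. */
    static BigDecimal balance(List<AccountEvent> events) {
        BigDecimal balance = BigDecimal.ZERO;
        for (AccountEvent e : events) {
            balance = switch (e) {
                case Deposited d -> balance.add(d.amount());
                case Withdrawn w -> balance.subtract(w.amount());
            };
        }
        return balance;
    }
}
```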

12. Materialized view pattern: when the data is not in an ideal format for the required query operations, generate pre-populated views over the data in one or more data stores

When we use data stores, we tend to think more about storage than about reads. We design databases using the various normal forms, and then at read time we need many join queries to produce the required results. Performance often becomes the bottleneck at this point, and materialized views trade space for time: rather than joining at query time, save a copy of the data ahead of time in a query- and output-oriented format. Materialized views are therefore suitable for the following scenarios:

· The data can only be queried after complex computation

· The underlying storage may be unstable

· The query needs to join data from multiple different types of storage

However, because of the overhead of computing and storing materialized views, the pattern is not suitable when the data changes too frequently; and because processing the data takes time, it is not suitable for scenarios that require strong data consistency.

The implementation is typically decoupled from the main flow by maintaining an additional materialized view through message listeners. HP Vertica is a high-performance columnar analytics database; one of its features is materialized views, which greatly improve performance by registering a SQL statement in advance and directly caching the statistics-oriented query results. It is the same space-for-time idea.
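A hedged sketch of that implementation shape: a listener consumes change events off the main flow and maintains a read-optimized view; here an in-memory map stands in for a real view table, and all the order-related names are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Change event published by the write side (names are illustrative).
record OrderPlaced(String orderId, String customerId, String productName, int quantity) {}

// Read-optimized row: already denormalized, so queries need no joins.
record CustomerOrderRow(String orderId, String customerId, String productName, int quantity) {}

/** Maintains the materialized view asynchronously, decoupled from the write path. */
class CustomerOrdersViewUpdater {
    private final Map<String, CustomerOrderRow> viewTable = new ConcurrentHashMap<>();

    // In practice this would be a message-queue listener subscribed to order events.
    public void onOrderPlaced(OrderPlaced event) {
        viewTable.put(event.orderId(),
                new CustomerOrderRow(event.orderId(), event.customerId(),
                        event.productName(), event.quantity()));
    }

    public CustomerOrderRow query(String orderId) {   // cheap read, no joins at query time
        return viewTable.get(orderId);
    }
}
```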

13. Queue-based load leveling pattern: use a queue as a buffer between tasks and services to smooth out intermittent heavy loads

Message queues are so familiar that we have covered them many times before; I even call them one of the three carriages of architecture. This pattern emphasizes their peak-shaving advantage. Here I would like to mention a few additional points:

· Introducing a message queue does not increase processing capacity; if anything it reduces raw performance. What it does is decouple components so that each has its own elasticity, and buffer in the queue the load that cannot be handled immediately. But buffering is not storage, and it is not unlimited.

· What matters in a queue is the ratio of processing speed to enqueue speed. Generally we should do an assessment in advance to ensure that processing TPS is more than twice the peak enqueue TPS, keeping roughly half in reserve so that the system can still withstand the load even if processing TPS drops by 30% after business logic changes.
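A hedged sketch of load leveling with a bounded in-process queue: producers enqueue bursts, a fixed pool of workers drains at its own steady rate, and the bounded capacity makes the “buffering is not unlimited” point explicit (the sizes are illustrative).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class LoadLeveling {
    public static void main(String[] args) {
        // Bounded buffer: absorbs bursts but is not unlimited storage.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);

        // Consumers drain at their own steady rate, independent of the burst arrival rate.
        for (int i = 0; i < 4; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        process(queue.take());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        // Producer side: offer() fails fast when the buffer is full instead of blocking the caller.
        for (int i = 0; i < 100_000; i++) {
            boolean accepted = queue.offer("task-" + i);
            if (!accepted) {
                // shed load / reject upstream rather than letting the queue grow without bound
            }
        }
    }

    private static void process(String task) {
        // placeholder for the real (slower) business logic
    }
}
```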

14. Priority queue pattern: prioritize requests sent to services so that higher-priority requests are received and processed more quickly



Unlike a FIFO queue, a priority queue allows messages to carry a processing priority. As the two diagrams above show, there are two ways to implement it:

· Message priority approach: reorder messages within the queue in real time so that higher-priority messages are always consumed first.

· Separate processing pools: set up dedicated processing pools for different priority levels. High-priority messages get more processing resources and better hardware, so they naturally get higher processing capacity.

When selecting and implementing a scheme, consider whether message priority needs to be absolute or merely relative. If absolute priority is required, preemption is needed in addition to message reordering. Also, if we take the second, multi-pool approach and the business logic differs completely between pools, low-priority messages may end up being processed faster than high-priority ones.

As for implementation, RabbitMQ 3.5 and above supports message priority, which is the first approach: messages accumulated in the buffer are reordered so that consumers see higher-priority messages first. Note that when the consumption rate is greater than the production rate, this approach cannot actually give high-priority messages any advantage, because there is no backlog to reorder.
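A hedged Java sketch of the RabbitMQ approach mentioned above: the queue is declared with the x-max-priority argument and each message carries a priority in its properties. The queue name and priority values are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class PriorityQueueExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            // Declare a priority queue: messages may carry priorities 0..10.
            Map<String, Object> queueArgs = new HashMap<>();
            queueArgs.put("x-max-priority", 10);
            channel.queueDeclare("tasks", true, false, false, queueArgs);

            // A high-priority message is reordered ahead of any backlog of lower-priority ones.
            AMQP.BasicProperties urgent = new AMQP.BasicProperties.Builder()
                    .priority(9)
                    .build();
            channel.basicPublish("", "tasks", urgent, "urgent task".getBytes());

            // A message without a priority set is treated as the lowest priority.
            channel.basicPublish("", "tasks", null, "normal task".getBytes());
        }
    }
}
```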

One more point about message queues: special consideration is needed for messages that remain stuck in the queue. Should they be treated as low priority or as dead letters? It is best to have a separate consumer handle them, so that such messages do not affect the processing of the whole queue. I have seen many incidents where a jammed message queue caused the system to completely lose its processing capacity.

15. Throttling (rate limiting) pattern: control the resources consumed by an instance of an application, an individual tenant, or an entire service

During load testing we find that as pressure rises, the system's throughput gradually increases while response time stays basically under control (within one second). Once the pressure crosses a certain boundary, response time suddenly becomes uncontrollable, then throughput declines, and finally the system collapses completely. Every system has a limit to the load it can bear; beyond that limit the system's SLA will definitely be missed and the service becomes useless to everyone. Because scaling out often cannot be done in seconds, the fastest remedy at that moment is rate limiting; only by limiting traffic can we protect the current system from crossing the boundary and collapsing completely. For business systems running very large activities, rate limiting of key services, and even at the entry level, is unavoidable; there is no other way. Even at the stroke of midnight on Taobao's Double 11, a certain proportion of order requests can be seen being throttled.

Common rate limiting algorithms are as follows:

· Counter algorithm. The simplest algorithm: increment the counter when a resource is used, decrement it when released, and deny service once the count reaches a threshold.

· Token bucket algorithm. Add tokens to the bucket at a fixed rate; the bucket holds at most N tokens, and tokens added when it is full are discarded. A request that cannot obtain a token is rejected.

· Leaky bucket algorithm. A fixed-capacity bucket that drains water droplets (tasks) at a constant rate. Droplets (tasks) can flow in at any speed, and overflow is discarded when the bucket is full.

The token bucket algorithm limits the average inflow rate while allowing a certain degree of bursting, whereas the leaky bucket algorithm enforces a constant outflow rate and is used to smooth the inflow rate. In terms of implementation, common open source libraries have relevant implementations; for example, the RateLimiter provided by Google's Guava is a token bucket implementation.
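A hedged sketch using Guava's RateLimiter (the token bucket implementation mentioned above); the 100-permits-per-second limit and the rejection response are illustrative choices, not prescriptions.

```java
import com.google.common.util.concurrent.RateLimiter;

public class ApiThrottle {
    // Token bucket: permits are replenished at a fixed rate of 100 per second.
    private final RateLimiter limiter = RateLimiter.create(100.0);

    public String handleRequest(String request) {
        // Fail fast: reject immediately when no permit is available instead of queueing.
        if (!limiter.tryAcquire()) {
            return "429 Too Many Requests - please retry later"; // dedicated throttling response
        }
        return doBusinessLogic(request);
    }

    private String doBusinessLogic(String request) {
        return "ok"; // placeholder for the real work
    }
}
```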

Note the following for limiting traffic:

· Rate limiting needs to act fast: any request that exceeds the limit must be rejected outright, otherwise it is meaningless.

· Rate limiting should kick in early, preferably when the system reaches about 80% of its capacity; the later it kicks in, the greater the risk.

· You can return a dedicated rate-limiting error code to the client, so the user knows this is not an error but throttling and can retry later.

· Because many places in our system apply rate limiting, we should learn to recognize the characteristic rate-limiting curve on monitoring charts: after limiting kicks in, the curve suddenly loses its growth gradient and flattens out. If the chart's time range is too small, this can easily be misjudged as a normal request volume.

· Rate limiting can also be done at edge nodes. Consider a flash-sale scenario: if there are one million requests in a single second, sending all of them to our application servers makes no sense. We can do simple computation at the edge nodes (CDN) and let those one million requests draw lots, randomly discarding 99.9% of them so that only about 1,000 requests eventually reach our business services; a TPS of 1,000 is generally not a problem. So very often when we take part in a flash sale and are told in an instant that the event is over, we were simply not among the chosen: our request was never destined to reach the back-end system at all.

In the next installment, we will continue to cover some of the architectural patterns for data, security, messaging, and resiliency.