Summary: System design and architecture theory is a very broad topic that touches almost every area of technology. As a backend engineer who has been coding for nearly 10 years, I would like to offer a brief overview.

Original article by sword tao technology, July 15

System design and architecture are closely tied to the business type of the system. A conventional business system mainly focuses on domain modeling, while high-concurrency, high-availability, and data-consistency systems differ considerably from business systems in their design. The sections below therefore briefly introduce, for each type of system, some of the difficulties faced at design time and their solutions.

Domain model is the key to conventional business system design

The key to business system design lies in how to define the models of the system and the relationships between them, which is mainly a matter of defining the domain model. Once the models are determined, the relationships between them also become clear.

For model design, you can refer to the classic book on domain models, “Domain-Driven Design”, which gives a clear understanding of concepts such as domain definition, the anti-corruption layer, and the anemic model.

Domain model systems within a single application also need to pay attention to domain layering. As a developer, have you seen and refactored a lot of code layered in the Controller-Service-DAO style? Refactoring it is often a painful experience.

For a better domain design, here is a layering suggestion:

Interfaces

This layer is mainly responsible for interaction and communication with external systems, for example through Dubbo services, RESTful APIs, RMI, etc. It mainly contains Facades, DTOs, and Assemblers.

Application

The main component of this layer is the Service, but note that a Service here is not simply a wrapper around the DAO layer. In a domain-driven architecture, the Service layer is a very thin layer that implements no business logic itself. It is only responsible for coordinating and forwarding, delegating business actions to the domain layer below.

Domain

The Domain layer is the core of the Domain model system and is responsible for maintaining the object-oriented Domain model. Almost all business logic is implemented in this layer. It contains Entity, ValueObject, Domain Event, Repository and other important Domain components.

Infrastructure

This layer provides support for Interfaces, Application, and Domain. All platform-specific and framework-specific implementations live in Infrastructure, which keeps them out of the other three layers, especially the Domain layer, so that they do not “contaminate” the domain model. The most common kind of Infrastructure component is a concrete implementation of object persistence.
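The four layers can be sketched in a few lines. This is only a minimal illustration, not a prescribed structure: the names Account, AccountRepository, InMemoryAccountRepository, WithdrawService, and withdraw_facade are all hypothetical, and an in-memory dictionary stands in for real persistence.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


# Domain layer: the entity owns the business rules; the repository is only an interface.
@dataclass
class Account:
    account_id: str
    balance: int  # amount in cents

    def withdraw(self, amount: int) -> None:
        # The business rule lives in the entity, not in the service.
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise ValueError("insufficient balance")
        self.balance -= amount


class AccountRepository(Protocol):
    def find(self, account_id: str) -> Optional[Account]: ...
    def save(self, account: Account) -> None: ...


# Infrastructure layer: concrete persistence (a dict stands in for a database).
class InMemoryAccountRepository:
    def __init__(self) -> None:
        self._rows: Dict[str, Account] = {}

    def find(self, account_id: str) -> Optional[Account]:
        return self._rows.get(account_id)

    def save(self, account: Account) -> None:
        self._rows[account.account_id] = account


# Application layer: a thin service that only coordinates and delegates.
class WithdrawService:
    def __init__(self, repo: AccountRepository) -> None:
        self._repo = repo

    def withdraw(self, account_id: str, amount: int) -> int:
        account = self._repo.find(account_id)
        if account is None:
            raise KeyError(account_id)
        account.withdraw(amount)  # business action delegated to the domain object
        self._repo.save(account)
        return account.balance


# Interfaces layer: a facade that converts external request data to and from DTOs.
def withdraw_facade(service: WithdrawService, request: dict) -> dict:
    balance = service.withdraw(request["account_id"], int(request["amount"]))
    return {"account_id": request["account_id"], "balance": balance}
```

Because the Domain layer only knows the AccountRepository interface, the in-memory implementation could be swapped for a database-backed one without touching the entity or the service.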

High concurrency system design

How would you redesign your system if its traffic increased by a factor of N? This high concurrency problem can be addressed at several levels, such as:

Code level

  • Lock optimization (for example, using lock-free data structures), mainly concerning the AQS-based locks in the java.util.concurrent package.
  • Database cache design (to reduce the pressure of concurrent contention on the database). This introduces the problem of inconsistency between cache and DB data; in practice, a high-concurrency system and a data-consistency system adopt completely opposite strategies here.
  • When data is updated, updates can be merged at the application layer, so that the same container issues only one DB update request at a time.
  • Others include trading space for time with a BloomFilter, reducing processing time through asynchrony, executing concurrently with multiple threads, and so on.
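As one illustration of the BloomFilter item above, here is a minimal sketch of a Bloom filter that lets a service answer "definitely not present" from a small in-memory bit array before touching the cache or the database. The class name, the default sizes, and the SHA-256-based hashing are assumptions chosen for the example, not a tuned implementation.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means the item was never added; True means "possibly added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

A request for a key where might_contain returns False can be rejected immediately; a True result still has to be confirmed against the real store.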

Database level

  • Choose different types of storage for different storage requirements, from the early RDBMS, to NoSQL (KV stores, document databases, full-text search engines, etc.), to the more recent NewSQL (TiDB, Google Spanner/F1) and so on.
  • Table structure design, including the choice of field types and the differences between them.
  • Index design. Pay attention to how clustered indexes work and to using covering indexes and index order to eliminate sorting; the leftmost-prefix matching rule is common knowledge by now. Also understand the difference between B+ trees and B trees.
  • Finally, the conventional means: database and table sharding, read/write separation, data partitioning, splitting hot data, and so on. High-concurrency scenarios often split data into buckets, and there is a lot to explore here, such as how to initialize the buckets, what the routing rules are, and how to merge the data at the end. A classic approach is to split one bucket into one main bucket + N sub-buckets.
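The "one main bucket + N sub-buckets" idea can be sketched as follows. This is an in-memory illustration under stated assumptions: the class BucketedStock, its reserve parameter, and the CRC32 routing rule are hypothetical, and in a real system each bucket would typically be a separate row so that concurrent updates contend on different rows.

```python
import zlib
from typing import List


class BucketedStock:
    """One hot stock record split into a main bucket plus N sub-buckets."""

    def __init__(self, total: int, sub_buckets: int, reserve: int = 0) -> None:
        # Initialization: keep `reserve` units in the main bucket, spread the rest evenly.
        share, remainder = divmod(total - reserve, sub_buckets)
        self.main = reserve + remainder
        self.subs: List[int] = [share] * sub_buckets

    def _route(self, user_id: str) -> int:
        # Routing rule: a stable hash of the caller picks the sub-bucket.
        return zlib.crc32(user_id.encode("utf-8")) % len(self.subs)

    def deduct(self, user_id: str, qty: int = 1) -> bool:
        idx = self._route(user_id)
        if self.subs[idx] >= qty:
            self.subs[idx] -= qty
            return True
        # Fall back to the main bucket when the routed sub-bucket is exhausted.
        if self.main >= qty:
            self.main -= qty
            return True
        return False

    def remaining(self) -> int:
        # Merging: the logical stock is the sum of all buckets.
        return self.main + sum(self.subs)
```

Note the trade-off this sketch makes visible: a deduction can fail while other sub-buckets still hold stock, which is why the rebalancing and merging rules mentioned above matter.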

Architecture design level

  • Split the distributed system into services
  • Keep services stateless so that they can scale out and in elastically
  • Fail fast at the business logic layer
  • Handle hotspot data along the call chain
  • Multi-level cache design
  • Capacity planning in advance and so on

High availability system design

For systems with very high availability requirements, availability is usually expressed as a number of nines, such as 99.999% (five nines).
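To make the number of nines concrete, the following small sketch converts an availability percentage into the downtime it allows per year. The function name is hypothetical and a 365-day year is assumed.

```python
def allowed_downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes, ignoring leap years
    return minutes_per_year * (1 - availability_percent / 100)
```

Under this assumption, 99.999% leaves roughly 5.3 minutes of downtime per year, while 99.9% leaves roughly 8.8 hours.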

High availability system design can also be analyzed at several levels.

Code level: You need to focus on distributed transaction issues; CAP theory is also a regular interview topic.

Software level: Applications should be stateless. When multiple instances of a module are deployed, a request must produce the same result on any instance. In other words, modules do not store context information and process each request solely based on the parameters it carries. The purpose is fast scaling and service redundancy. Session handling is a common problem here.

Load balancing

How is the load shared once multiple instances of the software are deployed? How is the machine to call chosen? That is load balancing.

  • In the narrow sense, load balancing can be divided into the following types:
  1. Hardware load balancing: such as F5, etc.
  2. Software load balancing: such as LVS, Nginx, HAProxy, DNS, etc.
  3. Load balancing algorithms in code: such as Random, RoundRobin, ConsistentHash, weighted round robin, and so on.
  • In a broad sense, load balancing can be understood as a set of capabilities. For example, a load balancing system needs the following four:
  1. Automatic detection of faulty machines
  2. Automatic removal of faulty services (circuit breaking)
  3. Automatic retry of requests
  4. Automatic discovery of recovered services
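Of the code-level algorithms listed above, ConsistentHash is the least obvious, so here is a minimal sketch of a hash ring with virtual nodes. The class and method names are hypothetical, and remove_node only stands in for the "automatic removal of faulty services" capability; detecting the fault is out of scope.

```python
import bisect
import hashlib
from typing import Dict, List


class ConsistentHashRing:
    """Maps request keys to nodes; removing a node only remaps that node's keys."""

    def __init__(self, nodes: List[str], virtual_nodes: int = 100) -> None:
        self._virtual_nodes = virtual_nodes
        self._ring: Dict[int, str] = {}      # ring position -> node name
        self._sorted_keys: List[int] = []    # ring positions in ascending order
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        for i in range(self._virtual_nodes):
            h = self._hash(f"{node}#{i}")
            if h in self._ring:
                continue  # extremely unlikely collision: keep the existing owner
            self._ring[h] = node
            bisect.insort(self._sorted_keys, h)

    def remove_node(self, node: str) -> None:
        for i in range(self._virtual_nodes):
            h = self._hash(f"{node}#{i}")
            if self._ring.get(h) == node:
                del self._ring[h]
                self._sorted_keys.remove(h)

    def get_node(self, request_key: str) -> str:
        if not self._sorted_keys:
            raise RuntimeError("no healthy nodes")
        # Walk clockwise to the first virtual node after the key, wrapping around.
        idx = bisect.bisect(self._sorted_keys, self._hash(request_key)) % len(self._sorted_keys)
        return self._ring[self._sorted_keys[idx]]
```

When a faulty machine is removed, keys that were routed to the surviving machines keep their assignment, which is what makes this algorithm attractive for cache-like services.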

Idempotent design

As mentioned above, load balancing in the broad sense requires an automatic retry mechanism, so the business logic must be designed to be idempotent.

There are two aspects to consider:

  • Request level

    Since a request may be retried, it must be idempotent: repeating the request must produce exactly the same result as executing it once. Idempotency at the request level requires idempotency at the data modification layer. In the data access layer, read requests are naturally idempotent and return the same result no matter how many times the query is made, while write requests need to be made idempotent. In essence this is a distributed transaction problem, which is described in more detail below.

  • Business level

    A lack of idempotency at the business level can cause serious problems such as rewards being issued multiple times or duplicate orders. Business-level idempotency is essentially a distributed lock problem, which is covered later. How do you guarantee that an order is not placed twice? With a token mechanism, for example. How do you ensure that goods are not oversold? With optimistic locking, for example. How MQ consumers ensure idempotency is another common interview question.
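The token mechanism and the optimistic lock mentioned above can be sketched in memory as follows. Both classes are hypothetical illustrations: IdempotentExecutor stands in for a unique-token check in shared storage, and VersionedStock.deduct_if_version stands in for a conditional update that only succeeds when the version read earlier is still current.

```python
import threading
from typing import Callable, Dict


class IdempotentExecutor:
    """Runs an action at most once per token and replays the stored result."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._results: Dict[str, object] = {}

    def execute(self, token: str, action: Callable[[], object]) -> object:
        # One lock around the action keeps the sketch simple; a real system
        # would more likely rely on a unique key in shared storage.
        with self._lock:
            if token in self._results:
                return self._results[token]  # retry: do not run the action again
            result = action()
            self._results[token] = result
            return result


class VersionedStock:
    """Stock record guarded by an optimistic version check."""

    def __init__(self, quantity: int) -> None:
        self._lock = threading.Lock()
        self.quantity = quantity
        self.version = 0

    def deduct_if_version(self, expected_version: int, qty: int) -> bool:
        # Succeed only if nobody changed the record since it was read
        # and enough stock remains; otherwise the caller re-reads and retries.
        with self._lock:
            if self.version != expected_version or self.quantity < qty:
                return False
            self.quantity -= qty
            self.version += 1
            return True
```

A retried request that carries the same token gets the original result back, and two buyers who read the same stock version cannot both succeed in deducting the last item.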

Distributed locks

Idempotent design at the business level is essentially a distributed lock problem. What is a distributed lock? It is a lock on a globally unique resource in a distributed environment: it serializes requests, provides mutual exclusion, and thereby solves the idempotency problem of the business layer.

The common solution is the SETNX approach based on a Redis cache, but as an engineer you should be clear that it still has problems: a single point of failure, a timeout-based lock that cannot renew its lease, asynchronous master-slave replication, and so on. Going further, by CAP theory an AP system cannot in essence satisfy a CP requirement. Not even RedLock can.

So how should a distributed lock be designed? Strong consistency and high availability of the lock service itself are the most basic requirements. Others include support for automatic renewal, an automatic release mechanism, a high level of abstraction with simple access, and being visual and manageable.

Reliable storage layer-based solutions such as:

  • ZooKeeper

    CP / ZAB / N+1 available: implemented with ephemeral nodes and the Watch mechanism.

  • etcd

    CP or AP / Raft / N+1 available: RESTful API; KV storage with strong consistency and high availability; data reliability through persistence. In client TTL mode, the client keeps the lock alive with a heartbeat that does a CAS on its unique UUID credential.
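The TTL, heartbeat, and UUID-based CAS ideas above can be sketched with an in-memory stand-in for the storage layer. This is not the ZooKeeper or etcd API: LeaseLockStore and its methods are hypothetical, and a local mutex plays the role of the strong consistency that the real storage layer would have to provide.

```python
import threading
import time
import uuid
from typing import Callable, Dict, Optional, Tuple


class LeaseLockStore:
    """In-memory stand-in for a strongly consistent KV store with TTL leases."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._mutex = threading.Lock()
        self._leases: Dict[str, Tuple[str, float]] = {}  # resource -> (owner, expires_at)

    def acquire(self, resource: str, ttl: float) -> Optional[str]:
        """Return a unique owner id on success, or None if the lock is held."""
        with self._mutex:
            now = self._clock()
            lease = self._leases.get(resource)
            if lease is not None and lease[1] > now:
                return None
            owner = uuid.uuid4().hex
            self._leases[resource] = (owner, now + ttl)
            return owner

    def renew(self, resource: str, owner: str, ttl: float) -> bool:
        """Heartbeat: extend the lease only if the caller still owns it (CAS on owner)."""
        with self._mutex:
            now = self._clock()
            lease = self._leases.get(resource)
            if lease is None or lease[0] != owner or lease[1] <= now:
                return False
            self._leases[resource] = (owner, now + ttl)
            return True

    def release(self, resource: str, owner: str) -> bool:
        """Release only if the caller is the current owner."""
        with self._mutex:
            lease = self._leases.get(resource)
            if lease is None or lease[0] != owner:
                return False
            del self._leases[resource]
            return True
```

The TTL provides automatic release if the holder crashes, the heartbeat provides renewal for long-running work, and the owner check prevents a client whose lease has expired from releasing a lock that now belongs to someone else.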

Service circuit breaking

After moving to microservices, the system is distributed and its parts communicate with each other through RPC. The probability of failure of the whole system grows with its scale, and a small fault may be amplified into a bigger one as it propagates along the call chain. When invoking services, you want to shield yourself as much as possible from degraded quality of service in services that are not on the critical path.

When a circuit breaker trips, most implementations return a default value of null, and the fallback can also be customized. Native support in the RPC client is best, so that the business side needs few code changes (only at the places where the circuit is broken). You should be able to see the status of each service, whether it is degraded and whether its circuit is open, and to push threshold configuration in real time.
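A circuit breaker with a default fallback can be sketched as a small state machine. This is a simplified, non-thread-safe illustration; the class name, the consecutive-failure threshold, and the reset timeout are assumptions for the example rather than the behavior of any particular RPC framework.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Closed -> open after repeated failures -> half-open after a timeout."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self._reset_timeout:
            return "half_open"  # allow one trial call through
        return "open"

    def call(self, func: Callable[[], object], fallback: object = None) -> object:
        if self.state == "open":
            return fallback  # short-circuit: do not touch the failing service
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self._failure_threshold:
                self._opened_at = self._clock()
            return fallback
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping only the non-critical downstream calls in such a breaker keeps the caller's code changes small, and the state property is the kind of information a monitoring view would expose.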

Service degradation

If the overall service load exceeds the preset upper limit, or upcoming traffic is expected to exceed the threshold, the system rejects some requests, or delays or suspends some unimportant and non-urgent services or tasks, in order to keep important or basic services running normally.

The main measures are as follows:

  • Service layer degradation, the main means
  1. Reject some requests (traffic limiting). For example, queue incoming requests and reject those that have waited too long, or reject non-core requests according to the request header. Other common traffic limiting algorithms include token buckets, leaky buckets, etc.
  2. Close some services: for example, the reverse refund service is closed at midnight on Double 11.
  3. Hierarchical degradation: for example, in autonomous service degradation, the number of requests passed downstream from the gateway, to the service, to the DB is reduced step by step based on interception and business rules, because processing capability decreases from top to bottom.
  • Data layer degradation
  • For example, when traffic is high, update requests are only written to MQ and read requests are only served from cache; when traffic is low, the complete operations are performed (if the generic data access layer has already been degraded, there is no need to do so in the data layer).
  • Flexible availability strategy
  • For example, traffic limiting tools that cap the maximum traffic, or that limit traffic according to CPU load, must be enabled automatically and must not depend on manual operation.
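The token bucket algorithm mentioned in the traffic limiting item can be sketched as follows. This is a minimal single-threaded illustration; the class name and the rate and capacity parameters are assumptions for the example.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._tokens = capacity          # start with a full bucket
        self._last_refill = clock()

    def try_acquire(self, tokens: float = 1.0) -> bool:
        # Refill lazily, based on the time elapsed since the previous call.
        now = self._clock()
        elapsed = now - self._last_refill
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last_refill = now
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False  # caller should reject or queue the request
```

A request for which try_acquire returns False is the one to reject or delay; because the check is automatic, it fits the requirement above that limiting must not depend on manual operation.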

Availability issues arising from the release process

The release process also affects high availability. I have personally seen releases done by taking the production system down directly (an internal bank system), but high-profile Internet companies mainly adopt these release modes: gray release, blue-green release, canary release, and so on.

Data consistency system design

In general, financial and accounting systems have very strict requirements in this area. The following mainly introduces the transaction consistency and consistency algorithms involved.

Transaction consistency

At the DB level, rigid transactions are generally used to achieve data consistency, mainly through WAL (Write-Ahead Logging). Every change to the data files must first be written to the log, so that even if a crash occurs while data is being written, it can be recovered from the log file. Traditional database transactions are based on this mechanism (REDO replays the data of committed transactions, and UNDO rolls back uncommitted transactions).

In addition to this method, there is also the shadow data block approach: before a data block is modified, its current state is recorded and backed up. If a rollback is needed, the backup block is simply written back over the modified one.

Finally, there is the XA model, which is based on two-phase commit.

However, distributed deployment is now widely adopted in Internet systems, where traditional rigid transactions are hard to apply. Flexible transactions have therefore become the mainstream distributed transaction solution. The main modes are as follows:

  • TCC mode (a two-phase mode)

    Resources are pre-deducted in the Try phase (but not locked, to improve availability), and the data is committed or rolled back in the Confirm or Cancel phase. Typically, you need to introduce a coordinator, or transaction manager.

  • SAGA mode

    Each participant in a business process commits a local transaction. When one of the participants fails, the participants that already succeeded are compensated. Both forward and backward compensation are supported.

  • Transaction messages for MQ

    After receiving a halfMsg, MQ periodically asks the producer whether the halfMsg can be committed or must be rolled back, thereby achieving transaction consistency. This in effect delegates the compensation action to RocketMQ.

  • Segmented transactions (asynchronous assurance)

    Based on reliable messages + a local transaction message table + the retry mechanism of the message queue. At present this is the mainstream scheme at some large companies, and it is generally called segmented transactions.
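The backward compensation described for the SAGA mode can be sketched as a tiny orchestrator. This is an in-process illustration only: run_saga and the Step tuple are hypothetical, each action stands in for a participant's local transaction, and a real implementation would also need to persist progress and retry failed compensations.

```python
from typing import Callable, List, Tuple

# (name, action, compensation): each action stands in for one local transaction.
Step = Tuple[str, Callable[[], object], Callable[[], object]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    completed: List[Step] = []
    for step in steps:
        _name, action, _compensation = step
        try:
            action()
        except Exception:
            # Backward compensation: undo what already succeeded, newest first.
            for _done_name, _done_action, compensate in reversed(completed):
                compensate()
            return False
        completed.append(step)
    return True
```

For example, if "reserve stock" succeeds and "charge payment" fails, only the stock reservation is compensated; later steps never run and therefore need no compensation.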

Flexible transactions are basically built on eventual consistency, so they necessarily involve compensation actions, and the user generally sees a soft state until eventual consistency is reached.

Note that not every system is suited to introducing a data consistency framework. Take a system whose data users can modify at any time, such as a merchant back-office settings system. If consistency is involved here and a consistency framework is introduced, the resource lock held until the compensating actions reach eventual consistency blocks the user's subsequent requests, resulting in a poor experience. In such cases, other means are needed to ensure data consistency, such as data reconciliation.

Consistency algorithms

From the early Paxos algorithm to the derived ZAB protocol (see: A Simple Totally Ordered Broadcast Protocol), these algorithms are what currently make reliable distributed lock solutions possible. There is also the Raft algorithm (see: In Search of an Understandable Consensus Algorithm), a consensus algorithm for distributed systems designed to be understandable.

In closing

This article has introduced some of the difficulties faced in designing different kinds of systems. Essentially every point in it is an industry solution that our predecessors arrived at through continuous exploration while solving real problems, and that is now laid out in front of you. For a technical person, learning these techniques is only a matter of time. The ability and the spirit to find, face, and solve problems are what is most worth learning, and they are also an essential ability for a system designer or architect.

This article is original content of Ali Cloud and may not be reproduced without permission.