

Written over two weeks by more than ten backend developers working together, this systematic article walks you through the knowledge points of distributed systems step by step, with diagrams that make the composition of each architecture clear at a glance.

First, the characteristics of distributed systems

1. High concurrency, heavy traffic

The system must support highly concurrent users and heavy traffic. Google handles an average of 3.5 billion page views per day from 300 million daily visiting IPs; Tencent's WeChat has had as many as 963 million users online at once.

2. High availability

The system must provide uninterrupted service 24 hours a day, 7 days a week. Online availability should reach at least 4 nines:

4 nines = (1 − 99.99%) × 24 × 365 = 0.876 hours = 52.56 minutes of downtime per year

3. Massive data

The system must store and manage massive amounts of data, which requires large numbers of servers. Facebook uploads nearly a billion photos a week, Baidu indexes tens of billions of web pages, and Google runs nearly a million servers serving users around the world.

4. Users are widely distributed and the network situation is complex

Many large Internet sites serve a global audience, so users are widely distributed and network conditions vary enormously; within China there is the added problem of interconnection between different network operators.

5. Poor security environment

Due to the openness of the Internet, Internet sites are more vulnerable to attack, and large websites are attacked by hackers almost every day.

6. Rapid change of requirements and frequent release

Unlike traditional software, Internet products release at a high frequency in order to adapt quickly to the market and satisfy user needs. Large websites generally release a new version online every week, while small and medium-sized sites release even more often, sometimes dozens of times a day.

7. Progressive development

Almost all large Internet sites started as small sites and grew gradually. Facebook was created in Zuckerberg's Harvard dorm room; Google's first server was deployed in a lab at Stanford University; Alibaba was born in Jack Ma's living room. Good Internet products are grown through operation rather than fully built at the outset, and that growth is exactly the evolution process of website architecture.

Second, the evolution and development of large website architecture

The technical challenges of large websites come mainly from their huge user bases, highly concurrent access, and massive data. Any business, however simple, becomes very difficult once it must handle tens of millions of data items and hundreds of millions of users. Large website architecture addresses exactly this class of problem.

1. The initial stage of website architecture

Large websites evolve from small ones, and so do their architectures. A small website has few visitors at first, so a single server is more than enough. The architecture at this stage is shown below:

Applications, databases, files, and all resources are on one server.

2. Application services and data services are separated

As the website's business grows, one server can no longer meet its needs: performance worsens as more users visit, and growing data exhausts the storage space. At this point the application must be separated from the data.

After separating applications and data, the entire site uses three servers: application server, file server, and database server. The three servers have different requirements for hardware resources:

An application server handles a great deal of business logic, so it needs faster, more powerful CPUs; a database server needs fast disk retrieval and data caching, so it needs faster disks and more memory; a file server stores large numbers of user-uploaded files, so it needs bigger hard disks.

At this point, the architecture of the website system is shown in the figure below:

After applications and data are separated, servers with different characteristics take on different roles, and the site's concurrent processing capacity and data storage space improve greatly, supporting further business growth. As the number of users keeps increasing, however, the site faces a new challenge: the database comes under too much pressure, access latency rises, the performance of the whole site suffers, and the user experience degrades. The architecture needs further optimization.

3. Use caching to improve site performance

Website access follows the same 80/20 rule as wealth distribution in the real world: 80% of business accesses concentrate on 20% of the data. Since most accesses hit a small portion of the data, caching that portion in memory reduces the pressure on the database, speeds up data access across the whole site, and, by offloading reads, improves database write performance.

There are two types of caches used by web sites: local caches cached on application servers and remote caches cached on dedicated distributed cache servers.

A local cache is faster to access, but it is limited by the application server's memory and competes with the application for it. A remote distributed cache can be deployed as a cluster of dedicated large-memory cache servers, so in theory the cache service is not limited by memory capacity.
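
As a concrete illustration, here is a minimal cache-aside sketch in Java; the CacheClient and Database interfaces are hypothetical stand-ins for, say, a Redis client and a data-access layer:

```java
// Minimal cache-aside sketch: read through the cache, fall back to the database.
// CacheClient and Database are hypothetical stand-ins for a real cache client
// (e.g. Redis) and a data-access layer.
public class ProductService {
    private final CacheClient cache;
    private final Database db;

    public ProductService(CacheClient cache, Database db) {
        this.cache = cache;
        this.db = db;
    }

    public Product getProduct(String id) {
        Product p = cache.get(id);          // 1. try the cache first
        if (p == null) {
            p = db.findProduct(id);         // 2. cache miss: hit the database
            if (p != null) {
                cache.put(id, p, 300);      // 3. populate the cache with a TTL (seconds)
            }
        }
        return p;
    }

    interface CacheClient { Product get(String id); void put(String id, Product p, int ttlSeconds); }
    interface Database { Product findProduct(String id); }
    record Product(String id, String name) {}
}
```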

Once caching is in place, data access pressure is effectively relieved, but a single application server can handle only a limited number of connection requests, and during peak periods it becomes the bottleneck of the whole website.

4. Use application server clusters to improve the concurrent processing capacity of websites

Clustering is a common way to tackle high concurrency and massive data.

When one server's processing capacity or storage space is insufficient, do not attempt to swap in a more powerful server. For a large website, no server, however powerful, can satisfy the continuously growing business needs. It is more appropriate to add another server to share the original server's access and storage burden.

As far as website architecture is concerned, as long as adding one server improves the load capacity, we can keep increasing the number of servers in the same way to improve system performance, thereby achieving system scalability.

An application server cluster is a relatively simple and mature design for a scalable website architecture, as shown below:

Through a load-balancing scheduler, access requests from users' browsers are distributed to any server in the application server cluster. If there are more users, more application servers are added to the cluster, so that application server load no longer becomes the bottleneck of the whole website.
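
To make the scheduling idea concrete, here is a minimal round-robin sketch in Java; the server addresses are made-up placeholders, and a production load balancer (Nginx, LVS, HAProxy) would of course do much more:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin scheduling sketch: pick the next application server
// for each incoming request.
public class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinBalancer(List<String> servers) {
        this.servers = List.copyOf(servers);
    }

    public String pick() {
        // Math.floorMod keeps the index non-negative even after int overflow.
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(
                List.of("app-1:8080", "app-2:8080", "app-3:8080"));
        for (int r = 0; r < 5; r++) {
            System.out.println("request " + r + " -> " + lb.pick());
        }
    }
}
```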

5. Database read/write separation

After the website adopts caching, most read operations can be served without touching the database, but some reads still reach it (cache misses, cache expiration), and all write operations must go to the database. Once the website reaches a certain scale, the database again becomes the bottleneck because of its heavy load.

Most mainstream databases provide the master/slave hot backup function. By configuring the master/slave relationship between two databases, data updates from one database server can be synchronized to the other server.

The website uses this database feature to implement read/write separation, thereby relieving database load. As shown below:

When writing data, the application server accesses the master database, which synchronizes the updates to the slave database through the master/slave replication mechanism, so that reads can be served from the slave. To make the database convenient for applications to use after read/write separation, a dedicated data access module is usually employed on the application server, making read/write separation transparent to the application.
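
A minimal sketch of such a data access module, assuming a Spring environment and using Spring's AbstractRoutingDataSource; the "master"/"slave" keys and the ThreadLocal flag are illustrative choices:

```java
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

// Minimal sketch of a data access module that makes read/write separation
// transparent: writes go to the master, reads go to the slave.
public class ReadWriteRoutingDataSource extends AbstractRoutingDataSource {

    private static final ThreadLocal<Boolean> READ_ONLY =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    public static void markReadOnly(boolean readOnly) {
        READ_ONLY.set(readOnly);
    }

    @Override
    protected Object determineCurrentLookupKey() {
        // Keys must match those registered via setTargetDataSources(...)
        return READ_ONLY.get() ? "slave" : "master";
    }
}
```

The two target DataSources would be registered under the "master" and "slave" keys via setTargetDataSources; an AOP aspect or the service code would call markReadOnly(true) around queries and reset it afterwards.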

6. Use reverse proxy and CDN to speed up website response

As the website's business keeps growing, so does its user base. Because of the complex network environment in China, access speeds vary greatly between users in different regions. Studies show that website access latency is positively correlated with user churn: the slower a site is, the more likely users are to lose patience and leave. To provide a better experience and retain users, websites need to accelerate access.

The main means are CDNs and reverse proxies. As shown below:

CDN and reverse proxy are based on caching.

A CDN is deployed in the network provider's equipment room, so when users request the website's services, the data can be served from the provider's room closest to them. A reverse proxy is deployed in the website's central equipment room: when a user's request arrives there, the first server it reaches is the reverse proxy, and if the requested resource is cached on the proxy, it is returned to the user directly.

The purpose of both CDN and reverse proxy is to return data to users as early as possible, which speeds up user access and also reduces the load on back-end servers.

7. Use distributed file systems and distributed database systems

No single server, however powerful, can satisfy the continuously growing business needs of a large website. After read/write separation splits the database from one server onto two, business growth will eventually outpace it again, at which point a distributed database is needed. The same goes for the file system, which needs to become a distributed file system.

As shown below:

A distributed database is the last resort of website database splitting, used only when the data volume of a single table is extremely large. The more common splitting method is to deploy the data of different business lines on different physical servers.

8. Use NoSQL and search engines

As website business becomes more and more complex, the demands on data storage and retrieval also grow more complex, so websites adopt non-relational technologies such as NoSQL databases and non-database query technologies such as search engines.

As shown below:

Both NoSQL and search engines are technologies born of the Internet and offer better support for scalable, distributed characteristics. The application server accesses the various kinds of data through a unified data access module, which relieves the application of managing many different data sources.

9. Business split

Large sites cope with increasingly complex business by using divide and conquer: the entire site's business is split into different product lines. For example, a large shopping and transaction site will split its home page, shops, orders, buyers, and sellers into separate product lines, each owned by a different business team.

Technically, the site is split into many different applications along product lines, each deployed and maintained independently. Applications can relate to one another through hyperlinks (each navigation link on the home page points to a different application's address), distribute data to one another through message queues, and, most importantly, access the same data storage system, together forming a complete interconnected system.

As shown below:

10. Distributed microservices

As services are split ever smaller and storage systems grow ever larger, the overall complexity of the application system increases exponentially, making deployment and maintenance harder. Because every application needs to connect to every database system, in a website with tens of thousands of servers the number of connections is on the order of the square of the cluster size, exhausting database connection resources and causing denial of service.

Since every application system needs to perform many of the same operations, such as user management and product management, these common services can be extracted and deployed independently. These reusable services connect to the databases and provide shared services, while the application systems only need to manage their own user interfaces and complete concrete business operations by calling the shared services through a distributed services framework.

As shown below:
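
Beyond the figure, here is a hypothetical code-level sketch of this extraction: UserService is deployed once as a shared service, and an application calls it through an injected RPC stub (Dubbo, gRPC, and the like) instead of reaching into the user database itself:

```java
// Hypothetical sketch of a shared service: user management is deployed once
// as a standalone service; applications call it through an RPC stub.
public interface UserService {
    User findById(long userId);
    void register(User user);

    record User(long id, String name) {}
}

// Inside an application system: only UI and business orchestration remain.
class OrderApplication {
    private final UserService users; // injected RPC proxy (e.g. a Dubbo/gRPC stub)

    OrderApplication(UserService users) {
        this.users = users;
    }

    void placeOrder(long userId) {
        UserService.User buyer = users.findById(userId); // remote call to the shared service
        // ... order-specific business logic only; no direct user-database access
        System.out.println("placing order for " + buyer.name());
    }
}
```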

Third, split vs. cluster

1. Split

Different service modules are deployed on different servers and communicate and call each other via RPC. Service functions are split and deployed independently, and multiple servers together provide services externally as a whole.

2. Cluster

The same service module is deployed on multiple servers, and distributed scheduling software performs unified scheduling to reduce the access pressure on any single server.

Fourth, comparison of architectures

1. Monolithic architecture

A monolithic architecture is easy to develop, test, and deploy; every call between functions, modules, and methods in the system is an in-process call, with no inter-process communication (IPC).

  • Vertical view: a layered architecture.

  • Horizontal view: the software is broken into modules along technical, functional, and responsibility dimensions for code reuse and management.

The real defect of a monolithic system is not that it cannot be split, but that the parts lack isolation and autonomy after splitting.

Problems such as memory leaks, thread explosions, blocking, and infinite loops affect the entire program, not just one function or module. A leak of higher-level shared resources, such as port numbers or database connection pools, affects the entire machine and even other replicas in the cluster.

From a maintainability point of view, the monolithic architecture is at a disadvantage: because all code runs in the same process, it is hard to stop errors from propagating, parts cannot be stopped, updated, or upgraded independently, and canary releases and A/B testing are also more complicated.

In fact, there is nothing wrong with a monolithic, all-in-one architecture: for a small site it is more than enough, and you can always add a few more machines.

2. SOA

Three representative architectural patterns

  • Information Silo Architecture: an information silo is also known as an information island; a system built this way is called an isolated, siloed ("chimney") information system, a design pattern that neither interoperates nor coordinates with any other related information system at all.

  • Microkernel Architecture: also known as plug-in architecture. Public services, resources, and data are concentrated in a kernel (also called the Core System) on which all business systems depend, while concrete business systems exist as plug-in modules, giving the system extensibility, natural isolation, and flexibility.

    The limitation, and premise, of the microkernel architecture is that it assumes the plug-in modules do not know about one another, and that it is unpredictable which modules will be installed; the plug-ins can access common resources in the kernel but do not interact with each other directly.

  • Event-Driven Architecture: a set of event queues is built between subsystems; messages from outside the system are sent as events, and subsystems consume them by subscription. Event consumers are highly decoupled yet can communicate through the event pipeline.

> The essence of SOAP's gradual marginalization: excessive complexity caused by overly strict specification definitions. It can realize complex integration and interaction among multiple heterogeneous large systems, but it is hard to promote as a broadly applicable software architecture style.

3. Microservices

Microservices is an architectural style in which a large, complex software application is composed of one or more microservices. Each microservice in the system can be deployed independently, and the microservices are loosely coupled. Each microservice focuses on completing one task and doing it well, each task representing a small business capability.

Microservices are, in essence, an SOA architecture, but with a different connotation. Microservices are not bound to any particular technology: a microservice system may contain services written in Java alongside services written in Python, unified into one system in the RESTful architectural style. The concept of a microservice is therefore independent of any concrete technical implementation and is highly extensible.

Microservices pursue a more liberal architectural style, discarding almost all of the constraints and regulations that can be discarded in SOA and advocating “standards of practice” instead of “standards of specification”.

Service registration and discovery, tracing and governance, load balancing, fault isolation, authentication and authorization, scaling, transport and communication, transaction processing, and so on: in microservices there is no longer a single unified solution to any of these.

4. Post-microservice era

The post-microservice era (Cloud Native)

Moving from using software alone to tackle the problems of distributed architecture to tackling them through combined hardware and software: that is the "post-microservice era".

Traditional Spring Cloud vs. Kubernetes solutions

| Capability | Kubernetes | Spring Cloud |
| --- | --- | --- |
| Elastic scaling | Autoscaling | N/A |
| Service discovery | KubeDNS / CoreDNS | Spring Cloud Eureka |
| Configuration center | ConfigMap / Secret | Spring Cloud Config |
| Service gateway | Ingress Controller | Spring Cloud Zuul / Gateway |
| Load balancing | Load Balancer | Spring Cloud Ribbon |
| Service security | RBAC API | Spring Cloud Security |
| Tracing and monitoring | Metrics API / Dashboard | Spring Cloud Turbine |
| Circuit breaking and degradation | N/A | Spring Cloud Hystrix |

As virtualized infrastructure expands from single service containers to service clusters, communication networks, and storage facilities composed of many containers, the boundary between hardware and software blurs. Once virtualized hardware can keep up with the flexibility of software, non-business technical concerns can be stripped out of the software layer and quietly solved inside the hardware infrastructure, letting software focus solely on the business and truly "build teams and products around business capabilities".

Virtualized infrastructure that manages only containers is relatively coarse-grained. To solve finer-grained problems such as circuit breaking, service monitoring, authentication, authorization, security, and load balancing, the sidecar proxy pattern of the Service Mesh emerged. Besides carrying the normal communication between services (called data-plane communication), the proxy also receives instructions from a controller (called control-plane communication); according to the control plane's configuration, it analyzes and processes the content of data-plane traffic to implement circuit breaking, authentication, metering, monitoring, load balancing, and other additional features. This requires no extra code at the application level, yet provides management nearly as fine-grained as program code itself.

5. The serverless era

Serverless architecture

If the microservice architecture is the culmination of distributed systems, then the serverless architecture may be the starting point of "non-distributed" cloud systems.

Serverless involves only two things: backend facilities and functions.

  • Backend facilities: databases, message queues, logs, caches, and other technical components that support the running of the business logic but carry no business meaning themselves. These backend facilities run in the cloud; serverless calls this "Backend as a Service" (BaaS).
  • Functions: the business logic code itself, which runs in the cloud without worrying about compute capacity; serverless calls this "Function as a Service" (FaaS). A minimal function sketch follows this list.
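
Here is a minimal function sketch using the AWS Lambda Java programming model as one concrete example (it assumes the aws-lambda-java-core dependency; other FaaS platforms differ in detail):

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

// A minimal FaaS sketch: the function holds only business logic; deployment,
// scaling, and operations are the platform's job.
public class GreetingFunction implements RequestHandler<String, String> {

    @Override
    public String handleRequest(String name, Context context) {
        // Pure business logic: no server, thread pool, or capacity management here.
        return "Hello, " + name + "!";
    }
}
```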

The vision of serverless is that developers can focus purely on the business. They need not worry about technical components: the backend components are readily available in the cloud, with no procurement, licensing, or selection headaches. They need not think about deployment: the process is fully hosted and automated in the cloud. They need not worry about compute capacity: an entire data center stands behind them, and its capacity can be treated as unlimited. And they need not worry about operations: keeping the system running smoothly is the cloud provider's responsibility, not the developer's.

Fifth, REST design style

At the heart of the ideological difference between REST and RPC is the target of abstraction: resource-oriented versus process-oriented programming.

REST is not a remote service invocation protocol. A protocol is prescriptive and mandatory, with at least a specification document; REST defines none of these things, because REST is a style rather than a specification or protocol.

REST, short for Representational State Transfer, is in fact a further abstraction of hypertext transfer (the "HTT" in HTTP), much like the relationship between an interface and its implementing class.

1. Understand REST

Read the examples below to understand what “representations” are and other key concepts in REST.

  • Resource: You are reading an article called REST Design Style, and the content itself (the information, the data) is called a Resource.
  • Representation: The way information is represented when it interacts with the user, which is exactly the same as what we call the Presentation Layer in our layered software architecture. For example, the server returns HTML, PDF, Markdown, RSS and other versions to the browser, which are multiple representations of the same resource.
  • State: contextual information that is meaningful only in a particular context is called "state". Whether we call an interaction stateful or stateless is relative to the server: if the server itself remembers the user's state, it is stateful; if the client remembers the state and explicitly passes it to the server with each request, the server is stateless.
  • Transfer: whether the state is kept by the server or the client, the behavior logic of "fetch the next article" must be provided by the server, because only the server owns the resource and its representations. The server somehow transforms "the article the user is currently reading" into "the next article"; this behavior is called representational state transfer.
  • Uniform Interface: HTTP defines a uniform interface up front, comprising seven basic operations: GET, HEAD, POST, PUT, DELETE, TRACE, and OPTIONS. Any server that supports HTTP complies with these rules, and performing these operations on a specific URI triggers the corresponding representational state transfer on the server (a code sketch follows this list).
  • Hypertext Driven: the browser is the universal client for all websites; no site's navigation (state transfer) behavior is preset in the browser's code; it is driven entirely by the hypertext in the server's responses.
  • Self-Descriptive Messages: since a resource's representation may take many forms, the message should carry explicit information telling the client what type it is and how to process it. A widely used self-description method is identifying the Internet media type (MIME type) in the HTTP header named "Content-Type"; for example, "Content-Type: application/json; charset=utf-8" says the resource is returned in JSON format and should be processed with the UTF-8 character set.
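
To make the uniform interface concrete, here is a minimal, hypothetical Spring Web sketch: standard HTTP methods applied to a resource URI rather than verb-named RPC endpoints (Article and ArticleRepository are placeholders):

```java
import org.springframework.web.bind.annotation.*;

// Hypothetical sketch of the uniform interface: standard HTTP methods
// applied to the resource URI "/articles/{id}".
@RestController
@RequestMapping("/articles")
public class ArticleController {

    private final ArticleRepository repo;

    public ArticleController(ArticleRepository repo) {
        this.repo = repo;
    }

    @GetMapping("/{id}")        // GET: read a representation of the resource
    public Article get(@PathVariable long id) {
        return repo.find(id);
    }

    @PutMapping("/{id}")        // PUT: replace the resource's state
    public Article put(@PathVariable long id, @RequestBody Article a) {
        return repo.save(id, a);
    }

    @DeleteMapping("/{id}")     // DELETE: remove the resource
    public void delete(@PathVariable long id) {
        repo.delete(id);
    }

    interface ArticleRepository {
        Article find(long id);
        Article save(long id, Article a);
        void delete(long id);
    }
    record Article(String title, String body) {}
}
```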

2. RESTful systems

An ideal RESTful system should satisfy the following six principles.

  1. Client-server separation (Client-server)

    Separating the logic of the user interface from the logic of the data store helps improve the portability of the user interface across platforms.

  2. Stateless

Statelessness is a core principle of REST. REST does not want the server to be responsible for maintaining state: every request sent by the client should carry all the necessary context, session information is kept and maintained by the client, and the server implements the business logic and drives application state changes based on the state the client passes in.

  3. Cacheability

Stateless services improve the system's visibility, reliability, and scalability, but reduce its network efficiency. Plainly put, "reduced network efficiency" means that a function which would need only one (or a few) requests under a stateful design may need multiple requests, or requests carrying extra redundant information. To ease this tension, REST wants software systems to be like the World Wide Web, allowing clients and intermediaries (such as proxies) to cache parts of the server's responses.

  4. Layered System

The client generally does not need to know whether it is connected directly to the final server or to an intermediate server along the way. Intermediate servers can improve system scalability through load balancing and shared caches, and they also make it easier to deploy caching, scaling, and security policies. A typical application of this principle is the Content Delivery Network (CDN).

  5. Uniform Interface

REST wants developers to program for resources: software design should focus on abstracting what resources the system has rather than what behaviors (services) it offers. Resource-oriented programming is usually more abstract; the downside of high abstraction is that it tends to sit further from how people think, and the upside is that it tends to be more general.

  6. Code On Demand

Code on demand covers any technique in which the server sends executable code to the client at the client's request (for example, to a browser). It relieves the client of having to know in advance how all the information from the server should be processed and run.

3. RESTful benefits

  • Reduced learning costs of service interfaces.

A uniform interface is an important hallmark of REST. It maps standard operations on resources to standard HTTP methods, which have the same usage and semantics for every resource and need not be learned anew each time.

  • Resources naturally have collection and hierarchy structure.

    Resource-centric abstract interfaces, since resources are nouns, naturally generate collections and hierarchies.

  • REST is bound to the HTTP protocol.

    REST reuses concepts already defined in the HTTP protocol and related underlying support to solve problems.

4. RESTful deficiencies and disputes

  • Resource-oriented programming is only suitable for CRUD; only process-oriented and object-oriented programming can handle truly complex business logic

    Users can define custom methods. In the style of Google's recommended REST API design, a custom method is placed at the end of the resource path, joined by a colon and a custom verb suffix. For example, the delete operation maps to the standard DELETE method, and if an API is also provided to recover deleted items, it might be designed as:

    POST /user/user_id/cart/book_id:undelete

    If you prefer not to use custom methods, it is perfectly possible to design a recycle-bin resource that holds items which can still be restored, and to map "restore a deleted item" to a PUT or PATCH method, as if modifying a state value of that resource.

  • REST is fully bound to HTTP and is not suitable for high-performance transport scenarios

    REST is not really appropriate where you need direct control over the transport, such as binary details, encoding form, message format, or connection mode; such requirements commonly exist between the internal nodes of a service cluster.

  • REST is not conducive to transaction support

  • REST does not support transport reliability

    The simplest and crudest way to handle transmission reliability is to resend the message. This works only if the service is idempotent, that is, executing the service repeatedly has the same effect as executing it once (see the sketch after this list).

  • REST lacks the ability to process resources "in part" and "in batches".
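
A minimal idempotency sketch in Java, assuming the client attaches a unique request ID to each message; the in-memory map stands in for a durable store:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal idempotency sketch: the server remembers completed requests so
// that a retried (re-sent) message has the same effect as the first one.
public class PaymentService {
    private final Map<String, String> completed = new ConcurrentHashMap<>();

    public String pay(String requestId, long amountCents) {
        // computeIfAbsent runs the business logic at most once per request ID;
        // retries of the same request return the recorded result instead.
        return completed.computeIfAbsent(requestId, id -> doCharge(amountCents));
    }

    private String doCharge(long amountCents) {
        // ... real charging logic would go here ...
        return "charged " + amountCents + " cents";
    }
}
```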

Sixth, CAP theory and BASE theorem

1. Database transactions

  • Consistency: Ensures that all data in the system is consistent with expectations and there is no contradiction between related data.
  • Atomicity: a transaction guarantees that multiple data changes within the same business process either all succeed or are all undone together.
  • Isolation: In different business processes, transactions ensure that data being read and written by services are independent of each other and will not affect each other.
  • Durability: Transactions should ensure that all data modifications that are successfully submitted are persisted correctly without losing data.

All applications that require data consistency use transactions, including but not limited to databases, transactional memory, caches, message queues, and distributed storage.

  • Internal consistency: when a service uses only one data source, achieving consistency through A, I, and D is classical and relatively easy. The data source itself can perceive whether concurrent transactions' reads and writes conflict, and the final order of concurrent transactions on the timeline is determined by the data source. Consistency between transactions of this kind is called "internal consistency".
  • External consistency: When a service uses multiple different data sources, or when multiple different services involve multiple different data sources at the same time, the problem becomes much more difficult. At this point, the order of concurrent or even successively executed transactions on the timeline is not determined by any data source. Such consistency between transactions involving multiple data sources is called “external consistency”.

Implementing atomicity and durability

Atomicity guarantees that either all of a transaction's operations take effect or none do, with no intermediate state. Durability guarantees that once a transaction takes effect, its modifications cannot be undone or lost for any reason.

Data has durability only once it has been successfully written to persistent storage such as disk or tape. Data held only in memory is lost the moment the application crashes, the database or operating system crashes, or the machine suddenly loses power; such unexpected events are collectively called "crashes" (Crash).

Because of the intermediate state of the write, the following situations can occur.

  • Uncommitted transaction, crash after writing: the application has not yet finished modifying the three pieces of data, but the database has already written one or two of the changes to disk. If a crash happens at this moment, then after restarting, the database must have a way to know that an incomplete shopping operation was under way before the crash, and restore the already-modified data from disk to its unmodified state, to guarantee atomicity.
  • Committed transaction, crash before writing completes: the application has finished modifying all three pieces of data, but the database has not yet written all three changes to disk. If a crash happens at this moment, then after restarting, the database must have a way to know that a complete shopping operation was committed before the crash, and write the part of the data that never reached the disk, to guarantee durability.

Since intermediate write states and crashes are unavoidable, atomicity and durability can only be guaranteed by taking recovery measures after a crash. This data recovery operation is called "crash recovery" (Crash Recovery, also known as Failure Recovery or Transaction Recovery).

To make crash recovery possible, data cannot be written to disk the way a program modifies a variable in memory, directly changing the value of some row and column of a table. Instead, all the information needed for the modification, including what data is changed, on which memory page and disk block the data physically resides, what the value was and what it becomes, and so on, must first be recorded to disk in the form of a log, that is, using only sequential append-only file writes (the most efficient kind of write).

Only after all log records have been safely written to disk, and the database sees a "Commit Record" in the log indicating that the transaction succeeded, does it modify the real data according to the information in the log. After the modification completes, an "End Record" is appended to the log to indicate that the transaction has been persisted. This approach is called "Commit Logging".

It is not hard to see how Commit Logging guarantees durability and atomicity. First, once the Commit Record has been successfully written to the log, the whole transaction is successful: even if a crash occurs during the actual data modification, after restart the data can still be modified according to the log information already on disk, which guarantees durability. Second, if the log crashes before a Commit Record is successfully written, the whole transaction fails: after restart the database sees a partial log without a Commit Record, marks it for rollback, and the whole transaction looks as if it never happened, which guarantees atomicity.
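
The following toy Java sketch illustrates the Commit Logging sequence described above (it is an illustration of the idea, not a real database implementation; the record format is invented):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Toy Commit Logging: append every intended change to a sequential log,
// force it to disk, write a Commit Record, and only then modify real data.
// Recovery would replay logs with a Commit Record and roll back the rest.
public class CommitLog {
    private final FileChannel log;

    public CommitLog(Path logFile) throws IOException {
        this.log = FileChannel.open(logFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    public void commitTransaction(long txId, List<String> changes) throws IOException {
        for (String change : changes) {
            append("CHANGE tx=" + txId + " " + change);  // what changed, where, old/new value...
        }
        append("COMMIT tx=" + txId);   // Commit Record: the transaction is now decided
        log.force(true);               // make sure all records really reach the disk
        applyChanges(changes);         // only now touch the real data
        append("END tx=" + txId);      // End Record: the changes have been persisted
        log.force(true);
    }

    private void append(String record) throws IOException {
        log.write(ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8)));
    }

    private void applyChanges(List<String> changes) {
        // stand-in for modifying the real data pages on disk
        changes.forEach(c -> System.out.println("applying: " + c));
    }
}
```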

Commit Logging has a huge inherent defect: all real modifications to data must happen after the transaction commits, that is, after the Commit Record has been written to the log. Before that point, even if the disk I/O is idle, and even if the transaction modifies so much data that it occupies a large amount of buffer memory, the data on disk must not be modified for any reason. This is a premise on which Commit Logging rests, and it is very unfavorable to database performance.

ARIES proposed an improved logging scheme called "Write-Ahead Logging", which allows changed data to be written to disk in advance, before the transaction commits.

Write-Ahead Logging classifies when changed data may be written into FORCE and STEAL cases, with the transaction's commit point as the boundary.

  • FORCE: if, when a transaction commits, its changed data must be written to disk at the same time, the policy is called FORCE; if this is not required, NO-FORCE. In reality most databases use NO-FORCE: as long as the log exists, the changed data can be persisted at any time, so to optimize disk I/O there is no need to force a write immediately.
  • STEAL: if changed data is allowed to be written to disk before the transaction commits, the policy is called STEAL; if not, NO-STEAL. Allowing early writes also optimizes disk I/O: it helps exploit idle I/O resources and saves memory in the database cache.

Commit Logging permits NO-FORCE but not STEAL: if part of a transaction's changed data were written to disk before it committed, that data would become erroneous if the transaction rolled back or a crash occurred.

Write-Ahead Logging permits both NO-FORCE and STEAL. Its solution is to add another type of log, the Undo Log. Before changed data is written to disk, an Undo Log record must be written first, noting which data is modified, from what value to what value, and so on, so that the pre-written changes can be erased according to the Undo Log during transaction rollback or crash recovery. Undo Log is commonly translated as "rollback log"; correspondingly, the log used earlier to replay data changes during crash recovery was named the Redo Log, commonly translated as "redo log". With the Undo Log added, Write-Ahead Logging performs crash recovery in the following three phases.

  • Analysis phase: scan the log starting from the last checkpoint and find all transactions without an End Record, forming the set of transactions to be recovered. This set generally consists of at least two parts: the Transaction Table and the Dirty Page Table.
  • Redo phase: based on the set produced by the analysis phase, repeat history as follows: find all logs containing a Commit Record, write their modified data to disk, append an End Record to the log, and remove them from the set of transactions to be recovered.
  • Rollback phase (Undo): handle the transactions left in the set after the analysis and redo phases; these remaining transactions, called Losers, need to be rolled back by rewriting the data that was written to disk in advance, according to the information in the Undo Log.

Operations in both redo and rollback phases should be designed to be idempotent.

Depending on whether FORCE and STEAL are permitted, a database has four possible combinations. From the standpoint of optimizing disk I/O, NO-FORCE plus STEAL undoubtedly gives the best performance; from the standpoint of algorithmic and logging complexity, NO-FORCE plus STEAL is undoubtedly also the most complex. The relationship between the four combinations and the Undo and Redo logs is shown in the figure.

Implementing isolation

Isolation ensures that each transaction reads and writes data independently of each other.

Modern databases provide the following three types of locks.

  • Write Lock (also called eXclusive Lock, abbreviated X-Lock): if data holds a write lock, only the transaction holding that lock can write the data; while the write lock is held, no other transaction may write the data or impose a read lock on it.

  • Read Lock (also called Shared Lock, abbreviated S-Lock): multiple transactions may add read locks to the same data. Once read-locked, the data cannot be write-locked, so no other transaction can write it, though all can still read it. If a transaction holds the only read lock on the data, it is allowed to upgrade the lock directly to a write lock and then write the data.

  • Range Lock: an exclusive lock is applied to a whole range of data, and data within the range cannot be written. For example:

    SELECT * FROM books WHERE price < 100 FOR UPDATE;

    With a range lock, not only can the existing data in the range not be modified, but no data can be inserted into or deleted from the range either, which a set of record-level exclusive locks cannot achieve (an in-process analogue of read/write locks is sketched below).
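
The read/write lock semantics can be illustrated in-process with Java's ReentrantReadWriteLock, an analogue of the S-lock/X-lock described above (note that unlike the database description, ReentrantReadWriteLock does not support upgrading a held read lock to a write lock):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Read/write lock semantics: many readers may hold the read lock at once,
// but the write lock is exclusive and blocks both readers and writers.
public class LockedCatalog {
    private final Map<String, Integer> prices = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public Integer getPrice(String book) {
        lock.readLock().lock();            // shared: concurrent readers allowed
        try {
            return prices.get(book);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void setPrice(String book, int price) {
        lock.writeLock().lock();           // exclusive: blocks all readers and writers
        try {
            prices.put(book, price);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```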

Transaction isolation levels in descending order:

  • Serializable: the highest isolation level defined in ANSI/ISO SQL-92. Concurrency control theory dictates that isolation and concurrency are at odds: the higher the isolation, the lower the throughput of concurrent accesses.
  • Repeatable Read: read and write locks are added to the data a transaction touches and held until the transaction ends, but no range locks are added. Repeatable Read is weaker than Serializable in allowing phantom reads (Phantom Reads): two identical range queries within one transaction may return different result sets, because no range lock prevents new data from being inserted into the range.
  • Read Committed: write locks are held until the end of the transaction, but read locks are released as soon as a query completes. Read Committed is weaker than Repeatable Read in allowing non-repeatable reads (Non-Repeatable Reads): two queries of the same row within one transaction may return different results, because read locks are not held for the whole transaction to prevent the data from being changed.
  • Read Uncommitted: write locks are held until the end of the transaction, but no read locks are added at all. Read Uncommitted is weaker than Read Committed in allowing dirty reads (Dirty Reads): a transaction may read data that another transaction has modified but not yet committed.
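
In application code, the isolation level is chosen explicitly; here is a plain JDBC sketch (the connection URL and credentials are placeholders, while the isolation constants are part of java.sql.Connection):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Choosing an isolation level through plain JDBC.
public class IsolationDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/shop", "user", "password")) {
            conn.setAutoCommit(false);
            // Trade isolation for throughput explicitly; other options include
            // TRANSACTION_SERIALIZABLE, TRANSACTION_READ_COMMITTED,
            // and TRANSACTION_READ_UNCOMMITTED.
            conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
            // ... reads and writes here see the guarantees of Repeatable Read ...
            conn.commit();
        }
    }
}
```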

2. CAP theory

In distributed systems, C stands for Consistency, A for Availability, and P for Partition tolerance.

The proof of CAP is based on the asynchronous network model, which also reflects real networks. In a real network system, nodes cannot stay synchronized, not even their clocks; every node relies solely on the messages it receives for local computation and communication. This assumption is actually quite strong: it implies that even timeouts are impossible, because there is no common time standard. The proof can then be extended to weaker asynchronous networks in which clocks are not exactly consistent but tick at the same pace, allowing nodes to make timeout judgments.

The proof of CAP is very simple. Suppose the nodes form two sets {G1, G2}, and all communication between G1 and G2 is cut off by a network partition. If the partition cannot be tolerated (no P), the whole network is unavailable. If availability (A) is required, G2 must still answer read requests, but because of the partition it may not have seen the latest writes, so consistency (C) cannot be satisfied.

CAP theory says that a distributed storage system can satisfy at most two of the three. Since current network hardware inevitably suffers delays and packet loss, partition tolerance is something we must provide, so we have to trade off between consistency and availability; no distributed system can guarantee all three at once.

  • CA: a single-point cluster; it satisfies consistency and availability, but generally has weak scalability.
  • CP: satisfies consistency and partition tolerance; performance is usually not especially high.
  • AP: satisfies availability and partition tolerance; such systems generally have lower requirements for consistency.

You have to make trade-offs when building a distributed architecture, striking a balance between consistency and availability. Most web applications do not really require strong consistency, so C is sacrificed for P and A; this is the current direction of distributed database products.

So where do you choose between consistency and availability?

For websites, many of the main features of relational databases are often of little use. Many real-time web systems do not need strict database transactions, have low requirements on read consistency, and in some cases not even high requirements on write consistency; eventual consistency is acceptable.

3. BASE theorem

BASE is a solution to the reduced availability caused by the strong consistency of relational databases.

BASE is an abbreviation of three terms:

  • Basically Available
  • Soft State
  • Eventually consistent

The idea is that the overall scalability and performance of a system can be improved by relaxing its demand for consistency at any single point in time. Why? Because large systems, owing to geographic distribution and extremely high performance requirements, cannot possibly use distributed transactions to meet these targets; some other approach must be taken, and BASE is the solution to this problem.

4. Distributed consensus theory: the Paxos, Raft, and ZAB algorithms

I have not researched this area deeply enough to present it here without embarrassment, so instead here is a website that demonstrates the Raft algorithm.

Raft demonstration: thesecretlivesofdata.com/raft/

Write at the end

Of the ten-plus architectures above, most websites use several of them, and newer ones such as cloud native and serverless are also developing rapidly. But however architecture develops, our advice stands: a website's architecture should be determined by its business scenario. What fits you is best; do not blindly chase the tall and impressive. After all, pursuing the goddess costs a lot more money!