Preface

Every programmer dreams of becoming an architect, but the dream is rich and the reality is stark: most programmers, myself included, spend their working hours on simple CRUD. Many of us would blush to use the word “architecture” too often. But should we give up the dream just because we cannot realize it yet? Obviously not. Mastering the theory of architecture design is the prerequisite for becoming an architect, and having a methodology guides our daily work better. Opportunities favor the prepared. What if, one day, the dream comes true?

In order to learn “architecture”, I quietly studied two “secret martial-arts manuals”: From Zero to Learn Architecture and Technical Architecture of Large-Scale Websites. Here is one of their secrets: the two books overlap heavily on the core of architecture. What does that tell us? That architects have routines. Master the routine, keep practicing, and the goal is not far away.

The past and present of architecture

1. What is architecture

Before defining what architecture is, let us look at some related terms: architecture, framework, component, module, and system. A comment on From Zero to Learn Architecture summarizes their definitions and differences very succinctly, as follows:

Architecture is the top-level design; a framework is a semi-finished product completed through programming or configuration; a component is reuse along the technical dimension; a module is a division of responsibility along the business dimension; a system is a set of individuals that work in concert.

Software architecture is the top-level design of a software system. It defines which individuals the system comprises: subsystems, modules, and components. At the same time it specifies the rules by which those individuals operate and cooperate with one another.

2. Historical background of the architecture

From machine language to assembly to high-level languages, from structured programming to object-oriented programming to interface-oriented programming, from a single process on one machine to multiple processes to distributed clusters: every upgrade of language, programming paradigm, and technology was meant to meet the needs of ever-larger software. Once a software system grows to a certain size, data structures and algorithms are no longer the main problems in its design. As the system's composition grows, the organization of the whole system, its “software architecture”, creates a new set of problems. Complex systems with unreasonable architectures often face the following problems:

  • The system is huge in scale and severely coupled internally, and development efficiency is low.
  • Because the coupling is severe, touching one part affects the whole, and later modification and extension are difficult.
  • The system logic is complex and error-prone, and problems are hard to troubleshoot and repair.

Software architecture emerged in response. But because software systems are complex and ever-changing, no single architecture can meet the design requirements of all systems; like object-oriented programming and software engineering, it is no silver bullet for software design.

3. Purpose of architecture design

Only when the purpose of a task is clear can we keep our direction, make plans, and carry them out; architecture design is no exception. The main purpose of architecture design is to solve the problems caused by software system complexity. Everything from a small management system to Taobao or WeChat needs architecture design during design and development. Because systems differ in complexity, the difficulty of the design differs too, but the basic process is similar; master the process, and even a simple internal tool platform can be designed with flair.

4. Summary of the role of architecture

Architecture design can be large or small, but it is not optional. Architecture is not out of reach, and it should not be sneered at either.

1. Architecture is a solution to the complexity of software systems.

2. Architecture is (important) decisions.

3. Demand-driven architecture, bridging analysis and design implementation.

4. Architecture is closely tied to development costs.

Architectural design theory

1. Sources of architectural design complexity

The term “complex software” appears frequently in definitions of software architecture. What is complex software? A large-scale website is one kind. The sources of complexity that large-scale website architecture must consider (other software systems are similar) are mainly shown in the figure below:

They are six: high performance, high availability, scalability, extensibility, security, and low cost. Note: to further illustrate software architecture design, this article uses the large-scale website as its example of complex software.

Apart from writing business logic, most programmers spend most of their careers learning to solve these six problems. We have read many books, written a great deal of code, and done plenty of design, and today it all traces back to this root. This tree of large-website architecture elements can hold most of the knowledge you have encountered and act as the root for structuring the knowledge in your mind. As long as your knowledge is rich enough, layer by layer, the tree can grow into a forest. That is also the purpose of this article, which is not just a simple copy-and-summary. The message to programmers: start structuring the knowledge in your head around this tree.

Three Principles of Architectural Design (From the Web)

1. The suitability principle: suitable beats industry-leading.

Architectures are not better or worse in the abstract; they are only more or less suitable. One person's honey is another's arsenic. The architecture must match the business stage of the enterprise; do not design for your résumé, and an industry-leading architecture is not necessarily a suitable one. Cutting the feet to fit the shoes, or puffing yourself up to look bigger than you are, both violate this principle. A suitable architecture matches the current stage of the business, integrates the available resources to the greatest effect, and can be implemented quickly.

2. The simplicity principle: simple beats complex.

“I did not have time to write a short letter, so I wrote a long one instead.” Simple is in fact harder than complex. Faced with complex system architecture and business logic, we can always write complex systems, but in software, complexity stands for “problems.” In architecture design, if both a simple and a complex solution meet the requirements, choose the simple one. And indeed, when a software system becomes too complex, someone eventually refactors and upgrades it to make it simple again; that is the general trend of software development. The simplicity principle is itself simple and great: Google's MapReduce, built on divide and conquer, is a classic example of turning a complex problem into simple ones.

3. The evolution principle: evolution beats getting everything right in one step.

From large human societies to natural organisms down to tiny cells, everything seems to follow this universal principle, and software architecture is no exception. Business keeps growing, technology keeps innovating, and the external environment keeps changing; all of this warns architects not to be over-ambitious and not to blindly copy what big companies do. We should carefully analyze the characteristics of the current business, identify the main problems it faces, design a reasonable architecture, implement it quickly to meet the business's needs, and then keep improving the architecture during operation so that it evolves along with the business.

The three principles of architecture design matter. Before designing your software system, ask yourself: Is this architecture suitable? Is it simple? Can it evolve?

Architecture design process

1. Identify complexity

When developing a software system, you cannot simply start typing code; you usually need some analysis first. Besides functional analysis, the complexity of the software system must also be analyzed, because it determines the choice of solutions later.

There are six sources of architecture complexity, but the main ones are high performance, high availability, and extensibility. Discussions of architecture design dwell mostly on these three, and this article is no exception.

But an architecture does not always have to meet all three. Depending on the software system, most of the time you need only one or two of them, and rarely all three.

When analyzing the complexity arising from high performance, high availability, and extensibility, it is best to have concrete indicators to support the analysis: how many QPS the system must sustain, how many nines of availability it must provide, how far it must be able to scale. Do not rely on imagination alone; otherwise the complexity analysis ends up either too simple, so the system fails to meet requirements after launch, or too complex, leading to over-design and project delays.
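As a quick illustration of turning “how many nines” into a concrete indicator, the sketch below (the numbers are illustrative, not from any particular system) converts an availability target into the yearly downtime budget it implies:

```java
/** Convert an availability target of n nines (e.g. 4 -> 99.99%) into the
 *  downtime budget it implies per year. */
public class AvailabilityBudget {
    public static double downtimeMinutesPerYear(int nines) {
        double unavailability = Math.pow(10, -nines); // 4 nines -> 1e-4
        return 365 * 24 * 60 * unavailability;        // minutes per year x unavailable fraction
    }

    public static void main(String[] args) {
        // 4 nines leaves roughly 52.6 minutes of downtime per year
        System.out.printf("4 nines -> %.1f min/year%n", downtimeMinutesPerYear(4));
    }
}
```

Quantifying the target this way makes it immediately clear whether, say, a manual failover procedure that costs minutes per incident can ever satisfy a four-nines goal.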

2. Design alternatives

Many mature technical solutions exist for the three main sources of complexity: for high performance, caching, load balancing, and I/O multiplexing; for high availability, active/standby, clustering, and multi-site active-active; for extensibility, design patterns, layered architecture, and plug-ins. Once you understand where a software system's complexity lies, you can follow this map to find candidate solutions.

When designing candidate solutions, the architect needs to avoid the following common traps:

  • Always trying to design the best possible solution.
  • Preparing only one solution.
  • Designing the solution in too much detail.

This is also why many big companies like to ask interview questions about open-source frameworks and their source code: the goal is not familiarity with the source for its own sake, but the ability, when designing a system's architecture, to quickly pick the most suitable of the existing frameworks.

3. Evaluate and select alternatives

Having candidate solutions is not enough; you still need to evaluate them and select the most appropriate one. The three principles of architecture design (suitable, simple, evolutionary) help, but evaluation still faces the following difficulties:

  • Every candidate solution is feasible (otherwise it would not be a candidate)
  • No candidate solution is perfect
  • The evaluation criteria are subjective

Faced with this, architects tend to take one of the following approaches:

  • The minimalist camp: always choose the simplest
  • The “most impressive” camp: always choose the most advanced or popular
  • The comfort camp: always choose the most familiar

None of these shortcuts is reliable. What we really need is a 360-degree assessment of each candidate: list the quality attributes we care about, evaluate each candidate along those attribute dimensions, and then select the solution best suited to the situation at hand. Common quality attributes include performance, availability, hardware cost, project investment, complexity, security, and scalability. Choose the most appropriate solution based on how much your team weighs each attribute.
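One simple way to make the 360-degree assessment concrete is a weighted scorecard. The sketch below is only illustrative: the attribute names, weights, and scores are assumptions, and in practice the weights come from your team's priorities:

```java
import java.util.Map;

/** Hypothetical 360-degree evaluation: weight each quality attribute,
 *  score each candidate per attribute, and compare weighted totals. */
public class AlternativeScorer {
    /** Weighted total of a candidate's per-attribute scores (1..5 scale assumed). */
    public static double weightedScore(Map<String, Integer> scores, Map<String, Double> weights) {
        double total = 0.0;
        for (Map.Entry<String, Integer> e : scores.entrySet()) {
            total += e.getValue() * weights.getOrDefault(e.getKey(), 0.0);
        }
        return total;
    }
}
```

For example, a team that weighs performance at 0.5, availability at 0.3, and cost at 0.2 would score each candidate on those three dimensions and pick the higher weighted total; changing the weights can change the winner, which is exactly the point of making the criteria explicit.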

4. Detailed solution design

Next comes the detailed design of the chosen solution, for example how its individual parts operate and how they collaborate with one another.

At this stage, let front-line developers fill in the details while the architect does the final review. This both reduces the architect's workload and trains newcomers.

Conclusion

From the above, we have covered the history of software system design and the definition, purpose, role, principles, and process of architecture design: a complete closed loop that lays a solid theoretical foundation for the architecture-design practice that follows.

Architecture design practice (taking large websites as an example)

High performance

High performance is a source of architectural complexity; every software system has performance requirements, just to different degrees. The higher the performance required, the more complex the architecture design, and vice versa. But if “high” cannot be quantified with indicators, neither reasonable architecture design nor further optimization is possible. High performance therefore breaks into two parts: performance metrics and performance optimization.

Performance indicators

The main indicators for evaluating the performance of large websites are as follows:

  • Response time: the total time from sending a request to receiving its response; it directly reflects how fast the system is.
  • Concurrency: the number of requests the system handles simultaneously; it directly reflects the system's load-carrying capacity.
  • Throughput: the number of requests processed per unit of time, usually expressed as TPS, QPS, or HPS; it reflects the system's overall processing capacity.

Definitions alone are not enough; the indicators also need basic quantitative targets, for example response time under 300 ms, concurrency of at least 10,000 connections, or throughput of 10,000 QPS.
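The three indicators are not independent: by Little's Law, the average number of requests in flight equals throughput times average response time. A minimal sketch:

```java
/** Little's Law: average concurrency = throughput (req/s) x mean response time (s). */
public class LittlesLaw {
    public static double concurrency(double qps, double meanResponseSeconds) {
        return qps * meanResponseSeconds;
    }
}
```

For example, 10,000 QPS at a 300 ms mean response time implies about 3,000 requests in flight, a quick sanity check when setting quantitative targets: the three numbers you pick must be mutually consistent.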

Performance optimization

From the front end through the application tier to the data tier, a large website can be roughly divided into three levels. This article introduces the performance-optimization techniques at each level, but a system's performance-optimization plan should never try to be all-encompassing; what follows is only an overview. If you have not encountered some of the points here, look them up yourself and then hang them on your mind-map tree; if you already understand a point deeply, expand your map there.

1. Web front-end optimization

Reduce HTTP requests: every HTTP request carries overhead beyond its payload (status line, headers, and so on), so merge requests where possible: combine JS files, combine CSS files, and sprite small images into one larger image.

Use the browser cache: static files such as CSS, JS, and images change infrequently on many sites but are requested often, so let the browser cache them. The cache lifetime can be set to days or even months via the Cache-Control and Expires properties in the HTTP header. (The function of each field in the HTTP header.)

Enable compression: compress data on the server before returning it, especially text such as JS, CSS, and HTML, where the compression ratio can reach 80%. Compression and decompression do put some pressure on both server and browser, which must be weighed in the design. (What are the common data-compression algorithms?)
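To see why text compresses so well, here is a minimal sketch using the JDK's built-in GZIP support (the sample content in the usage below is made up; real ratios depend on the data):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

/** Compress a byte array with GZIP, as a server might before responding. */
public class GzipDemo {
    public static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data); // closing the stream (try-with-resources) flushes the trailer
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory streams do not actually fail
        }
        return bos.toByteArray();
    }
}
```

Repetitive HTML/CSS/JS routinely shrinks by 70-80%, while formats that are already compressed, such as JPEG, gain almost nothing; that is why servers usually gzip only text content types.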

Reduce cookie transmission: do not put large amounts of user data in cookies; it is insecure and inflates every request on the wire. The most common approach is to keep user data in the server-side session and store only the session ID in the cookie. (What are cookies and sessions, and how do they differ?)

CDN acceleration: a Content Delivery Network (CDN) is essentially still a cache, generally provided by network operators, that stores data (usually static assets: JS, CSS, images, files) as close to users as possible. (What is a CDN?)

Reverse proxy: a reverse proxy sits between the browser and the back-end servers, hiding the back-end details. It can load-balance across back-end servers on one hand, and cache static content to speed up website access on the other. (What are forward and reverse proxies?)

2. Application server optimization

Use caching: caching improves performance everywhere, from front end to back end. The front end was covered above; back-end service caching divides into local caches and distributed caches. (How local and distributed caches work, and their pros and cons.)

A local cache can be implemented with a simple hash table or with open-source software such as Ehcache under Spring Boot. (How hashing works, and how to handle collisions.)
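As a sketch of the hash-table approach, a tiny local LRU cache can be built on the JDK's LinkedHashMap in access order (the capacity and keys below are made up for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal local LRU cache: LinkedHashMap in access order evicts the
 *  least-recently-used entry once capacity is exceeded. */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder=true: get() refreshes an entry's recency
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the stalest entry on overflow
    }
}
```

With capacity 2, inserting a, b, reading a, then inserting c evicts b, because a was touched more recently. Production caches (Ehcache, Caffeine) add expiry, statistics, and thread safety on top of this same idea.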

Distributed caching is the common caching solution, especially in clustered environments. Mature open-source options include Memcached and Redis; because they are so widely used, their principles come up constantly in interviews. (How distributed caches work; the pros and cons of Memcached versus Redis; and common caching terms such as cache penetration, hot keys, and cache avalanche.)

Asynchronous operations: asynchrony is another way to improve performance, and it comes in local and distributed forms. Local asynchrony can be implemented with plain multithreading or multiprocessing, or with a mature event framework such as Google's Guava EventBus. Local processes can also use shared memory for asynchrony, but that requires implementing a concurrency-safe message queue inside the shared memory.
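A minimal sketch of local asynchrony with the JDK's own thread pool and CompletableFuture (the squaring computation is just a stand-in for any slow task):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

/** Run a (stand-in) slow computation off the caller's thread. */
public class AsyncDemo {
    public static CompletableFuture<Integer> slowSquare(int x, ExecutorService pool) {
        // The caller gets a future back immediately and is free to do other work.
        return CompletableFuture.supplyAsync(() -> x * x, pool);
    }
}
```

The caller submits the work and continues; `join()` (or a callback chained with `thenApply`) collects the result later. This is the same pattern event frameworks like Guava's EventBus wrap in publish/subscribe form.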

Distributed asynchrony uses message queues, which decouple the system and also shave peaks and fill valleys. Message queues are the standard solution for asynchrony in distributed systems; common ones are RabbitMQ, MetaQ, and Kafka. (How message queues work, and the pros and cons of RabbitMQ.)
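The decoupling and peak-shaving idea can be sketched in-process with a bounded BlockingQueue standing in for a real broker such as RabbitMQ or Kafka (this illustrates the principle only, not any broker's API):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** In-process stand-in for a message queue: producer enqueues, consumer drains. */
public class MiniQueue {
    public static int produceAndConsume(int n) {
        // Bounded capacity gives back-pressure: a burst of producers blocks
        // instead of overwhelming the consumer ("peak shaving").
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) queue.put(i); // blocks while the queue is full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < n; i++) sum += queue.take(); // consumer drains at its own pace
            producer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sum;
    }
}
```

Producer and consumer know nothing about each other beyond the queue, which is exactly the decoupling a real message queue provides across processes and machines.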

Use clusters: no matter how powerful a single machine is, the throughput of the whole system is limited by it. To support greater throughput, beyond optimizing code and upgrading the single server, deploy the system across multiple machines. That is what distributed systems do; it is the only way to keep scaling once single-machine improvements are exhausted.

Code optimization: improving performance at the code level is an essential skill for programmers. The topic is too large to cover fully; the following points are just a primer:

(1) Use multithreading and multiprocessing.

(2) Make full use of memory.

(3) Reuse resources: singletons and pooling.

(4) Use better algorithms and data structures.

(5) Tune Java virtual machine parameters and avoid full GC.

(6) Choose appropriate I/O models and thread/process models.

3. Storage performance optimization

Disk I/O optimization: use SSDs instead of mechanical disks, and HDFS instead of an ordinary file system where it fits.

Database optimization: a perennial topic and a frequent interview subject. Common methods include the following (see High Performance MySQL):

(1) Add indexes

(2) Optimize SQL statements

(3) Shard by database and by table

(4) Separate reads from writes

(5) Use distributed relational databases

(6) Use NoSQL

High availability

It does not matter how high your performance is if the system is down. Like high performance, “high” availability must be quantified, for example as four nines (99.99%). The high availability of a large website divides into application high availability, service high availability, and data high availability.

Highly available services

1. Interface services are highly available

Service degradation: degrade a service or interface to provide only some of its functions, or stop it entirely. Common degradation approaches include a system back door for degradation and an independent degradation system.

Traffic limiting: degradation is remediation after a failure; traffic limiting is prevention before one. Rate limiting admits only as much traffic as the system can bear and discards the excess. Common approaches fall into two types: request-based limiting and resource-based limiting.
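Request-based limiting is often implemented as a token bucket: tokens refill at a fixed rate up to a burst capacity, and a request is admitted only if it can take a token. A minimal, non-production sketch (the capacity and rate are made-up numbers):

```java
/** Token-bucket rate limiter: refills at `tokensPerSecond` up to `capacity`. */
public class TokenBucket {
    private final double capacity;
    private final double refillPerNano;
    private double tokens;
    private long last;

    public TokenBucket(double capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1e9;
        this.tokens = capacity;       // start full: allow an initial burst
        this.last = System.nanoTime();
    }

    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - last) * refillPerNano); // lazy refill
        last = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;  // admit the request
        }
        return false;      // over capacity: discard (or queue, in the queuing variant)
    }
}
```

With capacity 5 and a refill rate of 1 token per second, a sudden burst gets at most 5 requests through and the rest are rejected until tokens trickle back.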

Traffic queuing: a variant of rate limiting. Where rate limiting rejects users outright, queuing makes them wait for a while.

Service circuit breakers: circuit breaking is easily confused with degradation, but degradation handles failures in the service itself, while a circuit breaker handles failures in external systems: it proactively cuts the dependency on an unavailable external service to prevent a further avalanche effect.
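A circuit breaker can be sketched as a small state machine: consecutive failures trip it open, and while open, calls to the dependency are skipped. Real implementations (Hystrix, Resilience4j) add a half-open state with timed probe requests; this minimal version resets manually and its threshold is an arbitrary example:

```java
/** CLOSED -> OPEN after `threshold` consecutive failures; OPEN rejects calls until reset(). */
public class CircuitBreaker {
    public enum State { CLOSED, OPEN }

    private final int threshold;
    private int failures = 0;
    private State state = State.CLOSED;

    public CircuitBreaker(int threshold) { this.threshold = threshold; }

    public boolean allowRequest() { return state == State.CLOSED; }

    public void recordSuccess() { failures = 0; }

    public void recordFailure() {
        if (++failures >= threshold) state = State.OPEN; // trip: stop calling the failing dependency
    }

    public void reset() { // e.g. after a probe against the dependency succeeds
        failures = 0;
        state = State.CLOSED;
    }
}
```

While the breaker is open, the caller fails fast (or falls back) instead of piling requests onto an already-failing dependency, which is what stops the avalanche.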

Hierarchical management: separate services by the importance of their functions, so that non-core services cannot drag down core services.

2. High availability of business services

Clusters:

1. Multi-machine deployment

2. Service registration and discovery

Multi-site active-active (“live in different places”):

1. Same city, different data centers, connected by dedicated lines: low complexity and very useful, but unable to withstand a major citywide disaster (earthquake, blackout).

2. Cross-city: high complexity; network latency is high and data consistency is hard, so not every business suits it.

3. Cross-country: has the same drawbacks as cross-city and is mainly used for:

  • Serving different regions with different services, such as Amazon China and Amazon US
  • Read-mostly, latency-insensitive businesses, such as Google Search.

Microservices

Service description: describes the following about a service: (1) the service name; (2) the information required to invoke it; (3) the format of the data it returns. Common forms of service description include RESTful APIs, XML configuration, and IDL files.

Registry: services are published and subscribed through a registry. Microservices divide into service providers and service consumers, and the registry is the communication bridge between them.

Service framework: fixes the communication contract between service consumers and service providers: (1) which protocol the services use to communicate; (2) how data is transmitted; (3) which format the data is encoded in.

Service monitoring: an important means of knowing whether a service is available. It mainly involves three steps: metric collection, data processing, and data presentation.

Service tracing: besides monitoring service calls, you also need to record every link in the chain a call passes through, for problem tracing and fault location (for example, Alibaba's EagleEye).

Service governance: service monitoring finds problems and service tracing locates them; solving them requires service governance, such as load balancing and automatic failover.

Highly available data

1. Data consistency

CAP theorem: a distributed computing system cannot simultaneously satisfy all three design constraints of Consistency, Availability, and Partition tolerance. CAP concerns reading and writing data, not the system as a whole. At most two of the three can be met, and partition tolerance (P) is mandatory, so the real choice is CP for consistency (e.g., ZooKeeper) or AP for availability (e.g., Eureka).

ACID: a set of guarantees developed by database systems to ensure the correctness of transactions. ACID comprises four constraints: Atomicity, Consistency, Isolation, and Durability.

BASE: Basically Available, Soft state, Eventual consistency. The core idea: even when strong consistency (the C of CAP) cannot be achieved, the application can use suitable means to reach eventual consistency.

2. Data backup

The essence of a high-availability storage solution is to replicate data to multiple storage devices and achieve availability through redundancy. Its complexity lies mainly in handling the data inconsistencies caused by replication delay and interruption. Any high-availability storage solution should therefore be considered and analyzed along the following lines:

  • How is data replicated?
  • What are the responsibilities of each node?
  • How are replication delays handled?
  • How are replication interruptions handled?

Active/standby replication:

Master/slave replication:

(1) Reads become more complex for the client, which must know the master-slave topology and pick a machine to read from.

(2) If replication lag is large, there are intermediate states in which the data is inconsistent.

(3) Failures require manual intervention.

Master-master replication:

(1) Both hosts can read and write data; if one fails, overall service is unaffected.

(2) The client need not distinguish roles and can send reads and writes to either machine.

Yet this seemingly perfect master-master architecture has notable drawbacks:

(1) Bidirectional replication cannot guarantee real-time consistency; with high lag there are inconsistent intermediate states.

(2) Much data cannot be replicated bidirectionally at all, such as auto-increment IDs, inventory counts, and account balances.

Master-master replication therefore places strict demands on data design and generally suits only data that is temporary, losable, or overwritable.

Data clusters: the replication schemes introduced above, whether active/standby, master/slave, or master-master, all have a single write host, which brings two disadvantages:

(1) The data stored by the host is limited.

(2) When the host goes down, it must be restored manually.

A data cluster is a system of multiple machines that jointly store data. Data clusters divide into centralized and decentralized clusters:

(1) Centralized data clusters: every machine in the cluster holds the same data, written through one machine and then replicated to the others. For details, see the principles and architecture of ZooKeeper.

(2) Decentralized data clusters: the data is scattered across different machines, with each piece of course backed up on several machines in the cluster. See the consistent hashing algorithm and the principles and architecture of Amazon's DynamoDB.
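The consistent-hashing idea behind decentralized clusters can be sketched with a TreeMap as the hash ring: each node is hashed to many virtual positions, and a key belongs to the first node clockwise from the key's own hash. The node names below and the use of String.hashCode are illustrative only; real systems use a stronger hash such as MD5 or Murmur:

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Consistent hash ring with virtual nodes: a key maps to the first node clockwise. */
public class ConsistentHash {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHash(int virtualNodes) { this.virtualNodes = virtualNodes; }

    private int hash(String s) {
        return s.hashCode() & 0x7fffffff; // toy hash; use MD5/Murmur in practice
    }

    public void addNode(String node) {
        // Many virtual positions per node smooth the key distribution.
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    public String nodeFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(key)); // first position >= key's hash
        return tail.isEmpty() ? ring.firstEntry().getValue()       // wrap around the ring
                              : tail.get(tail.firstKey());
    }
}
```

The payoff is that adding or removing one node only remaps the keys that sat on that node's ring segments, instead of reshuffling nearly everything as a naive `hash(key) % n` scheme would.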

Data partitioning: spread data across geographic regions to withstand region-scale disasters such as floods, earthquakes, and tsunamis. Within each region, the data can still use the replication and cluster schemes above for high availability. The complexity of a data-partitioning architecture must be considered in terms of:

(1) Data volume

(2) Zoning rules

(3) Replication rules: centralized, mutual backup, or independent

High availability quality assurance

1. Website releases: with a rolling release, the servers taken offline in each batch are only a small part of the cluster and can serve traffic again as soon as their release completes;

2. Automated testing: complete testing with automated test tools or scripts;

3. Pre-release verification: introduce pre-release servers that are almost identical to production, except that they are not registered with the load balancer, so external users cannot reach them;

4. Code control: most websites currently use SVN with branch development and trunk release; meanwhile Git, widely adopted in the open-source community, is gradually replacing SVN as the version-control tool.

Monitoring and Alarm

1. Monitoring data collection

(1) User-behavior log collection: both server-side and client-side log collection; many websites are now building log statistics and analysis tools on the real-time computing framework Storm.

(2) Server performance monitoring: collect server performance metrics such as system load, memory usage, and disk I/O, to judge conditions in time and prevent problems before they occur;

(3) Operational data reports: collect and report metrics, then aggregate and display them centrally; the application must handle the collection of operational data in its own code;

2. Monitoring and management

(1) System alarms: configure alarm thresholds and the on-duty engineers' contact information, so that when an alarm fires the engineer can be notified promptly even from thousands of miles away;

(2) Failover: when the monitoring system detects a fault, it proactively notifies the application to fail over;

(3) Automatic graceful degradation: to cope with peak traffic, proactively shut down some functions and release system resources to keep core application services running normally;

Extensibility

The only constant in this world is change, and unlike building architecture, software architecture evolves continuously. If a software system's architecture is not designed for extension from the start, having to refactor for every major change becomes unacceptable.

Core idea

There are many ways to design an extensible architecture, but the basic idea behind all of them can be summed up in one word: “split”.

Splitting approaches

Splitting a software system in different ways yields different architectures. Common splitting approaches are:

  • Process-oriented split: Split the entire process into phases, with each phase as a part.
  • Service Oriented split: Split the services provided by the system, with each service as a part.
  • Function-oriented split: Split the functions provided by the system, with each function as a part.

More specific examples:

Process corresponds to the TCP/IP four-layer model, because the TCP/IP communication flow is fixed: application layer → transport layer → network layer → network interface (link) layer. Whatever runs at the topmost application layer, this flow does not change.

Service corresponds to the application-layer services HTTP, FTP, and SMTP: HTTP provides web service, FTP provides file service, and SMTP provides mail service.

Function: each service provides corresponding functions. For example, HTTP provides GET and POST, FTP provides upload and download, and SMTP provides sending and receiving mail.

A process-oriented split is more like a vertical split, a service-oriented split more like a horizontal split, and a function-oriented split is finer-grained than a service-oriented one. They are not mutually exclusive but complementary: a system's architecture can use process-oriented splitting together with service-oriented, and even function-oriented, splitting.

Existing architectures

  • Process-oriented split: layered architecture
  • Service-oriented split: SOA, microservices
  • Function-oriented split: microkernel architecture

1. Layered architecture and SOA

Layered architecture: a system is commonly divided into an application layer, a service layer, and a storage layer; complex software systems have more layers. A layered architecture must keep the differences between layers clear and the boundaries sharp, so that anyone looking at the architecture diagram can understand it. A traditional layered architecture:

Service Oriented Architecture (SOA)

2. Microservices

Microservices: in short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating through lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and are independently deployable by fully automated deployment machinery.

Keywords of microservices: small, lightweight, automated. Infrastructure required for microservices:

3. Microkernel architecture

Conclusion

Everything above touches the knowledge points related to architecture only lightly, like a dragonfly skimming the water. Everyone's depth of knowledge differs, but to be a qualified architect you need as much breadth and depth as possible. The summary above is a comprehensive, systematic pass over the knowledge points a programmer should command.