Technical architecture for large Web sites - Introduction

This article lists the concepts involved in large website architecture, each with a brief explanation.

Preface

This article is a set of review notes on the book “Large Website Architecture Design” (Li Zhihui), essentially a text version of a mind map.
The article is organized around five elements: performance, availability, scalability, extensibility, and security.
Performance, availability, and scalability all involve application servers, cache servers, and storage servers.

Overview

Three dimensions: evolution, patterns, and elements
Five elements: performance, availability, scalability, extensibility, and security

Evolution

The architecture of a large website typically evolves through the following stages:

Initial architecture: a single server hosts the application, database, files, and all other resources, as in a LAMP stack.
Separating application and data services: three servers with different hardware requirements, namely an application server, a file server, and a database server.
Using caches to improve performance: there are two kinds, local caches on the application servers and remote caches on dedicated distributed cache servers.
Using application server clusters to improve concurrency: a load-balancing scheduler distributes requests to any machine in the application server cluster.
Read/write separation of the database: the database uses primary/secondary hot standby. When the application server writes to the primary database, the primary synchronizes the updates to the secondary database through primary/secondary replication. The application server accesses data through a dedicated data access module, which makes the read/write separation transparent to the application.
Using reverse proxies and CDNs to speed up responses: both are based on caching. The reverse proxy is deployed in the website's central data center, while the CDN is deployed in the network providers' data centers.
Using distributed file systems and distributed database systems: a distributed database is the last resort for splitting a website's database; splitting the database by business line is more common.
Using NoSQL and search engines: they offer better support for scalable, distributed deployment.
Business splitting: the website is divided into separate applications by business, each deployed and maintained independently. Applications are linked through hyperlinks, distribute data through message queues, or access the same data storage system.
Distributed services: common business services are extracted and deployed independently.
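The read/write separation stage depends on a data access module that hides the split from application code. A minimal sketch of its routing decision, assuming statements are plain SQL strings and "connections" are just names; `ReadWriteRouter` is a hypothetical class, not a specific library.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical data access module that hides read/write splitting.

    Writes go to the primary; reads rotate across the secondaries. The
    "connections" here are plain names standing in for real DB connections.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        # Cycle through the secondaries so read load is spread evenly.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        # Writes, and reads when no secondary exists, go to the primary.
        return self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.route("INSERT INTO users VALUES (1)"))  # primary
print(router.route("SELECT * FROM users"))           # replica-1
print(router.route("select * from users"))           # replica-2
```

Because replication from the primary is usually asynchronous, a real module typically also sends reads that must see a just-committed write to the primary.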

The value of evolution

The core value of large website architecture is the flexibility to respond to the website's changing needs.
The main force driving the development of large-website technology is the growth of the website's business.

Common pitfalls

Blindly following the solutions of big companies
Technology for technology’s sake
Trying to solve every problem with technology

Architectural patterns

The key to a pattern is its repeatability.

Layering: Horizontal segmentation
Segmentation: Vertical segmentation
Distribution: the main purpose of layering and segmentation is to allow the resulting modules to be deployed in a distributed way. Common schemes:
- Distributed applications and services
- Distributed static resources
- Distributed data and storage
- Distributed computing
- Distributed configuration, distributed locks, distributed files, etc.
Clustering: multiple servers deploy the same application to form a cluster that serves requests through a load balancer.
Caching: placing data in the position closest to where it is computed to speed up processing. Caching is the first means of improving performance: it speeds up access and reduces the load on the back end. Using a cache has two prerequisites: 1. data access hotspots are uneven, 2. data stays valid for some time and does not expire quickly.
- CDN
- Reverse proxy
- Local cache
- Distributed cache
Asynchrony: aims to decouple the system. A typical asynchronous architecture follows the producer-consumer pattern and has the following characteristics:
- Improves system availability
- Speeds up website responses
- Smooths out concurrent access peaks
Redundancy: ensures high availability, e.g. cold and hot backups of databases.
Automation: covers the release process, code management, testing, security scanning, deployment, monitoring, alerting, failover, failure recovery, degradation, and resource allocation.
Security: passwords and mobile verification codes, encryption, CAPTCHAs, filtering, and risk control.
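The two caching prerequisites (uneven hotspots, data that stays valid for a while) map directly onto a cache with a time-to-live. A minimal sketch of a local cache used in the cache-aside style; `TTLCache`, `get_user`, and the 60-second TTL are hypothetical choices for illustration, and the fake clock only exists so the example runs instantly.

```python
import time


class TTLCache:
    """Minimal local cache; each entry expires ttl seconds after it is stored."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be shown without sleeping
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # stale entry: drop it and report a miss
            return None
        return value

    def put(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


def get_user(cache, user_id, load_from_db):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    user = cache.get(user_id)
    if user is None:
        user = load_from_db(user_id)
        cache.put(user_id, user)
    return user


# Usage with a fake clock.
now = [0.0]
cache = TTLCache(ttl=60, clock=lambda: now[0])
db_reads = []


def load_user(uid):
    db_reads.append(uid)  # record every trip to the "database"
    return {"id": uid}


get_user(cache, 7, load_user)
get_user(cache, 7, load_user)  # served from the cache
now[0] = 61.0                  # past the TTL
get_user(cache, 7, load_user)  # expired, so the database is read again
print(db_reads)  # [7, 7]
```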
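The producer-consumer pattern behind asynchrony can be sketched with the standard library. Here `queue.Queue` stands in for a distributed message queue, and the order names are made up. The bounded queue lets the web tier hand off work and return quickly while the worker drains requests at its own rate, which is how a queue smooths out access peaks.

```python
import queue
import threading


def producer(q, orders):
    """Request handler: enqueue the work and return immediately."""
    for order in orders:
        q.put(order)
    q.put(None)  # sentinel: no more work


def consumer(q, results):
    """Background worker: drains the queue at its own pace."""
    while True:
        order = q.get()
        if order is None:
            break
        results.append(f"processed {order}")


q = queue.Queue(maxsize=100)  # bounded buffer absorbs bursts
results = []
worker = threading.Thread(target=consumer, args=(q, results))
worker.start()
producer(q, ["order-1", "order-2", "order-3"])
worker.join()
print(results)  # ['processed order-1', 'processed order-2', 'processed order-3']
```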

Core elements

Architecture is “the highest-level plan, the decisions that are hard to change”. It focuses on five elements:

Performance
Availability
Scalability
Extensibility
Security

Architecture

The following sections summarize the five elements in turn.

High performance

The main performance metrics are:

Response time: The time required for an application to perform an operation
Concurrency: The number of requests that the system can process simultaneously
Throughput: Indicates the number of requests processed by the system per unit time
Performance counters: Data metrics that describe the performance of a server or operating system
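The first three metrics are linked. A standard queueing result not stated in the source, Little's law, says that in a steady state average concurrency equals throughput multiplied by average response time. A minimal sketch for back-of-envelope capacity estimates; the numbers are illustrative only.

```python
def throughput(concurrency, avg_response_time_s):
    """Little's law rearranged: throughput = concurrency / response time."""
    return concurrency / avg_response_time_s


def required_concurrency(target_tps, avg_response_time_s):
    """Requests that must be in flight to sustain target_tps."""
    return target_tps * avg_response_time_s


print(throughput(200, 0.5))              # 400.0 requests/second
print(required_concurrency(1000, 0.25))  # 250.0 concurrent requests
```

Under load, response time usually grows once a resource saturates, so throughput stops rising even as concurrency increases; the formula only describes the steady state.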

Performance testing methods:

Performance testing
Load testing
Stress testing
Stability testing

Based on the website's layered architecture, performance optimization falls into three categories:

Web front-end performance optimization
- Browser access optimization
  - Reduce HTTP requests
  - Use browser caching
  - Enable compression
  - Place CSS at the top of the page and JavaScript at the bottom
  - Reduce cookie transmission
- CDN acceleration: essentially a cache, generally for static resources
- Reverse proxy
  - Protects website security
  - Speeds up web requests through caching
  - Load balancing
Application server performance optimization: the main methods are caching, clustering, and asynchrony
- Distributed cache (the first law of web performance optimization: consider using a cache first)
- Asynchronous operations (message queues, peak shaving)
- Clustering
- Code optimization
  - Multithreading (design objects to be stateless, use local objects, and use locks for concurrent access to shared resources)
  - Resource reuse (singletons, object pools)
  - Data structures
  - Garbage collection
Storage server performance optimization
- Mechanical hard drives vs. solid-state drives
- B+ trees vs. LSM trees
- RAID vs. HDFS
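Resource reuse through an object pool can be sketched as follows. `ObjectPool` and `make_connection` are hypothetical names, and the "connections" are plain dicts standing in for expensive resources such as database connections. The pool hands out the most recently returned object (LIFO); that is a choice of this sketch, not a requirement.

```python
import queue


class ObjectPool:
    """Fixed-size pool that reuses expensive objects instead of recreating them."""

    def __init__(self, factory, size):
        self._pool = queue.LifoQueue(maxsize=size)  # thread-safe container
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self, timeout=None):
        # Blocks until an object is free; raises queue.Empty on timeout.
        return self._pool.get(timeout=timeout)

    def release(self, obj):
        self._pool.put(obj)


created = []


def make_connection():
    conn = {"id": len(created)}
    created.append(conn)  # count how many objects were ever built
    return conn


pool = ObjectPool(make_connection, size=2)
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()
print(c1 is c2, len(created))  # True 2
```

A bounded pool also caps how many connections the application can open, which protects the database under load.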

High availability

Highly available website architecture: the goal is that when server hardware fails, the service remains available and data remains stored and accessible. The main means are redundancy of data and services, plus failover.
Highly available applications: the key feature is that applications are stateless
- Failover of stateless services through load balancing
- Session management in application server clusters
  - Session replication
  - Session binding
  - Recording sessions in cookies
  - Session server
Highly available services: stateless services can use failover strategies such as load balancing, plus the following:
- Tiered management
- Timeouts
- Asynchronous calls
- Service degradation
- Idempotent design
Highly available data: the main means are data backup and failover
- CAP theorem
  - Consistency
  - Availability
  - Partition tolerance
- Data backup
  - Cold backup: the drawback is that data consistency and data availability cannot be guaranteed
  - Hot backup: includes asynchronous hot backup and synchronous hot backup
- Failover: consists of the following three parts
  - Failure confirmation
  - Access transfer
  - Data recovery
Software quality assurance for highly available websites
- Website release
- Automated testing
- Pre-release verification
- Code control
  - Trunk development, branch release
  - Branch development, trunk release
- Automated release
- Gray (canary) release
Website operation monitoring
- Monitoring data collection
  - User behavior log collection (server side and client side)
  - Server performance monitoring
  - Operational data reports
- Monitoring management
  - Alerting system
  - Failover
  - Automatic graceful degradation
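Timeouts, retries, and failover mean the same request can reach a service more than once, which is why idempotent design appears in the list above. A minimal sketch, assuming each caller attaches a unique request ID; `PaymentService` and `charge` are hypothetical. The in-memory dict would need to be shared storage in a real cluster so that a retry landing on another server is still recognized.

```python
import threading


class PaymentService:
    """Hypothetical service whose charge() is idempotent per request_id."""

    def __init__(self, balance=100):
        self._lock = threading.Lock()
        self._results = {}  # request_id -> result of the first execution
        self.balance = balance

    def charge(self, request_id, amount):
        with self._lock:
            if request_id in self._results:
                # A retry after a timeout or failover: return the stored
                # result instead of charging a second time.
                return self._results[request_id]
            self.balance -= amount
            result = {"request_id": request_id, "balance": self.balance}
            self._results[request_id] = result
            return result


svc = PaymentService()
svc.charge("req-42", 30)
svc.charge("req-42", 30)  # retried call, e.g. after a client-side timeout
print(svc.balance)  # 70
```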

Scalability

In “large website”, “large” means:

Users: many users and heavy traffic
Functionality: complex features and many products
Technology: the website must deploy a large number of servers

Scalability breaks down into the following aspects:

Scalable design of the website architecture
- Scaling by physically separating different functions
  - Vertical separation (separation after layering)
  - Horizontal separation (separation after business splitting)
- Scaling a single function by growing its cluster
Scalable design of application server clusters
- HTTP redirect load balancing
- DNS (domain name resolution) load balancing
- Reverse proxy load balancing (application-layer load balancing at the HTTP level)
- IP load balancing (packet distribution is done in the kernel)
- Data link layer load balancing (rewrites the MAC address; triangle transmission mode, e.g. LVS)
- Load balancing algorithms
  - Round Robin (RR)
  - Weighted Round Robin (WRR)
  - Random
  - Least Connections
  - Source Hashing
Scalable design of distributed cache clusters
- Memcached access model for distributed cache clusters
  - Memcached client (API, routing algorithm, server list, communication module)
  - Memcached server cluster
- Scalability challenges of Memcached distributed cache clusters
- Consistent hashing for distributed caches (consistent hash ring with a virtual node layer)
Scalable design of data storage clusters
- Scalable design of relational database clusters
- Scalable design of NoSQL databases
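As an illustration of Weighted Round Robin, here is a sketch of the "smooth" variant used by Nginx's upstream module; the source does not say which WRR variant it means, and the server names and weights are made up. Compared with repeating each server `weight` times in a row, the smooth variant interleaves servers so a heavy server does not receive a burst of consecutive requests.

```python
class SmoothWeightedRoundRobin:
    """Smooth weighted round robin over a fixed set of servers."""

    def __init__(self, weights):
        self.weights = dict(weights)            # server -> configured weight
        self.current = {s: 0 for s in weights}  # server -> running score
        self.total = sum(self.weights.values())

    def pick(self):
        # Every server gains its weight; the highest score wins and pays back
        # the total, so over one full cycle each server is picked `weight` times.
        for server, weight in self.weights.items():
            self.current[server] += weight
        best = max(self.current, key=self.current.get)
        self.current[best] -= self.total
        return best


lb = SmoothWeightedRoundRobin({"a": 5, "b": 1, "c": 1})
print("".join(lb.pick() for _ in range(7)))  # aabacaa
```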
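The consistent hash ring with virtual nodes can be sketched as below. The server names, the MD5-derived 64-bit hash, and 100 virtual nodes per server are illustrative assumptions, not values from the source. The point it demonstrates: when a cache server is added, only the keys that fall into the new server's arcs move, and they all move to the new server, so most cached data stays valid.

```python
import bisect
import hashlib


def _hash(key):
    # Stable 64-bit hash; Python's built-in hash() of str is randomized per process.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._ring = []   # sorted virtual-node hashes
        self._owner = {}  # virtual-node hash -> physical server
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):
            h = _hash(f"{server}#{i}")
            bisect.insort(self._ring, h)
            self._owner[h] = server

    def remove(self, server):
        for i in range(self.vnodes):
            h = _hash(f"{server}#{i}")
            self._ring.remove(h)
            del self._owner[h]

    def get(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect_left(self._ring, _hash(key))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._owner[self._ring[idx]]


ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get(k) for k in keys}
ring.add("cache-d")
moved = [k for k in keys if ring.get(k) != before[k]]
print(all(ring.get(k) == "cache-d" for k in moved))  # True
```

Without the virtual node layer, three or four points on the ring would usually split the keys very unevenly; many virtual nodes per server even out the load.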

Extensibility

The “Open/Closed Principle” at the system architecture level

Building an extensible website architecture
Reducing coupling with distributed message queues
- Event-driven architecture
- Distributed message queues
Building a reusable business platform with distributed services
- Web services and enterprise-level distributed services
- Characteristics of distributed services for large websites
- Distributed service framework design (Thrift, Dubbo)
Extensible data structures (such as ColumnFamily design)
Using an open platform to build the website's ecosystem
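Event-driven decoupling through a message queue can be sketched in-process. `EventBus` is a hypothetical stand-in for the publish/subscribe side of a distributed message queue; a real queue would also deliver across machines, persist messages, and retry. It illustrates the Open/Closed idea above: a new feature subscribes to an existing event without modifying the publisher.

```python
from collections import defaultdict


class EventBus:
    """In-process stand-in for a distributed message queue (pub/sub)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> handlers

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher does not know, or depend on, who consumes the event.
        for handler in self._subscribers[topic]:
            handler(event)


log = []
bus = EventBus()
bus.subscribe("user_registered", lambda e: log.append(f"email {e['name']}"))
# A later feature is added by subscribing, not by editing the publisher.
bus.subscribe("user_registered", lambda e: log.append(f"coupon {e['name']}"))
bus.publish("user_registered", {"name": "alice"})
print(log)  # ['email alice', 'coupon alice']
```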

Website security architecture

XSS attacks and SQL injection attacks are the two main ways of attacking website applications; other means include CSRF and session hijacking.

Attack and defense
- XSS attacks: Cross-Site Scripting
  - Reflected
  - Persistent
- XSS defenses
  - Sanitization (escaping dangerous HTML characters)
  - HttpOnly
- Injection attacks
  - SQL injection
  - OS command injection
- Injection defenses
  - Prevent attackers from guessing database table structure information
  - Sanitization
  - Parameter binding
- CSRF attack: Cross-Site Request Forgery
- CSRF defense: the main means is to identify the requester
  - Form token
  - CAPTCHA
  - Referer check
- Other attacks and vulnerabilities
  - Error codes
  - HTML comments
  - File upload
  - Directory traversal
- Web application firewall (ModSecurity)
- Website security vulnerability scanning
Information encryption and key security management
- One-way hash encryption: inputs of any length are hashed to a fixed-length output
  - Irreversible; the output reveals no plaintext
  - A salt can be added to increase security
  - A small change in the input produces a completely different output
- Symmetric encryption: encryption and decryption use the same key
- Asymmetric encryption
  - Information transmission: encrypt with the public key, decrypt with the private key
  - Digital signatures: encrypt with the private key, decrypt with the public key
- Key security management: secure transmission of information depends on keeping keys safe; ways to improve this include:
  - Keeping the keys and algorithms on a separate server
  - Putting the encryption/decryption algorithms in the application system and the keys on a separate server
Information filtering and anti-spam
- Text matching
- Classification algorithms
- Blacklists
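Two of the defenses above, sanitizing output against XSS and parameter binding against SQL injection, can be sketched with Python's standard library (`html.escape` and `sqlite3` placeholders). The table and function names are illustrative. Escaping depends on the output context: `html.escape` covers HTML element content and quoted attribute values, not JavaScript or URL contexts.

```python
import html
import sqlite3


def render_comment(comment):
    # Sanitize: escape &, <, >, " and ' so user input is shown as text,
    # not interpreted as markup or script.
    return "<p>" + html.escape(comment, quote=True) + "</p>"


def find_user(conn, username):
    # Parameter binding: the value is bound separately from the SQL text,
    # so input like "' OR '1'='1" cannot change the query's structure.
    cur = conn.execute("SELECT id FROM users WHERE name = ?", (username,))
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

print(render_comment("<script>alert(1)</script>"))
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
print(find_user(conn, "alice"))        # [(1,)]
print(find_user(conn, "' OR '1'='1"))  # []
```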
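The form-token defense against CSRF can be sketched as follows. The `session` dict stands in for server-side session storage, and the function names are hypothetical. The defense works because a forged cross-site request cannot read the victim's page to learn the token value.

```python
import hmac
import secrets


def issue_csrf_token(session):
    """Generate a random token, store it in the server-side session,
    and return it for embedding in a hidden form field."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def check_csrf_token(session, submitted):
    """Accept the form only if it echoes the token stored in the session."""
    expected = session.get("csrf_token")
    if expected is None or submitted is None:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(expected, submitted)


session = {}
token = issue_csrf_token(session)
print(check_csrf_token(session, token))     # True
print(check_csrf_token(session, "forged"))  # False
```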
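Salted one-way hashing for stored passwords, sketched with `hashlib.pbkdf2_hmac`. PBKDF2 is a deliberately slow key-derivation function swapped in here as one common choice; the source only mentions adding a salt. The 200,000 iteration count and 16-byte salt are illustrative assumptions.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=200_000):
    # A random per-user salt makes identical passwords hash differently
    # and defeats precomputed lookup tables.
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected_digest, iterations=200_000):
    # Recompute with the stored salt and compare in constant time.
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_password("s3cret")
print(verify_password("s3cret", salt, stored))  # True
print(verify_password("wrong", salt, stored))   # False
```

The server stores only the salt and the digest; the plaintext password is never needed again after registration.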

Author: @brianway (more articles: personal website | CSDN | oschina)
