This article lists the concepts involved in large-scale website architecture, with a brief explanation of each

Preface

  • This article reviews the book “Large Website Architecture Design” (Li Zhihui); it is essentially a mind map in text form
  • The full text revolves around five elements: performance, availability, scalability, extensibility, and security
  • Performance, availability, and scalability each involve the application servers, cache servers, and storage servers

Overview

  • Three dimensions: evolution, patterns, and elements
  • Five elements: performance, availability, scalability, extensibility, and security

Evolution

The architecture of a large website typically evolves through the following stages:

  1. Initial site architecture: a single server hosts the application, database, files, and all other resources, as in a LAMP stack
  2. Application and data service separation: three servers with different hardware profiles, namely an application server, a file server, and a database server
  3. Use caching to improve website performance: there are two types, local caches on the application servers and remote caches on dedicated distributed cache servers
  4. Use application server clusters to improve concurrency: a load-balancing scheduler distributes requests to any machine in the application server cluster
  5. Read/write separation of databases: the databases run in primary/secondary hot-backup mode. The application server writes to the primary database, which propagates updates to the secondary through primary/secondary replication; reads can then be served from the secondary. A dedicated data access module keeps the split transparent to the application
  6. Use reverse proxies and CDNs to speed up website responses: both are based on caching. The reverse proxy is deployed in the website's central data center, and the CDN is deployed in the network providers' data centers
  7. Use distributed file systems and distributed database systems: a distributed database is the last resort for splitting a database; the more common approach is to split databases by business
  8. Use NoSQL and search engines: both offer better support for scalable distributed deployment
  9. Service separation: the website's business is divided into separate applications, each deployed and maintained independently. Applications are related through hyperlinks, exchange data through message queues, or access the same data storage system
  10. Distributed services: Common services are extracted and deployed independently
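
The data access module from step 5 can be sketched roughly as follows. This is a minimal illustration, not the book's implementation: the class name `ReadWriteRouter`, the server names, and the rule "statements starting with a write keyword go to the primary" are all assumptions made for the example, and replication lag is ignored.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical data access helper: writes go to the primary, reads to secondaries."""

    WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, primary, secondaries):
        self.primary = primary
        # Rotate through the secondaries so reads are spread evenly;
        # fall back to the primary if no secondary is configured.
        self._readers = itertools.cycle(secondaries or [primary])

    def route(self, sql):
        """Return the name of the database that should run this statement."""
        if sql.lstrip().upper().startswith(self.WRITE_PREFIXES):
            return self.primary
        return next(self._readers)


router = ReadWriteRouter("primary-db", ["secondary-1", "secondary-2"])
targets = [router.route(q) for q in (
    "INSERT INTO users (name) VALUES ('a')",
    "SELECT * FROM users",
    "SELECT * FROM orders",
    "update users set name = 'b'",
)]
```

The application only calls `route` (or, in a real module, a query method built on it), so it never needs to know which server holds which role.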

Values behind the evolution

  • The core value of large-website architecture is responding flexibly to the needs of the site
  • The main force driving the development of large-website technology is the growth of the website's business

Misconceptions

  • Blindly following the solutions of big companies
  • Pursuing technology for technology’s sake
  • Trying to solve every problem with technology

Architectural patterns

The key to patterns is their repeatability

  • Layering: Horizontal segmentation
  • Segmentation: Vertical segmentation
  • Distribution: The main purpose of layering and segmentation is to make it easy to deploy the resulting modules in a distributed way. Common schemes:
    • Distributed applications and services
    • Distributed static resources
    • Distributed data and storage
    • Distributed computing
    • Distributed configuration, distributed locks, distributed files, etc
  • Cluster: Multiple servers deploy the same application to form a cluster and provide external services through load balancing devices
  • Caching: Store data as close as possible to where it is computed in order to speed up processing. Caching is the first resort for improving performance: it speeds up access and reduces load on the back end. There are two prerequisites for using a cache: 1. Data access is skewed toward hotspots. 2. Data stays valid for a certain period and does not expire quickly
    • CDN
    • Reverse proxy
    • Local cache
    • Distributed cache
  • Asynchrony: Aims at decoupling the system. An asynchronous architecture is a typical producer-consumer pattern, with the following benefits:
    • Improve system availability
    • Speed up website access
    • Eliminate concurrent access peaks
  • Redundancy: Provides high availability, for example through cold and hot backups of databases
  • Automation: Covers the release process, code management, testing, security detection, deployment, monitoring, alerting, failover, failure recovery, degradation, and resource allocation
  • Security: Passwords, mobile verification codes, encryption, CAPTCHAs, filtering, risk control
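
The producer-consumer pattern behind the asynchronous architecture can be illustrated with a minimal in-process sketch. It assumes a single consumer thread and uses Python's standard `queue.Queue` as a stand-in for a real message queue; the names `task_queue` and `processed` are chosen for the example.

```python
import queue
import threading

task_queue = queue.Queue(maxsize=100)  # bounded buffer that absorbs bursts
processed = []


def consumer():
    """Drain the queue at the consumer's own pace until a None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:
            task_queue.task_done()
            break
        processed.append(item * 2)  # stand-in for slow back-end work
        task_queue.task_done()


worker = threading.Thread(target=consumer)
worker.start()

# Producer: enqueue a burst of requests without waiting for each to be handled.
for request_id in range(10):
    task_queue.put(request_id)

task_queue.put(None)  # tell the consumer to stop
task_queue.join()     # wait until every queued item has been handled
worker.join()
```

The producer is done as soon as its items are queued, which is what shortens response time and flattens access peaks; the consumer works through the backlog afterwards.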

Core elements

Architecture is “the highest level of planning, the rules that are hard to change”. Focus on five elements:

  • Performance
  • Availability
  • Scalability
  • Extensibility
  • Security

Architecture

The five elements are summarized in turn below

High performance

The main performance test metrics are:

  • Response time: The time required for an application to perform an operation
  • Concurrency: The number of requests that the system can process simultaneously
  • Throughput: The number of requests processed by the system per unit time
  • Performance counters: Data metrics that describe the performance of a server or operating system
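
As a small worked example of how the first three metrics relate, the sketch below summarizes a test run from per-request response times. The function name `summarize`, the sample values, and the wall-clock duration are made up for illustration.

```python
def summarize(response_times_s, wall_clock_s):
    """Summarize a test run from per-request response times (in seconds)."""
    count = len(response_times_s)
    return {
        "avg_response_time_s": sum(response_times_s) / count,
        "max_response_time_s": max(response_times_s),
        # Throughput counts completed requests per unit of elapsed time.
        "throughput_rps": count / wall_clock_s,
    }


# Four requests whose response times add up to 1.0 s but which finished within
# 0.5 s of wall-clock time, which is only possible if they overlapped.
samples = [0.10, 0.20, 0.30, 0.40]
report = summarize(samples, wall_clock_s=0.5)
```

Because requests overlap under concurrency, throughput cannot be derived from response time alone; the elapsed wall-clock time has to be measured separately.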

Performance test methods:

  • Performance test
  • Load test
  • Stress test
  • Stability test

Performance optimization, based on the hierarchical architecture of the site, can be divided into three categories:

  • Web front-end performance optimization
    • Browser access optimization
      • Reduce HTTP requests
      • Use browser caching
      • Enable compression
      • Place CSS at the top of the page and JavaScript at the bottom
      • Reduce cookie transmission
    • CDN acceleration: Essentially a cache, generally used for static resources
    • Reverse proxy
      • Protect website security
      • Speed up Web requests by configuring caching
      • Load Balancing
  • Application server performance optimization: The main methods include caching, clustering, and asynchrony
    • Distributed cache (the first rule of website performance optimization: consider caching first)
    • Asynchronous operations (message queues, peak shaving)
    • Use clusters
    • Code optimization
      • Multithreading (thread safety through stateless design, local objects, and locks for concurrent access to shared resources)
      • Resource reuse (singletons, object pools)
      • Data structures
      • Garbage collection
  • Storage server performance optimization
    • Mechanical hard drives vs. solid-state drives
    • B+ tree vs. LSM tree
    • RAID vs. HDFS
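
The "consider caching first" rule is often applied as a read-through lookup: check the cache, and only go to the slow back end on a miss. The sketch below is a minimal in-process version under stated assumptions; `TTLCache`, `get_or_load`, and `load_from_db` are hypothetical names, and a real deployment would more likely use a distributed cache such as Memcached.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader(key) on a miss or after expiry."""
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and entry[1] > now:
            return entry[0]
        value = loader(key)
        self._store[key] = (value, now + self.ttl)
        return value


db_reads = []


def load_from_db(key):
    db_reads.append(key)  # records each trip to the slow back end
    return f"row-for-{key}"


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load("user:1", load_from_db)
second = cache.get_or_load("user:1", load_from_db)  # served from the cache
```

The expiry time reflects the second caching prerequisite: the data must stay valid long enough for repeated reads to hit the cache.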

High availability

  • Highly available website architecture: The goal is to keep services available and data stored and accessible when server hardware fails. The main means are redundancy of data and services, plus failover
  • Highly available applications: The defining feature is that the application is stateless
    • Failover of stateless services through load balancing
    • Session management of application server clusters
      • Session replication
      • Session binding
      • Use cookies to record sessions
      • Session server
  • Highly available services: These services are also stateless, so they can use the same load-balancing failover as applications, in addition to the following policies
    • Tiered management
    • Timeout settings
    • Asynchronous calls
    • Service degradation
    • Idempotent design
  • Highly available data: The primary means are data backup and failover mechanisms
    • CAP principle
      • Consistency
      • Availability
      • Partition tolerance
    • Data backup
      • Cold backup: The disadvantage is that data consistency and data availability are not guaranteed
      • Hot backup: Includes asynchronous hot backup and synchronous hot backup
    • Failover: Consists of the following three parts
      • Failure confirmation
      • Access transfer
      • Data recovery
  • Software quality assurance for highly available web sites
    • Website release
    • Automated testing
    • Pre-release verification
    • Code control
      • Trunk development, branch release
      • Branch development, trunk release
    • Automated publishing
    • Gray release (gradual rollout)
  • Website operation monitoring
    • Monitoring data collection
      • User behavior log collection (server and client)
      • Server Performance Monitoring
      • Operational data report
    • Monitoring management
      • Alerting system
      • Failover
      • Automatic graceful degradation
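
Idempotent design matters because timeouts and failover cause callers to retry, and a retried request must not repeat its side effect. The following is a minimal sketch, assuming each request carries a unique ID supplied by the caller; `PaymentService` and `deposit` are hypothetical names, and a real service would keep the ID table in durable shared storage.

```python
class PaymentService:
    """Hypothetical service that applies each request ID at most once."""

    def __init__(self):
        self.balance = 0
        self._results = {}  # request_id -> result of the first successful call

    def deposit(self, request_id, amount):
        # A retried request returns the stored result instead of repeating the side effect.
        if request_id in self._results:
            return self._results[request_id]
        self.balance += amount
        self._results[request_id] = self.balance
        return self.balance


service = PaymentService()
service.deposit("req-1", 100)
service.deposit("req-1", 100)  # retry after a timeout: no double deposit
service.deposit("req-2", 50)
```

With this in place, the caller can retry freely after a timeout without first finding out whether the original call succeeded.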

Scalability

The term “large” for a large site means:

  • Users: many users and heavy traffic
  • Functionality: complex features and many products
  • Technology: the site needs to deploy a large number of servers

Scalability is broken down into the following aspects:

  • Scalable design of web architecture
    • Physical separation of different functions to achieve scaling
      • Vertical separation (separation after layering)
      • Horizontal separation (separation after business segmentation)
    • Scaling a single function by growing its cluster
  • Scalability design for application server clusters
    • HTTP redirection load balancing
    • DNS load balancing (through domain name resolution)
    • Reverse proxy load balancing (application layer load balancing at the HTTP protocol level)
    • IP load balancing (request distribution is done in the kernel)
    • Data link layer load balancing (rewrites the MAC address at the data link layer; triangle transmission mode, also known as direct routing; for example LVS)
    • Load balancing algorithms
      • Round Robin (RR)
      • Weighted Round Robin (WRR)
      • Random
      • Least Connections
      • Source Hashing
  • Scalability design for distributed cache clusters
    • Access model of a Memcached distributed cache cluster
      • Memcached client (including API, routing algorithm, server list, communication module)
      • Memcached server cluster
    • Scalability challenges for Memcached distributed cache clusters
    • Consistent hash algorithm for distributed caches (consistent hash ring, virtual nodes)
  • Scalability design of data storage service cluster
    • Scalability design for relational database clusters
    • Scalability design of NoSQL database
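
The consistent hash ring with virtual nodes mentioned above can be sketched as follows. This is an illustrative version, not the routing algorithm of any particular Memcached client: the class name, the choice of SHA-256 as the hash function, the `node#i` naming of virtual nodes, and the count of 100 virtual nodes per server are all assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to nodes so that adding a node moves only a fraction of the keys."""

    def __init__(self, nodes, virtual_nodes=100):
        self.virtual_nodes = virtual_nodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Each physical node is placed on the ring many times to even out the load.
        for i in range(self.virtual_nodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def get_node(self, key):
        # Walk clockwise to the first point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


keys = [f"key-{i}" for i in range(1000)]
ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
before = {k: ring.get_node(k) for k in keys}

ring.add_node("cache-4")  # scale the cluster out by one server
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to the new server, and the rest keep their old location. With plain modulo hashing over the server count, by contrast, most keys would change servers and the cache would be largely invalidated.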

Extensibility

Extensibility is the “open/closed principle” applied at the level of system architecture design

  • Build an extensible website architecture
  • Reduce coupling by utilizing distributed message queues
    • Event Driven Architecture
    • Distributed message queue
  • Build reusable business platform with distributed services
    • Web Services and enterprise-level distributed services
    • Features of distributed services for large web sites
    • Distributed Service Framework Design (Thrift, Dubbo)
  • Extensible data structures (such as ColumnFamily design)
  • Use an open platform to build a website ecosystem
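
The way an event-driven architecture reduces coupling can be shown with a minimal in-process sketch. The `EventBus` class, the topic name, and the handlers are hypothetical stand-ins for a distributed message queue and its subscribers.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process stand-in for a distributed message queue."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher does not know who consumes the event, or how many consumers exist.
        for handler in self._subscribers[topic]:
            handler(event)


bus = EventBus()
log = []

bus.subscribe("user.registered", lambda e: log.append(f"send welcome email to {e['email']}"))
# Adding a feature later means adding a subscriber; the publishing code is untouched.
bus.subscribe("user.registered", lambda e: log.append(f"grant signup coupon to {e['email']}"))

bus.publish("user.registered", {"email": "a@example.com"})
```

New behavior is added by registering another subscriber rather than editing the publisher, which is the open/closed principle in practice.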

Website security architecture

XSS and SQL injection are the two main attacks against website applications; others include CSRF and session hijacking.

  • Attack and Defense
    • XSS attacks: Cross-Site Scripting attacks
      • Reflected
      • Persistent (stored)
    • XSS defenses
      • Sanitization (escaping dangerous HTML characters)
      • HttpOnly
    • Injection attacks
      • SQL injection attack
      • OS injection attack
    • Injection defenses
      • Prevent attackers from guessing the database table structure
      • Sanitization
      • Parameter binding
    • CSRF attack: Cross Site Request Forgery
    • CSRF defense: The primary means is to identify the requester
      • Form token
      • CAPTCHA
      • Referer check
    • Other attacks and vulnerabilities
      • Error codes
      • HTML comments
      • File upload
      • Directory traversal
    • Web Application Firewall (ModSecurity)
    • Scanning for website security vulnerabilities
  • Information encryption technology and key security management
    • One-way hash encryption: Inputs of any length are hashed to a fixed-length output
      • Irreversible; the plaintext cannot be recovered from the hash
      • A salt can be added to increase security
      • A small change in the input produces a completely different output
    • Symmetric encryption: Encryption and decryption use the same key
    • Asymmetric encryption
      • Information transmission: public key encryption, private key decryption
      • Digital signature: private key encryption, public key decryption
    • Key security management: Secure information transmission depends on the keys staying secret. Ways to improve key security include:
      • Keep keys and algorithms on a separate server
      • The encryption and decryption algorithm is put in the application system, and the key is put in the independent server
  • Information filtering and anti-spam
    • Text matching
    • Classification algorithm
    • Blacklist
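
Three of the defenses listed above (escaping dangerous HTML characters, parameter binding, and salted one-way hashing) can be sketched with the Python standard library. This is an illustration under assumptions, not a complete security design: the `users` table, the helper names `hash_password` and `verify`, and the iteration count of 100,000 are chosen for the example.

```python
import hashlib
import hmac
import html
import os
import sqlite3

# XSS defense: escape dangerous HTML characters before rendering user input.
comment = '<script>alert("x")</script>'
safe_comment = html.escape(comment)

# SQL injection defense: bind parameters instead of concatenating strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, pw_salt BLOB, pw_hash BLOB)")


def hash_password(password, salt):
    """Salted one-way hash using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


salt = os.urandom(16)
conn.execute("INSERT INTO users VALUES (?, ?, ?)",
             ("alice", salt, hash_password("s3cret", salt)))

# The whole input is compared as a literal value, so the injected OR clause has no effect.
malicious_name = "alice' OR '1'='1"
rows = conn.execute("SELECT name FROM users WHERE name = ?",
                    (malicious_name,)).fetchall()


def verify(name, password):
    """Recompute the salted hash and compare it in constant time."""
    row = conn.execute("SELECT pw_salt, pw_hash FROM users WHERE name = ?",
                       (name,)).fetchone()
    if row is None:
        return False
    stored_salt, stored_hash = row
    return hmac.compare_digest(stored_hash, hash_password(password, stored_salt))
```

Only the salt and the hash are stored, so a leaked table does not directly reveal passwords, and the per-user salt prevents identical passwords from producing identical hashes.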

Author: @brianway