In the early stage of a website, business volume and traffic are both small. At this point the application, the database, and the files are all deployed on a single server, and some sites simply rent shared hosting space.
1. Application, data, and file separation
Deploy the application, the database, and the files on separate servers, and configure each server's hardware for its role to get the best performance.
2. Use caching to improve website performance
Most website traffic follows the 80/20 rule: about 80% of requests hit about 20% of the data. Caching this hot data shortens its access path and improves the user experience. The two common cache implementations are local caches and distributed caches. Reverse proxies and CDNs also act as caches.
2.1 Local cache
A local cache, as the name implies, keeps data on the application server itself, in memory or in local files, usually through a cache component. It is fast, but the amount of data it can hold is limited by the server's local memory and disk.
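As a rough illustration of the trade-off described for a local cache (fast, but bounded by local space), here is a minimal in-memory sketch. The class name `LocalCache`, the LRU eviction policy, and the TTL parameter are assumptions made for this example, not something the article prescribes.

```python
import time
from collections import OrderedDict


class LocalCache:
    """In-process cache with a size cap (LRU eviction) and a per-entry TTL.

    Hypothetical helper for illustration only.
    """

    def __init__(self, max_entries=1024, ttl_seconds=60.0, clock=time.monotonic):
        self._data = OrderedDict()  # key -> (expires_at, value), oldest first
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._data[key]
            return default
        # Mark the entry as most recently used.
        self._data.move_to_end(key)
        return value

    def set(self, key, value):
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        # Local space is limited, so evict least recently used entries.
        while len(self._data) > self._max_entries:
            self._data.popitem(last=False)
```

The size cap is what keeps the cache inside the application server's memory budget. Once the hot data no longer fits, entries start being evicted and the hit rate drops.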
2.2 Distributed cache
A distributed cache can hold massive amounts of data and is easy to scale out, so it is commonly used by portal sites. Because each access crosses the network, it is slower than a local cache. Memcached and Redis are the common choices.
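One common way for an application to use a distributed cache is the cache-aside read path: check the cache first and fall back to the database on a miss. The sketch below assumes that pattern. `FakeRemoteCache` is a hypothetical in-memory stand-in for a real Memcached or Redis client, and `load_from_db` is a placeholder for whatever query the application would run.

```python
class FakeRemoteCache:
    """Hypothetical stand-in for a Memcached/Redis client.

    A real client would send get/set commands over the network.
    """

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def get_with_cache_aside(key, cache, load_from_db):
    """Return the value for key, consulting the cache before the database."""
    value = cache.get(key)
    if value is not None:
        return value  # cache hit: the database is not touched
    value = load_from_db(key)  # cache miss: take the slow path once
    if value is not None:
        cache.set(key, value)  # populate the cache for later requests
    return value
```

With this read path, only the first request for a hot key reaches the database. Later requests are served from the cache until the entry is evicted or invalidated.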
2.3 Reverse proxy
The reverse proxy server is deployed in the website's own data center. When a user request arrives, the reverse proxy returns the cached data if it has any. Otherwise it forwards the request to the application server to fetch the data. This lowers the cost of obtaining data.
2.4 CDN
Suppose our servers are deployed in a data center in Hangzhou. Access is fast for users in Zhejiang but slow for users in Beijing, because the two regions are served by different carriers (China Telecom and China Unicom). A request from a Beijing user has to cross a long chain of routers over the Internet to reach the server in Hangzhou, and the response travels the same long path back, so data transfer takes longer. A CDN addresses this by caching content in the carriers' data centers. Users fetch data from the nearest carrier node first, which greatly shortens the network path.
3. Use clustering and load balancing to improve application server performance
Application servers are the entry point of a website and handle a large number of requests, so we usually spread the load across an application server cluster. A load balancer is deployed in front of the application servers to schedule user requests and distribute them across the cluster nodes according to a distribution policy.
Common load balancing technologies:
Hardware: F5. It is expensive, generally more than 150,000 RMB.
Software: LVS, Nginx, and HAProxy. LVS is a layer 4 (transport layer) load balancer that selects a backend server by destination address and port. Nginx and HAProxy are layer 7 (application layer) load balancers that can select a backend server based on the request content. LVS therefore has a shorter forwarding path and higher performance than Nginx and HAProxy. Nginx and HAProxy, however, are more configurable. For example, they can separate dynamic and static content by routing each request to a static resource server or an application server based on its characteristics.
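To make the idea of a distribution policy and of layer 7 routing concrete, here is a small sketch. It assumes round robin as the policy and a file-suffix check as the "request characteristic". The class names, the server names, and the suffix list are all hypothetical, and the code does not reproduce how LVS, Nginx, or HAProxy are implemented.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backend servers in turn (one possible distribution policy)."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


# Hypothetical list of suffixes treated as static resources.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


class Layer7Router:
    """Choose a server pool from the request path (dynamic/static separation).

    A layer 4 balancer could not do this, because it only sees the
    destination address and port, not the request content.
    """

    def __init__(self, static_pool, app_pool):
        self._static_pool = static_pool
        self._app_pool = app_pool

    def route(self, path):
        # Assumes `path` has no query string attached.
        is_static = path.lower().endswith(STATIC_SUFFIXES)
        pool = self._static_pool if is_static else self._app_pool
        return pool.pick()
```

For example, with an application pool of `app-1` and `app-2` and a static pool of `static-1`, a request for `/css/site.css` goes to `static-1`, while page requests alternate between the two application servers.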
As the number of users grows, the database becomes the biggest bottleneck. The common ways to improve database performance are read/write separation and splitting databases and tables (sharding). Read/write separation, as the name implies, divides the database into a write database and read databases and keeps their data in sync through the database's primary/standby replication. Splitting databases and tables comes in two forms, horizontal and vertical. Horizontal sharding splits the rows of one large table, such as the user table, across several tables or databases. Vertical sharding splits by business: for example, the tables for the user service and the commodity service are placed in different databases.
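The routing decisions behind read/write separation and horizontal sharding can be sketched in a few lines. This is a simplified illustration under stated assumptions: statements are classified only by their first keyword, shards are chosen by user ID modulo the shard count, and names such as `ReadWriteRouter` and the `user_N` table naming are hypothetical.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads across the replicas."""

    def __init__(self, primary, replicas):
        self._primary = primary
        # Fall back to the primary when no replica is configured.
        self._replicas = itertools.cycle(list(replicas) or [primary])

    def target_for(self, sql):
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        # Simplification: only plain SELECT statements count as reads.
        if verb == "SELECT":
            return next(self._replicas)
        return self._primary


def shard_for_user(user_id, shard_count):
    """Horizontal sharding: map a user ID to one of shard_count shards."""
    return user_id % shard_count


def user_table_name(user_id, shard_count=4):
    """Name of the (hypothetical) user table that holds this user's row."""
    return f"user_{shard_for_user(user_id, shard_count)}"
```

A production router also has to handle cases this sketch ignores, such as reads inside a transaction, `SELECT ... FOR UPDATE`, and reads that must see a write that has not yet been replicated.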
4.2 Use NoSQL databases and search engines
For querying and analyzing massive amounts of data, combining a NoSQL database with a search engine can give better performance. Not all data needs to live in a relational database. Common NoSQL databases include MongoDB, HBase, and Redis. Common search engines include Lucene, Solr, and Elasticsearch.
5. Split the application server by business
As the business expands, the application becomes bloated, so we need to split it by business. Each business application is responsible for a relatively independent line of business. The applications communicate with each other through messages or through a shared database.
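Message-based communication between split applications can be illustrated with a tiny in-process publish/subscribe bus. This is only a sketch: in a real deployment the bus would be a separate message queue service, and the `MessageBus` class, the topic name `order.created`, and the message fields used in the usage note are all hypothetical.

```python
from collections import defaultdict, deque


class MessageBus:
    """In-process stand-in for a message queue shared by several applications."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handlers
        self._pending = deque()  # messages waiting to be delivered

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # The publisher only enqueues; it does not call the consumer directly.
        self._pending.append((topic, message))

    def deliver_all(self):
        """Deliver queued messages in order; return the number of handler calls."""
        delivered = 0
        while self._pending:
            topic, message = self._pending.popleft()
            for handler in self._subscribers.get(topic, ()):
                handler(message)
                delivered += 1
        return delivered
```

For instance, an order application could publish `{"sku": "sku-1", "qty": 2}` on `order.created`, and an inventory application subscribed to that topic would reduce its stock when the message is delivered. Neither application needs to know the other's address or code, which is what keeps them relatively independent.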
6.1 Distributed file system
As the number of users grows, more and more files are generated and a single file server can no longer meet demand, so a distributed file system is needed. Common distributed file systems include GFS, HDFS, and TFS.
GFS (Google File System)
• Provides high overall performance for a large number of users
• Suitable for deployment on inexpensive commodity hardware
HDFS (Hadoop Distributed File System)
• Provides high-throughput data access
• Runs on general-purpose hardware
• Highly fault tolerant
• Suitable for deployment on cheap machines
TFS (Taobao File System)
• Targets massive amounts of unstructured data and provides highly reliable, highly concurrent storage access
• High scalability, high availability, and high performance
• Designed for Internet services
• Suitable for storing massive numbers of small files
6.2 Distributed services
Basic services such as the user service, order service, payment service, and security service are used by many business applications and are the foundation those applications are built on. We extract these services and use a distributed service framework to build them as distributed services.
Summary
The complete system architecture diagram is as follows:
Note: The architecture of a large website evolves continuously with business needs, and the specific design depends on the characteristics of each business. This article only describes some of the optimization techniques commonly used in a conventional large website.