1, Characteristics of large websites
- High concurrency and heavy traffic: a huge PV (page view) volume. Each time a user opens a page on the site, one page view is recorded, and repeated visits to the same page are counted cumulatively.
- High availability: uninterrupted service, 24 hours a day, 7 days a week.
- Massive data: huge volumes of data must be stored and managed, which requires a large number of servers.
- Widely distributed users and complex network conditions: the site serves users around the world, who reach it over many different networks.
- Harsh security environment: hacker attacks are frequent.
- Rapidly changing requirements and frequent releases: the site must adapt quickly to the market and to user needs.
- Incremental growth: a large website is built up gradually rather than all at once.
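The page view (PV) definition in the first item above can be illustrated with a small counter. This is only a sketch: the class name `PageViewCounter` and the sample URLs are made up for illustration, and a real site would aggregate these counts from access logs or an analytics pipeline.

```python
from collections import Counter


class PageViewCounter:
    """Counts page views (PV): one hit per page load, repeats included."""

    def __init__(self):
        self.per_page = Counter()

    def record(self, user_id, url):
        # PV does not care who the visitor is; every page load counts,
        # so the same user opening the same page twice adds two views.
        self.per_page[url] += 1

    def total(self):
        return sum(self.per_page.values())


counter = PageViewCounter()
counter.record("alice", "/home")
counter.record("alice", "/home")    # repeat visit, still counted
counter.record("bob", "/item/1")
print(counter.total(), counter.per_page["/home"])
```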
2, Evolution of large website architecture
- Initial site architecture
A single server is all that is needed: the application, the database, the files and all other resources sit on one server. Typical example: a PHP website built on the LAMP stack.
- Application and data services are separated
Three servers: as the business grows, a single server can no longer keep up. The application is separated from the data, giving three servers (application server, file server and database server). The application server needs a faster, more powerful CPU; the database server needs faster disks and more memory; the file server needs larger disks.
- Use caching to improve site performance
3+N server mode: caching reduces the access pressure on the database and speeds up data access for the site. Caches can be divided into local caches and remote caches (which can be distributed). A local cache is fast to access, but the amount of data it can hold is limited by the memory of the application server. A remote distributed cache can be clustered, so in theory its capacity can keep growing as servers are added.
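The lookup order implied by a local cache plus a remote cache can be sketched as follows. This is a minimal illustration under stated assumptions, not a production design: `TTLCache` and `read_through` are hypothetical names, the "remote" cache is just a second in-process object standing in for a cache cluster, and `load_from_db` is whatever function fetches the value from the database.

```python
import time


class TTLCache:
    """Tiny in-memory cache with a size cap and per-entry expiry."""

    def __init__(self, max_entries=1000, ttl_seconds=60.0):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._data = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired, treat as a miss
            return None
        return value

    def set(self, key, value):
        if key not in self._data and len(self._data) >= self.max_entries:
            # Evict the oldest inserted entry (dicts keep insertion order).
            self._data.pop(next(iter(self._data)))
        self._data[key] = (time.monotonic() + self.ttl, value)


def read_through(key, local, remote, load_from_db):
    """Check the local cache, then the remote cache, then the database."""
    value = local.get(key)
    if value is not None:
        return value
    value = remote.get(key)
    if value is None:
        value = load_from_db(key)  # only reached when both caches miss
        remote.set(key, value)
    local.set(key, value)
    return value
```

The small `max_entries` on the local cache mirrors the limited capacity described above; the remote cache is given a larger cap to stand in for a cluster. Cache invalidation on writes is deliberately left out of this sketch.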
- Use application server clusters to improve website concurrency
Clustering is a common way to handle high concurrency and massive data and to make the system scalable. A load-balancing scheduler distributes user requests across the servers in the cluster, so the load on any one application server no longer becomes the bottleneck of the whole website.
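How a load-balancing scheduler spreads requests over the cluster can be sketched with round-robin, which is one of the simplest scheduling policies (the text does not say which policy is used, so this choice is an assumption). `RoundRobinBalancer` and the server names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out application servers in turn, one per request."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._cycle = itertools.cycle(self._servers)

    def pick(self):
        # Each call returns the next server, wrapping around at the end.
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picked = [balancer.pick() for _ in range(6)]
print(picked)
```

Real schedulers usually also weigh servers by capacity and skip unhealthy ones; those concerns are left out here.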
- Database read/write separation
As the number of users grows, the database becomes the biggest bottleneck. The common ways to improve database performance are read/write separation and table splitting. Read/write separation, as the name implies, divides the database into a write library and read libraries, with master/standby replication keeping the data synchronized. Table splitting can be horizontal or vertical. Horizontal splitting breaks one large table, such as the user table, into several smaller tables. Vertical splitting divides tables by business area: for example, the tables for the user service and the tables for the commodity service are placed in different databases.
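Both ideas can be sketched in a few lines. The router below sends `SELECT` statements to the read libraries and everything else to the write library, and `shard_for` picks one of several user tables by hashing the user ID. All names (`ReadWriteRouter`, `shard_for`, the `db-*` hosts) are hypothetical, and routing on the first SQL keyword is a simplification: it ignores transactions and replication lag, which a real data-access layer has to handle.

```python
import itertools
import zlib


class ReadWriteRouter:
    """Sends writes to the write library and spreads reads over read libraries."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no read library is configured.
        self._replicas = itertools.cycle(list(replicas) or [primary])

    def route(self, sql):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT":
            return next(self._replicas)
        return self.primary


def shard_for(user_id, shard_count):
    """Horizontal split: map a user ID to one of shard_count user tables."""
    # crc32 is stable across processes, unlike the built-in hash() for strings.
    return zlib.crc32(str(user_id).encode("utf-8")) % shard_count


router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route("SELECT * FROM users WHERE id = 42"))
print(router.route("UPDATE users SET name = 'x' WHERE id = 42"))
print("user_%d" % shard_for(42, 4))
```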
- Use reverse proxy and CDN to speed up website response
CDN and reverse proxy are both based on caching. The difference is that a CDN is deployed in the network provider’s machine room, while a reverse proxy is deployed in the website’s central machine room. Both aim to return data to the user as early as possible, which speeds up user access and also reduces the load on the back-end servers.
- Use distributed file systems and distributed database systems
As the number of users grows day by day, more and more files are generated, and a single file server can no longer meet the demand, so a distributed file system is required.
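One question a distributed file system has to answer is which server stores a given file. The text does not say how this is done; the sketch below uses consistent hashing, a common technique for this kind of placement, purely as an illustration. `HashRing`, the `fs-*` node names and the `vnodes` parameter are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring that maps file paths to storage nodes."""

    def __init__(self, nodes, vnodes=100):
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node gets several points on the ring to even out the load.
        self._ring = sorted(
            (self._hash("%s#%d" % (node, i)), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(text):
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, path):
        # Walk clockwise to the first ring point after the path's hash.
        idx = bisect.bisect_right(self._keys, self._hash(path)) % len(self._keys)
        return self._ring[idx][1]


ring = HashRing(["fs-1", "fs-2", "fs-3"])
print(ring.node_for("/images/2024/cat.jpg"))
```

The appeal of this scheme is that adding a file server only moves the files that now belong to the new server; files on the other servers stay where they are.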
- Use NoSQL and search engines
Both NoSQL and search engines are technologies that grew out of the Internet and offer better support for scalable, distributed operation. The application server accesses all kinds of data through a unified data access module, which relieves the application of managing many data sources.
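The unified data access module can be pictured as a thin facade that knows which backend holds each kind of data. The sketch below is only an assumption about its shape: `DataAccess` is a hypothetical name, and plain dictionaries stand in for the real SQL, NoSQL and search-engine clients.

```python
class DataAccess:
    """Single entry point that hides which backend stores each kind of data."""

    def __init__(self):
        self._backends = {}

    def register(self, kind, backend):
        # A backend is any object with a get(key) method.
        self._backends[kind] = backend

    def get(self, kind, key):
        try:
            backend = self._backends[kind]
        except KeyError:
            raise LookupError("no backend registered for %r" % kind) from None
        return backend.get(key)


data = DataAccess()
data.register("user", {"1": {"name": "demo"}})         # stand-in for a SQL database
data.register("session", {"abc": {"user_id": "1"}})    # stand-in for a NoSQL store
data.register("search", {"shoes": ["item-7"]})         # stand-in for a search index
print(data.get("user", "1"), data.get("search", "shoes"))
```

Application code only calls `data.get(kind, key)`, so a backend can be swapped or sharded without touching the callers.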
- Business splitting
As the business expands further, the application becomes very bloated, and it has to be split by business line. For example, Baidu is divided into news, web pages, pictures and other businesses. Each business application is responsible for relatively independent business operations. The applications communicate with each other through messages or by sharing a database.
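Communication through messages can be sketched with a publish/subscribe bus: one business application publishes an event and the others react to it without being called directly. This in-process `MessageBus` is a hypothetical stand-in for a real message queue, which would deliver messages asynchronously and across machines; the topic name and handlers are invented for illustration.

```python
from collections import defaultdict


class MessageBus:
    """In-process publish/subscribe bus standing in for a message queue."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)  # delivered synchronously in this sketch
        return len(handlers)


bus = MessageBus()
indexed = []
# The (hypothetical) search application indexes every article the
# news application publishes, without the two calling each other.
bus.subscribe("news.published", lambda msg: indexed.append(msg["title"]))
bus.publish("news.published", {"title": "Site launches new feature"})
print(indexed)
```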
- Distributed service
At this point, every business application turns out to rely on the same basic business services, such as the user service, order service, payment service and security service. These services are the basic elements that support each business application. We extract them and use a distributed service framework to build distributed services.
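A core piece of most distributed service frameworks is a registry where shared services announce their addresses and business applications look them up. The sketch below shows that idea only; `ServiceRegistry`, the service names and the addresses are hypothetical, and a real framework would add health checks, remote calls and failover on top.

```python
import random


class ServiceRegistry:
    """Keeps track of which addresses currently provide each service."""

    def __init__(self):
        self._instances = {}

    def register(self, name, address):
        addresses = self._instances.setdefault(name, [])
        if address not in addresses:
            addresses.append(address)

    def deregister(self, name, address):
        addresses = self._instances.get(name, [])
        if address in addresses:
            addresses.remove(address)

    def lookup(self, name):
        addresses = self._instances.get(name)
        if not addresses:
            raise LookupError("no instance available for %r" % name)
        # Pick any live instance; random choice spreads the load roughly evenly.
        return random.choice(addresses)


registry = ServiceRegistry()
registry.register("user-service", "10.0.0.1:8000")
registry.register("user-service", "10.0.0.2:8000")
registry.register("payment-service", "10.0.1.1:9000")
print(registry.lookup("payment-service"))
```

Every business application asks the registry for `user-service` or `payment-service` instead of hard-coding addresses, so instances can be added or removed without redeploying the callers.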