1. Foreword
1.1 Evolution of large-scale Internet architecture
1.1.1 Taobao Technology
Taobao's core technology (top-tier both at home and abroad; these figures are from 2011):
- The largest distributed Hadoop cluster in China (the "Ladder" cluster: about 2,000 nodes, 24,000 CPU cores, 48,000 GB of memory, 40 PB of storage)
- More than 80 CDN nodes distributed across the country, automatically routing each user to the nearest node and supporting over 800 Gbps of traffic
- A search engine that indexes billions of products, on the world's largest e-commerce platform
- Top-tier load balancing and distributed systems, leading Internet engineering ideas, rich functionality and extremely stable operation
- A rich surrounding ecosystem and advanced data mining technology
- ... and much more
1.1.2 Evolution of Taobao technology
From The Decade of Taobao Technology
- On April 7, 2003, Jack Ma secretly gathered ten Alibaba employees in an unfinished apartment in Hangzhou and asked them to build a C2C website within about a month. The fastest route, of course, was to buy one: the result was a site on the LAMP stack, originally PHPAuction, an auction site developed in the United States, which naturally had to be modified before it could be used.
- By the end of 2003, Taobao had 230,000 registered users, 310,000 page views per day, and a half-year turnover of 33.71 million yuan.
- It soon became clear that MySQL could not support this volume of traffic, and the database became the bottleneck. Fortunately, Ali's DBA team was strong enough to switch from MySQL to Oracle. Oracle already had a solid design for concurrent access, the connection pool: taking a connection from the pool is far cheaper than establishing a new connection each time. However, PHP had no official connection pooling support at the time, so Duolong used Google (not Baidu) to find the open-source SQL Relay, and the database software bottleneck was solved for the moment.
- Next came a hardware performance bottleneck. Alibaba bought EMC SAN storage devices and Oracle's high-performance RAC, and hardware capacity was temporarily no longer a problem.
- Because the problems with SQL Relay were too serious, in 2004 Taobao finally made an epoch-making decision: rewrite the site in Java.
- Taobao hired senior engineers from Sun to help build the Java architecture. How do you change the programming language without changing how the site is used? By modular replacement: write module A today, open a new domain for it, redirect that module's traffic to the new domain while the other modules stay unchanged, and abandon the original domain once all modules have been migrated. Sun insisted on EJB as the control layer, combined with iBatis as the persistence layer, and the result was a scalable and efficient Java EE application.
- After parting ways with Sun, Alibaba ran into data storage bottlenecks, so it bought an IBM minicomputer, and the legend of IOE (IBM + Oracle + EMC) began.
- By the end of 2004, Taobao had 4 million registered users, 40 million page views per day, and a total transaction volume of 1 billion yuan.
- Spring was born in 2005. The Spring framework has become indispensable for Web applications, and on Taobao it achieved exactly what Rod Johnson designed it to do: replace EJB.
- By the end of 2005, Taobao had 13.9 million registered users, 89.31 million page views per day, and 16.63 million items listed.
- Looking ahead, the existing infrastructure was barely adequate even for current needs, so CDN technology was brought in. At first the commercial ChinaCache was used; later Dr. Zhang Wensong was brought in to build a low-power CDN network, and Taobao's performance got better and better.
- By the end of 2006, Taobao had 30 million registered users, 150 million page views per day, 50 million items listed, and a total online transaction volume of 16.9 billion yuan.
- Taobao used NetApp's commercial storage systems until 2007, but they still could not keep up with the rapid growth. Google had published the design of GFS, and Taobao developed its own distributed file system, TFS, along the same lines. Each user gets 1 GB of image storage on TFS, supported by the TFS clustered file storage system and a large number of image servers; Taobao uses on-the-fly thumbnail generation, global load balancing and level-1/level-2 caching to keep image access fast and efficient.
- Taobao's web server software is Tengine, an Nginx fork with Taobao's own optimizations.
- Taobao split out the User Information Center (UIC) for all modules to call. Once again Duolong wrote TDBM for it, a purely in-memory data cache (similar to memcached). Taobao then merged TBstore and TDBM into Tair, a distributed key-value cache system, and upgraded its iSearch system as well.
- By the end of 2007, Taobao had 50 million registered users, 250 million page views per day, 100 million items listed, and a total net transaction volume of 43.3 billion yuan.
- Dubbo is the core framework of Alibaba's internal SOA service governance solution; it supports more than 2,000 services and over 300 million page views per day and is widely used across Alibaba Group's member sites. Since being open-sourced in 2011, Dubbo has also been adopted by many companies outside Alibaba.
1.1.3 Summary of technical development
1. Single-node architecture
2. Cluster architecture
3. Cluster + distributed architecture
1.2 Proxy Overview
1.2.1 Forward Proxy
In general, "proxy" refers to a forward proxy unless otherwise specified. The concept of a forward proxy is as follows:
A forward proxy is a server (proxy server Z) that sits between the client (user A) and the origin server (server B). To retrieve content from the origin server, user A sends a request to proxy server Z and names the destination (server B). Proxy server Z then forwards the request to server B and returns the obtained content to the client. The client must be explicitly configured to use a forward proxy.
From this definition it is clear that a forward proxy is a proxy server that accesses the target server (server B) on behalf of the visitor (user A).
That is all a forward proxy is. Why have a proxy server access server B instead of user A doing so directly? The answer lies in what proxy servers are used for.
The forward proxy server has the following functions:
- Access an otherwise unreachable server B, as shown in the figure.
Look at the figure without worrying about complicated routing. Name the routers R1 and R2 from left to right. User A normally reaches server B through routers R1 and R2; if R1 or R2 fails, user A can no longer reach server B. But if user A uses proxy server Z, and Z is not behind R1 or R2 but reaches server B through a different routing path, then user A can still obtain server B's data through Z. A real-world example is FQ (circumventing network blocking). Since VPN technology has become widespread, though, FQ is no longer done only with traditional forward proxies but also with VPNs.
- Accelerate access to server B
This use is not as common as it used to be, mainly because bandwidth has grown rapidly. In the early days of forward proxies, many people used them to speed things up. Again, suppose user A reaches server B through routers R1 and R2, and the link between R1 and R2 is a low-bandwidth link, while the links from user A to proxy server Z and from Z to server B are high-bandwidth links. Routing through Z then obviously speeds up access to server B.
- Act as a cache
Caching is closely tied to proxying (not only forward proxies; reverse proxies use caching too). If, before user A requests data D on server B, someone else has already fetched data D through proxy server Z, then Z keeps data D for a period of time. When user A then requests data D, proxy server Z no longer contacts server B but sends the cached data D directly to user A. In caching terminology this is a cache hit. The more users like user A go through proxy server Z, the more requests can be served directly from Z instead of travelling all the way to server B for the download.
- Client access authorization
This use is still widespread today. For example, some companies use ISA Server as a forward proxy to control which users may access the Internet, as shown in the figure below.
A firewall acts as the gateway and filters access to the Internet. Suppose user A and user B have both configured proxy server Z, but only user A is authorized to access the Internet (the restriction is configured on proxy server Z). User A, being authorized, can reach server B through the proxy, while user B, who is not authorized on proxy server Z, has his packets discarded when he tries to reach server B.
- Hide the visitor's whereabouts
As shown in the figure below, server B does not know that it is really user A who is accessing it, because proxy server Z interacts with server B on user A's behalf. If proxy server Z is fully (or partially) controlled by user A, the slang term for it is a "broiler" (a compromised host used as a stepping stone).
To summarize: a forward proxy is a server sitting between the client and the origin server. To get content from the origin server, the client sends a request to the proxy and names the destination (the origin server); the proxy forwards the request to the origin server and returns the content to the client. The client must be explicitly configured to use the forward proxy, which of course requires knowing the proxy server's IP address and the port its proxy program listens on.
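The "special settings" the client needs usually just mean pointing it at the proxy's address and port. Here is a minimal sketch using command-line tools; the proxy address 10.0.0.5:3128 is hypothetical, not something from this text:

```
# point HTTP(S) clients at a forward proxy (hypothetical address 10.0.0.5:3128)
export http_proxy=http://10.0.0.5:3128
export https_proxy=http://10.0.0.5:3128

# curl can also take the proxy explicitly with -x; this request reaches
# example.com via the proxy rather than directly
curl -x http://10.0.0.5:3128 http://example.com/
```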
1.2.2 Reverse Proxy
A reverse proxy is the opposite of a forward proxy: to the client, the proxy server appears to be the origin server itself, and the client needs no special configuration. The client sends an ordinary request for content in the reverse proxy's namespace; the reverse proxy decides where to forward the request (to which origin server) and returns the obtained content to the client. A reverse proxy server serves the following functions:
- Protect and hide the origin server, as shown below:
User A always believes it is accessing origin server B rather than proxy server Z, but in reality the reverse proxy server accepts user A's request, fetches the requested resources from origin server B, and sends them back to user A. Because of the firewall, only proxy server Z is allowed to access origin server B. In this environment the firewall and the reverse proxy together protect origin server B, yet user A is unaware of it.
- Load balancing, as shown below:
When there is more than one reverse proxy server, we can even group them into a cluster. As more users access resource server B, the different proxy servers Z(x) answer different users and deliver the resources each user needs.
Of course, a reverse proxy server can cache just like a forward proxy server: instead of requesting data from origin server B every time, it caches B's resources, especially static data such as images and files. If these reverse proxy servers sit on the same network as user X, then user X gets fast, high-quality access when fetching from reverse proxy server X. This is the core of CDN technology, as shown below:
We are not discussing CDN here, so the figure leaves out CDN's most critical core technology, intelligent DNS; the point is only that CDN is in fact built on the reverse proxy principle.
To conclude: a reverse proxy is the opposite of a forward proxy in that it appears to the client as the origin server and requires no special client configuration. The client sends an ordinary request for content in the reverse proxy's namespace, and the reverse proxy decides where to forward the request (the origin server) and returns the obtained content to the client as if it were its own.
There are many programs that can act as forward or reverse proxies, and most software that can do one can also do the other. The best-known open-source example is Squid, which can be used both as a forward proxy and as the front end of a reverse proxy. On Windows, MS ISA Server can also act as a forward proxy. On the reverse proxy side the main use case is Web services, and the most popular choice in recent years is Nginx. Some people claim Nginx cannot act as a forward proxy, but that is not true: Nginx can be a forward proxy too, though that use is less common.
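To make the reverse-proxy idea concrete, here is a minimal sketch of an Nginx server block that answers clients on behalf of a hidden origin server. It is only an illustration: the backend address 192.168.1.10:8080 and the domain name are made up, and the block belongs inside the http { } section of nginx.conf.

```
# Minimal reverse-proxy sketch: nginx answers on port 80 and fetches the
# content from a hidden origin server (192.168.1.10:8080 is illustrative).
server {
    listen 80;
    server_name www.example.com;

    location / {
        proxy_pass http://192.168.1.10:8080;   # origin server B
    }
}
```

With this in place the client only ever talks to the proxy on port 80 and never needs to know the origin server's address.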
1.2.3 Transparent Proxy
If we classified the forward proxy, reverse proxy and transparent proxy by family relationship, the forward proxy and the transparent proxy would clearly be close relatives, while the forward proxy and the reverse proxy would be more like cousins.
A transparent proxy means the client does not need to know that a proxy server exists at all: the proxy rewrites the request fields (headers) and passes on the client's real IP address. Note that an encrypting transparent proxy counts as an anonymous proxy, meaning no proxy has to be configured on the client. A practical example of a transparent proxy is the behavior management software used by many companies today, as shown below:
Users A and B do not know that the behavior management device is acting as a transparent proxy. When user A or B sends a request to server A or server B, the transparent proxy intercepts and, according to its own policy, modifies the packet, then sends the request on to server A or B as the actual requester. When the reply comes back, the transparent proxy forwards the allowed content to user A or user B, again according to its own settings. As in the figure above, if the transparent proxy is configured not to allow access to server B, user A or user B will not get server B's data.
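As a rough sketch of how traffic reaches a transparent proxy without any client-side configuration, the gateway can silently redirect outbound web traffic to a local proxy port with iptables. The port 3128 below is a hypothetical Squid-style listener, not something specified in this text.

```
# On the gateway: redirect outbound HTTP traffic to a proxy listening locally
# on port 3128 (hypothetical, e.g. a Squid transparent proxy). Clients keep
# addressing server B directly and never learn that a proxy is in the path.
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3128
```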
2. Release projects under Linux
The CentOS system is used
2.1 Installing the JDK on Linux
After logging in to the Linux system, first check whether a JDK is already installed by running java -version. If OpenJDK is present, uninstall it; we will install the Sun JDK instead. On the differences between the two:
- What are the differences between OpenJDK and JDK in Linux?
- What is the difference between OpenJDK and SunJDK?
- OpenJDK and Sun/OracleJDK differences and connections
1. Uninstall OpenJDK
- View the Java-related packages: rpm -qa | grep java
- Uninstall them:
  - rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.i686
  - rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.i686
2. Install the JDK (upload the JDK archive to the root directory with an FTP tool):
- Create a java directory in /usr/local: mkdir java
- Copy the uploaded JDK into the java directory: cp /root/jdk.xxxxx.tar /usr/local/java
- Unpack it: tar -xvf jdk.xxx.tar
- Install the 32-bit C runtime library: yum install glibc.i686
  The JDK needs some extra native libraries to run; like much of the Java stack, it depends on underlying C/C++ components.
Linux yum command
- Configure environment variables (in /etc/profile):

```
# set java environment
JAVA_HOME=/usr/local/java/jdk1.7.0_72
CLASSPATH=.:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
```

Save the file and exit, then run source /etc/profile to reload the configuration so that the changes take effect immediately.
Note: to upload files to the root directory you can simply use FileZilla and drag the files over.
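A quick way to confirm the environment variables took effect (a sketch; the version string will match whichever JDK build you actually installed):

```
source /etc/profile
echo $JAVA_HOME     # should print /usr/local/java/jdk1.7.0_72
java -version       # should now report the Sun/Oracle JDK rather than OpenJDK
```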
2.2 Installing MySQL on Linux
- Check for an existing mysql installation: rpm -qa | grep mysql
- Uninstall it: rpm -e --nodeps mysql-libs-5.1.71-1.el6.i386
- Upload the mysql package
- Create a mysql directory at /usr/local/
- Copy the mysql package into the mysql directory
- Unpack it: tar -xvf mysql-5.6.22-1.el6.i386.rpm-bundle.tar
- Install:
  - Install the mysql server: rpm -ivh mysql-server-5.5.49-1.linux2.6.i386.rpm
    Note: the first login to mysql does not require a password.
  - Install the mysql client: rpm -ivh mysql-client-5.5.49-1.linux2.6.i386.rpm
- Check the mysql service status: service mysql status
- Start mysql: service mysql start
- Stop mysql: service mysql stop
- Change the password of the mysql root user:

```
# log in (no password is needed on the first login) and use the mysql database
mysql -uroot mysql
update user set password = password('1234') where user = 'root';
flush privileges;   # refresh privileges
```
- Enable remote access:

```
grant all privileges on *.* to 'root'@'%' identified by '1234';
flush privileges;
```
- Open firewall port 3306 and exit mysql:

```
# open port 3306
/sbin/iptables -I INPUT -p tcp --dport 3306 -j ACCEPT
# add this rule to the saved firewall rules
/etc/rc.d/init.d/iptables save
```

Then connect to mysql again (for example from a remote client) to confirm that access works.
- Set the mysql service to start at boot:

```
# register mysql as a system service
chkconfig --add mysql
# start it automatically at boot
chkconfig mysql on
```
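A quick sanity check that the service is registered and reachable (a sketch; 192.168.1.100 stands in for the Linux server's real IP):

```
chkconfig --list mysql    # runlevels 2-5 should show "on"
service mysql status      # the server should be running
# from another machine, test remote access with the password configured above
mysql -h 192.168.1.100 -uroot -p1234 -e "select version();"
```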
2.3 Installing Tomcat in Linux
- Create a tomcat directory in /usr/local/: mkdir tomcat
- Copy the tomcat archive to /usr/local/tomcat: cp /root/apache-tomcat-7.0.52.tar.gz /usr/local/tomcat/
- Unpack it: tar -xvf apache-tomcat-7.0.52.tar.gz
- Start Tomcat (from its bin directory):
  - Method 1: sh startup.sh
  - Method 2: ./startup.sh
Note:

```
# follow the startup log
tail -f logs/catalina.out
# exit with Ctrl+C
```
- Open firewall port 8080:

```
# open port 8080
/sbin/iptables -I INPUT -p tcp --dport 8080 -j ACCEPT
# add this rule to the saved firewall rules
/etc/rc.d/init.d/iptables save
```
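To confirm Tomcat is up and reachable from outside (a sketch; replace 192.168.1.100 with the server's real IP):

```
ps -ef | grep tomcat                  # the Tomcat java process should be listed
curl -I http://192.168.1.100:8080/    # expect an HTTP 200 from the Tomcat home page
```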
2.4 Release projects to Linux
- Databases and tables:
  ① Back up the store28 database on Windows: mysqldump -uroot -p1234 store28 > C:/1.sql
  ② Upload 1.sql from drive C to the root directory in Linux
  ③ Restore the database: log in to mysql with a remote tool, create the store28 database, then run source /root/1.sql
- Project:
  A useful property of war packages is that a .war placed in tomcat/webapps is unpacked automatically when Tomcat starts. Upload store.war to the root directory in Linux, then move store.war into the tomcat/webapps directory.
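The deployment itself comes down to a couple of commands (a sketch; the Tomcat path follows the layout from section 2.3):

```
# move the uploaded war into webapps and (re)start Tomcat
mv /root/store.war /usr/local/tomcat/apache-tomcat-7.0.52/webapps/
cd /usr/local/tomcat/apache-tomcat-7.0.52/bin
./shutdown.sh   # ignore the error if Tomcat was not running
./startup.sh    # Tomcat unpacks store.war on startup
```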
3. Nginx
3.1 Concepts related to Nginx
Before introducing Nginx, let's review a few concepts related to it.
1. Reverse proxy
In reverse proxy mode, a proxy server accepts connection requests from the Internet, forwards them to a server on the internal network, and returns the server's result to the client on the Internet. In this setup the proxy server appears to the outside world as the server itself.
2. Load balancing
Load balancing builds on the existing network structure and provides a cheap, effective and transparent way to expand the bandwidth of network devices and servers, increase throughput, strengthen network data processing capability, and improve the flexibility and availability of the network. The principle is to spread traffic across multiple servers, reducing the pressure on each one; the servers complete the work together, which raises overall data throughput.
3.2 Introduction to Nginx
1. What is Nginx?
Nginx (Engine X) is a high-performance HTTP and reverse proxy server, as well as an IMAP/POP3/SMTP proxy server. Nginx was developed by Igor Sysoev for Rambler.ru (Рамблер), the second most visited site in Russia; the first public version, 0.1.0, was released on 4 October 2004.
It distributes source code under a BSD-like license and is known for its stability, rich feature set, sample configuration files, and low consumption of system resources. On June 1, 2011, Nginx 1.0.4 was released.
Nginx is a lightweight web server, reverse proxy server and email (IMAP/POP3) proxy server distributed under a BSD-like license. Nginx is known for low memory usage and strong concurrency, and in practice its concurrency does perform better than other web servers of the same type. In mainland China, Nginx users include Baidu, JD, Sina, NetEase, Tencent, Taobao and so on. (From Baidu Baike)
2. Why use Nginx?
Background:
With the rapid development of today's Internet, huge user bases and high concurrency have become the norm. How do you make a website able to serve tens or even hundreds of thousands of users? That is a problem many small and medium-sized sites have to solve. A site built on a single Tomcat can, under ideal conditions, withstand roughly 150 to 200 concurrent requests. If concurrent visitors account for 5% to 10% of the total user base, a single-node Tomcat site can serve roughly 1,500 to 4,000 users, which is clearly not enough for a site serving the whole country. To solve this, load balancing is introduced: what one web server cannot handle alone, several web servers can share. Incoming requests are distributed evenly across multiple backend web servers, so the load is broken up.
There are two kinds of load balancers. One is hardware load balancing, for example F5. The other is software load balancing ("soft load"), such as Apache or Nginx. Compared with soft load, hard load works at lower network layers and can forward request packets down at the data link/socket level, but it is expensive. Soft load works above the HTTP protocol layer and can forward HTTP requests; because it is open source, the cost is close to zero. E-commerce sites such as Alibaba and JD all use Nginx servers.
Many large sites use Nginx as a reverse proxy; it is very widely deployed. Nginx is a high-performance HTTP/reverse proxy server and email (IMAP/POP3) proxy server developed by the Russian programmer Igor Sysoev. In official tests Nginx supports up to 50,000 concurrent connections with very low CPU, memory and other resource consumption.
Application Scenarios:
- HTTP server: Nginx can serve static web pages directly.
- Virtual hosts: multiple domain names can be bound to one IP address, and requests are forwarded to services running on different ports according to the domain name (see the sketch after this list).
- Reverse proxy and load balancing: forwarding requests to different backend servers.
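A minimal sketch of the virtual-host idea, assuming nginx was installed to the default /usr/local/nginx prefix; the file name vhosts.conf, the domain names and the two local ports are illustrative, not taken from this text:

```
# Route requests to different local services by domain name.
# Save as conf/vhosts.conf and reference it from the http { } block of
# nginx.conf with:   include vhosts.conf;
server {
    listen 80;
    server_name www.site-a.example.com;
    location / {
        proxy_pass http://127.0.0.1:8080;   # service 1
    }
}

server {
    listen 80;
    server_name www.site-b.example.com;
    location / {
        proxy_pass http://127.0.0.1:8090;   # service 2
    }
}
```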
4. Build a cluster
4.1 Setting up a Cluster in Windows
1. Create two new directories tomcat1 and tomcat2 on disk G
2. Change the port numbers of tomcat2, for example to tomcat1's ports plus 10 (so 8080 becomes 8090).
- Access tomcat1 on port 8080: localhost:8080/test/
- Access tomcat2 on port 8090: localhost:8090/test/
3. Unzip nginx
Modify nginx's nginx.conf file: under location /, add a proxy_pass that points to port 8080.
This proxies just one server: visiting localhost/test/ now reaches the Tomcat on port 8080.
4. Proxy a cluster, for example two servers:
```
# add an upstream block
upstream servlet_yujia {
    server 127.0.0.1:8080;
    server 127.0.0.1:8090;
}
# and in location /, change the reverse proxy to
proxy_pass http://servlet_yujia;
```
Modify it as shown in the figure below.
Alternatively, you can add weights to the servers, as sketched below. With those weights, out of every six requests the second server handles four. Visiting localhost/test/ may now be served by either the port 8080 or the port 8090 Tomcat.
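One possible weighting consistent with the four-out-of-six example (a sketch of the nginx.conf fragment; the upstream name servlet_yujia follows the earlier configuration):

```
# weights 2 and 4: of every 6 requests, the 8090 server handles 4
upstream servlet_yujia {
    server 127.0.0.1:8080 weight=2;
    server 127.0.0.1:8090 weight=4;
}
# location / keeps proxy_pass http://servlet_yujia; as configured above
```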
5. Session sharing problems
- Solution 1 (works only on Windows):
  A web-server-level solution (Tomcat's session broadcast mechanism). Note: this lowers Tomcat performance.
  Modify two places:
  - In both Tomcats' server.xml, enable sharing by uncommenting <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"/> under the <Engine> tag.
  - In the project's web.xml, add a node (the standard element for session replication is <distributable/>).
- Solution 2: store sessions in Redis, looked up by session ID.
- Solution 3 (Linux): add ip_hash to the upstream block so that the same client IP always reaches the same web server (see the sketch below).
  The ip_hash directive hashes the client IP address so that requests from that address are always routed to the same backend server.
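A sketch of what that looks like in nginx.conf, reusing the illustrative upstream from earlier:

```
upstream servlet_yujia {
    ip_hash;                     # pin each client IP to one backend server
    server 127.0.0.1:8080;
    server 127.0.0.1:8090;
}
```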
4.2 Setting up a Cluster in Linux
1. Upload nginx to Linux
2. Unzip nginx
3. Before compiling nginx, install the build dependencies:
```
yum install gcc-c++
yum install -y pcre pcre-devel
yum install -y zlib zlib-devel
yum install -y openssl openssl-devel
```
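Before running make, the source tree also has to be configured; a minimal sketch (run inside the unpacked nginx source directory; the default install prefix is /usr/local/nginx):

```
cd nginx-1.x.x     # the unpacked nginx source directory (use the version you downloaded)
./configure        # generates the Makefile; default prefix is /usr/local/nginx
```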
4. Install nginx
- Run: make
- Run: make install
5. Start nginx:

```
# the nginx configuration files are in the conf directory; the executable is in sbin
cd /usr/local/nginx/sbin
./nginx
```
Note: ① Inside the nginx install directory there is an sbin directory, which contains the nginx executable.
② To stop nginx:
- Stop command (equivalent to finding the nginx process and killing it):
  ./nginx -s stop
- Quit command (lets in-flight requests finish before shutting down; this is the recommended way):
  ./nginx -s quit
6. Open firewall port 80:

```
# open port 80
/sbin/iptables -I INPUT -p tcp --dport 80 -j ACCEPT
# add this rule to the saved firewall rules
/etc/rc.d/init.d/iptables save
```
7. Modify the conf file just as on Windows to configure the cluster; a sketch of the resulting configuration follows.
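Putting the pieces together, a minimal sketch of the relevant part of conf/nginx.conf for the cluster. The upstream name and backend addresses follow the earlier Windows example; adjust the IPs and ports to the actual Tomcat instances, and note that ip_hash is optional (only needed for the session stickiness discussed in section 4.1):

```
http {
    upstream servlet_yujia {
        ip_hash;                     # optional: pin each client to one Tomcat
        server 127.0.0.1:8080;       # tomcat1
        server 127.0.0.1:8090;       # tomcat2
    }

    server {
        listen 80;
        server_name localhost;

        location / {
            proxy_pass http://servlet_yujia;
        }
    }
}
```

After changing the configuration, reload it with ./nginx -s reload from the sbin directory.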