To understand what a reverse proxy is, you naturally need to know what a forward proxy is.

In addition, “reverse proxy” usually means an HTTP reverse proxy, although the concept is broader and also covers, for example, TCP reverse proxies. We will not cover TCP reverse proxies here; whenever this article says “reverse proxy”, it means an HTTP reverse proxy.

A forward proxy is usually referred to as a proxy. It is not necessary to emphasize that it is a forward proxy. In HTTP, a proxy refers to a forward proxy.

Direct Access

To talk about forward proxies, we need to talk about the “direct access” form.

That is, the pattern without any proxy.

In fact, direct access is the most common setup for many small sites. In terms of everyday shopping, direct access is like buying “direct from the manufacturer”: you order straight from the manufacturer without going through any middleman.

From a system perspective, “direct access” means that the browser’s request goes directly to the server that ultimately generates the web page, without passing through any HTTP proxy server. So what about proxies or, more verbosely, “forward proxies”?

Forward Proxy

Again, using the shopping analogy: when you buy a product from a store rather than directly from the manufacturer, that is similar to the proxy model.

For example, if you buy a box of instant noodles from a shop, you know very well that the shop does not produce instant noodles itself. It is just a “middleman” that makes some money from you, and the instant noodles in the shop were themselves purchased from the manufacturer. Of course, a shop that marks prices up too ruthlessly becomes the despised “profiteer”.

A proxy, or rather a proxy server, plays a similar role in the browser’s request processing. It is simply a request intermediary.

A forward proxy server does not generate the response to a request itself, just as a store does not produce instant noodles. It simply forwards the request to the final web server and then relays that server’s response back to the requester, the browser, as shown below:

But there is a problem here. You know which small shops are near your home, so you can walk to one of these middlemen whenever you want to buy something. How does the browser know where the proxy is?

For example, if you enter my domain name “xiaogd.net”, the browser uses DNS to find that the corresponding IP address is 118.89.55.54. But how can the browser know where a proxy server is, and whether the request should go through it?

The answer is that you actively tell the browser, a process commonly known as “configuring a proxy server.”

As you will see later, this is a significant difference between forward and reverse proxies.
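The same “you have to tell the client” step applies to any HTTP client, not only to browsers. As a minimal sketch, here is how a Python program could be pointed at a forward proxy using the standard library. The proxy address is a made-up placeholder, not something from this article, and no request is actually sent.

```python
import urllib.request

# Hypothetical proxy address; replace it with the one your organization provides.
PROXY_URL = "http://proxy.example.internal:3128"


def build_proxied_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxied_opener(PROXY_URL)

# Collect the proxy settings the opener will use (nothing is sent over the network).
configured = [
    h.proxies for h in opener.handlers
    if isinstance(h, urllib.request.ProxyHandler)
]
print(configured)
```

With such an opener, a call like `opener.open("http://xiaogd.net/")` would be sent to the proxy first, which is exactly the behavior the browser setting enables.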

Here is a schematic of configuring a proxy server on Internet Explorer:

Why use a proxy?

Naturally, one might ask: isn’t direct access good enough? Why bother routing requests through a proxy server? There are several possible reasons.

One is security auditing and control. In some organizations, outbound access to web ports such as 80 and 443 is blocked, so you cannot reach the Internet directly at all. To get online, you must configure the intranet proxy server designated by the organization.

Of course, the proxy server itself is not restricted, it can access the external network.

This way, all your Internet requests go through a proxy server, which is controlled by the organization and can audit the requests:

  • For example, it can stop you from uploading the organization’s confidential information to an outside website.
  • Or it can detect that you are visiting an insecure website that might infect your computer and block the request.
  • Or it can detect that you are visiting entertainment websites that have nothing to do with your work and block them. (The organization has to worry about your performance and KPIs.)
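The kinds of checks listed above can be pictured as a small policy function that the proxy runs on every request before forwarding it. The following is only a sketch: the host names, the keyword list, and the `audit` function are all invented for illustration, and a real auditing proxy would be far more elaborate.

```python
from urllib.parse import urlsplit

# Hypothetical policy data, invented for this sketch.
BLOCKED_HOSTS = {"malware.example", "games.example"}
BLOCKED_UPLOAD_KEYWORDS = ("confidential",)


def audit(method, url, body=""):
    """Return (allowed, reason) for one outgoing request."""
    host = (urlsplit(url).hostname or "").lower()
    if host in BLOCKED_HOSTS:
        return False, "blocked site: " + host
    # Only requests that upload data are scanned for restricted content.
    if method in ("POST", "PUT"):
        if any(word in body.lower() for word in BLOCKED_UPLOAD_KEYWORDS):
            return False, "upload contains restricted content"
    return True, "ok"


print(audit("GET", "http://games.example/play"))
print(audit("POST", "http://partner.example/upload", "CONFIDENTIAL roadmap"))
print(audit("GET", "http://xiaogd.net/"))
```

Because every request passes through the proxy, a single function like this is enough to enforce the policy for the whole organization.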

Another reason is to speed things up or save bandwidth, because some proxy servers can not only forward requests but also cache web pages and other resources.

For example, when I was in school, we were told that a proxy server could be configured for Internet access in the dormitory. I guess the reason was that the school’s overall external bandwidth was limited.

Suppose many students want to visit the qq.com home page. When the first student requests it, the proxy server can cache the home page for a period of time. When later students request the same page, there is no need to fetch it again; the proxy server returns the cached response directly.

Naturally, a cached entry also has an expiration time; it cannot be kept forever, or the page would never be updated.

How often and how to update the cache is a matter of specific cache policy.
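One of the simplest cache policies is a fixed time-to-live (TTL): keep a page for a set number of seconds, then fetch it again. The sketch below assumes that policy. The `TTLCache` class, the 60-second lifetime, and the fake `fetch_from_origin` function are illustrative assumptions, and the clock is injected so the expiry can be shown without waiting.

```python
import time


class TTLCache:
    """Cache responses per URL and refetch them after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # url -> (expires_at, body)

    def get(self, url, fetch):
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the origin server is not contacted
        body = fetch(url)  # cache miss or expired entry: go to the origin
        self._entries[url] = (now + self.ttl, body)
        return body


calls = []


def fetch_from_origin(url):
    """Stand-in for a real HTTP request to the origin server."""
    calls.append(url)
    return "<html>home page v%d</html>" % len(calls)


now = [0.0]  # fake clock, in seconds
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])

first = cache.get("http://qq.com/", fetch_from_origin)   # first student: fetched
second = cache.get("http://qq.com/", fetch_from_origin)  # later student: cached
now[0] = 61.0                                            # the entry has expired
third = cache.get("http://qq.com/", fetch_from_origin)   # fetched again
```

Only the first and third requests reach the origin server; the second one is answered from the cache, which is where the bandwidth saving comes from.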

Of course, many web pages today carry personalized recommendations or require a login, and such pages usually cannot be cached, so configuring a proxy server for this purpose is less popular than it used to be. Bandwidth has also increased, and many people neither know nor care how to configure a proxy server. On the other hand, many static resources such as images, JS, and CSS files can still be cached, so caching proxy servers remain useful.

Finally, one more reason: some foreign technical websites cannot be reached directly from within the country, yet we need to look things up there to fix the bug at hand. Direct access does not work, so the request has to take a “winding path” through a proxy.

Strictly speaking, many of these proxies are more general than HTTP proxies, but the principle is similar and they are all manifestations of the proxy pattern. With the help of our configuration or a smart plugin, the browser knows that direct requests to certain sites will fall into a black hole, so it sends those requests through a proxy to bypass the firewall. A simple proxy configuration is just a proxy server address, but then every request goes through the proxy. More advanced proxy plugins let you configure specific rules that decide which addresses go through the proxy and which do not. They often ship with predefined rules, such as whitelists and blacklists, and you can add your own rules as well.
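Rule-based proxying of this kind boils down to a function from host name to a decision. Here is a minimal sketch using shell-style wildcard patterns; the pattern lists and the `route` function are hypothetical and do not reproduce the rule format of any particular plugin.

```python
from fnmatch import fnmatch

# Hypothetical rule lists, invented for this sketch.
PROXY_PATTERNS = ["*.blocked.example", "docs.example.org"]
DIRECT_PATTERNS = ["localhost", "*.internal.example"]


def route(host, default="DIRECT"):
    """Decide whether a request to host goes through the proxy or directly."""
    host = host.lower()
    # The direct list is checked first, so it overrides the proxy list.
    if any(fnmatch(host, pattern) for pattern in DIRECT_PATTERNS):
        return "DIRECT"
    if any(fnmatch(host, pattern) for pattern in PROXY_PATTERNS):
        return "PROXY"
    return default


for name in ("api.blocked.example", "localhost", "xiaogd.net"):
    print(name, "->", route(name))
```

Changing `default` to `"PROXY"` would turn this from “proxy only what is listed” into “proxy everything except what is listed”, which are the two usual modes of such plugins.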

In short, a proxy is an intermediary through which the browser reaches the required resources indirectly, and the browser is aware of this intermediary because you have to configure and enable it in the browser yourself. This is the proxy, or forward proxy.

Reverse Proxy

Now that we know about direct access and the so-called forward proxy, we can talk about reverse proxy.

A big difference between a reverse proxy and a forward proxy is that a reverse proxy does not require the client (browser) to do any configuration; there is no proxy server setting at all.

If a forward proxy means that the client actively configures and actively uses a proxy, then with a reverse proxy the client “is proxied”. From this point of view, the reverse proxy is sometimes referred to as a “transparent proxy”: the browser does not know it is being proxied at all. It believes that whatever responds to it is the final web server, when in fact it is only a proxy.

Let’s take shopping as an example. Sometimes when you shop online, you’ll see a vendor claiming to be the manufacturer, that everything is cheap, that it’s a direct manufacturer, so you order it. After some time, you find another shop claiming that it is the real factory direct sales, and then you carefully look at the information of the two shops, only to find that the first shop is fake, it is not the real factory.

But why can a fake manufacturer still sell so cheaply, at prices no different from the real factory’s direct sales? Perhaps the store is located right next to the factory, and the owner has connections there, such as knowing someone on the inside, so he can get goods from the manufacturer at a low price. Because he is so close, there is almost no logistics cost either. In a sense, his claim to be selling factory-direct is not all that bogus. Strictly speaking, of course, it is still fake factory-direct sales, and he is still a middleman.

It claims to be Li Kui, but in fact it is Li Gui, the impostor.

Here’s a picture to compare the two situations:

This model is a bit like a reverse proxy: you think you bought direct, but in fact you were “proxied” and went through a middleman.

But the middleman is not obvious to you; he is even transparent to you, leaving you in the dark.

Although both involve a middleman, this is very different from buying in an offline store. When you buy in a store, you are well aware that you are going through a middleman, namely the store itself. With a remote online shop that claims to be the manufacturer’s direct outlet, sometimes you really cannot tell whether you are dealing with a middleman.

The same is true for HTTP reverse proxies. For example, if you visit my website xiaogd.net and look at the server information in the response to the home page request, it tells you that the server responding is an Nginx server, as shown in the picture below:

The question is: is Nginx the server that ultimately generates this page? It is not! If you know anything about Nginx, you know that on its own it usually serves only static resources, while my home page is dynamically generated content. In fact, if you look at the statement at the bottom of my site, it looks something like this:

You’ll see that this home page is actually generated by a PHP site-building application called WordPress. Inside my cloud host, Nginx actually forwards home page requests to a so-called PHP-FPM gateway.

The PHP-FPM gateway is basically a web server for PHP, except that it technically does not speak HTTP but the simpler FastCGI protocol.

Strictly speaking, this is a reverse proxy pattern, although it is not HTTP all the way through; seen from the outside, however, it is an HTTP reverse proxy.

Nginx gets the content of the final response from PHP-FPM and forwards it to the browser, as shown in the following diagram:

Here is the relevant internal configuration:

location ~ \.php$ {
    root           /ftp/wwwroot;
    fastcgi_pass   127.0.0.1:9000;
    fastcgi_index  index.php;
    fastcgi_param  SCRIPT_FILENAME  $document_root/$fastcgi_script_name;
    include        fastcgi_params;
}

The request is forwarded to an internal PHP application server listening on port 9000.

From an external browser’s point of view, the request goes directly to Nginx Server and the response comes back from Nginx Server without any (forward) proxy. As for how your internal request is forwarded, obviously the browser has no way of knowing and does not need to know.

From the architect’s point of view, of course, Nginx cannot generate the response to many of these requests itself. It simply proxies them to an internal PHP application server, which is the ultimate generator of the response.

The role Nginx plays here is that of a “reverse proxy” server. The browser is proxied, but it has no way of knowing whether it is proxied or not. The proxying is transparent to the browser, because the browser never actively configured a proxy itself.
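To make the pattern concrete without Nginx or PHP, here is a toy reverse proxy written with Python’s standard library only. It is a sketch of the idea and not how Nginx is implemented: both servers run in one process on ephemeral local ports, and the names `DemoBackend` and `DemoProxy` are invented for the demo. The client is built with an empty proxy map, that is, with no forward proxy configured at all.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# An opener with an empty proxy map: no forward proxy is configured anywhere.
direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class BackendHandler(BaseHTTPRequestHandler):
    """Stands in for the application server that really generates the page."""
    server_version = "DemoBackend/1.0"  # hypothetical name

    def do_GET(self):
        body = ("generated by backend for " + self.path).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def make_proxy_handler(backend_url):
    class ReverseProxyHandler(BaseHTTPRequestHandler):
        """Forwards every GET to the backend and relays the answer."""
        server_version = "DemoProxy/1.0"  # hypothetical name

        def do_GET(self):
            with direct.open(backend_url + self.path) as upstream:
                status = upstream.status
                body = upstream.read()
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass

    return ReverseProxyHandler


def start(handler):
    """Start a server on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


backend = start(BackendHandler)
proxy = start(make_proxy_handler("http://127.0.0.1:%d" % backend.server_port))

# The client only knows the proxy's address, just as a browser only knows xiaogd.net.
with direct.open("http://127.0.0.1:%d/index" % proxy.server_port) as resp:
    seen_server = resp.headers["Server"]
    seen_body = resp.read().decode()

for server in (proxy, backend):
    server.shutdown()
    server.server_close()

print(seen_server)
print(seen_body)
```

The `Server` header the client sees names the proxy, while the body was produced by the backend, which mirrors the Nginx and WordPress situation described above.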

Of course, now that you know my internal configuration, accessing xiaogd.net:9000 directly would be true “direct access”, bypassing Nginx.

However, that does not work from outside, because port 9000 is not open to the public. It is accessible internally, though; for example, try wget on the host itself:

wget localhost:9000

This is true “direct access”, with no proxy, neither forward nor reverse.

Note that fetching with wget this way still ends in an error, because wget speaks HTTP. PHP-FPM actually speaks FastCGI, a simpler protocol than HTTP that is more efficient for internal communication. wget does not support this protocol, but Nginx does, so the process goes like this:

browser --[http]--> Nginx --[fastcgi]--> php-fpm

Strictly speaking, the internal leg of this reverse proxy uses the FastCGI protocol rather than HTTP, but the principle is the same. If the internal server is something like Tomcat, HTTP can be used along the whole chain:

browser --[http]--> Nginx --[http]--> tomcat

If you send an internal request to port 80, such as wget localhost, it goes to Nginx, which listens on port 80, and Nginx then forwards the request to PHP-FPM.

Also, for more about ports and default ports, see the article on understanding ports in depth.

Why use a reverse proxy?

So at this point we face a new question: why bother with a reverse proxy at all? Isn’t direct access good enough? This is the same objection raised against the forward proxy. Some of the reasons for using a forward proxy were explained earlier, and just as nothing in this world is loved or hated without a reason, the reverse proxy naturally has its own reasons for existing.

One straightforward reason is the use of reverse proxies as a means of internal load balancing.

For example, if I develop a Java Web application as the backend of my website, I deploy it directly to the Tomcat server and let Tomcat listen on port 80 for external services. There was not much traffic at the beginning, so there was no problem, as shown below:

In this setup, visiting xiaogd.net means visiting xiaogd.net:80; for the topic of default ports, you can again refer to the article on understanding ports mentioned above.

But after a while, the traffic may come up and a Tomcat process can’t handle it, so what happens? I decided to start a new Tomcat process, but then I had a problem. There was only one port 80, which was already occupied by the first Tomcat process. If I wanted to start another tomcat process, I would have to use another port, such as 8080.

The second Tomcat would then have to be reached at xiaogd.net:8080. Obviously, there is a problem with this scheme: users do not know that a service exists on port 8080. Even if you tell them, they may not understand, and they will not want the hassle.

In addition, even if some users are willing to switch to port 8080 as you said, you still have no good control over splitting the traffic evenly between the two Tomcat servers. After all, this is a random decision by users, and many users may suddenly flock to port 8080 applications, causing congestion.

Or if only a few users are willing to follow your advice and switch to the new port 8080, the access is still concentrated on the old port 80, so that the old application is still slow to respond, and the new application is idle and underused because few users are accessing it.

This is where the benefits of a reverse proxy show. The setup works like this: Nginx sits in front as a reverse proxy, listening on port 80. The first Tomcat moves behind the scenes; instead of listening on port 80 (which must be left to Nginx), it listens on another unused port, such as 8080, and Nginx forwards requests to it for processing.

Of course, if there were only one Tomcat, the configuration would look something like this:

location / {
    proxy_pass   http://127.0.0.1:8080;
}

The request processing flow looks like this:

Request: browser --[HTTP]--> Nginx --[HTTP]--> tomcat

Response: browser <--[HTTP]-- Nginx <--[HTTP]-- tomcat

Naturally, the reverse proxy seems less necessary in this case, since it adds an extra hop that lengthens the response time.

However, if there are two Tomcat servers, the situation is different. In this case, you can enable the load balancing policy at the level of Nginx reverse proxy. The configuration is as follows:

http {
    upstream myapp1 {
        server 127.0.0.1:8080;
        server 127.0.0.1:8081;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myapp1;
        }
    }
}

If many requests come in at the same time, Nginx will send half of the requests to Tomcat on port 8080 and the other half to Tomcat on port 8081, as shown in the following figure:

Nginx receives all requests. Users do not need to make a choice or know that the applications live on ports 8080 and 8081. They simply continue to access the original URL xiaogd.net without any changes.
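The even split comes from taking turns: when no other method is configured, Nginx distributes requests across an upstream group in round-robin fashion. The sketch below shows only the idea of round-robin selection in a few lines of Python; it is not Nginx’s implementation, and the backend addresses are the ones assumed in the configuration above.

```python
from collections import Counter
from itertools import cycle

# Backend addresses as assumed in the upstream block above.
BACKENDS = ["127.0.0.1:8080", "127.0.0.1:8081"]


def round_robin(backends):
    """Return an endless iterator that hands out the backends in rotation."""
    return cycle(backends)


picker = round_robin(BACKENDS)
assignments = [next(picker) for _ in range(6)]  # six incoming requests
counts = Counter(assignments)

print(assignments)
print(counts)
```

Because the choice is made by the proxy and not by the users, the split stays even no matter which users happen to show up.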

If you have several hosts in the cloud, you can even form an intranet and deploy Tomcat on different hosts. For example, with three hosts, one runs Nginx listening on port 80, and the other two run Tomcat listening on ports 8080 and 8081 respectively, accepting and processing the requests forwarded by the Nginx reverse proxy, as shown in the following figure:

If the two Tomcat hosts are configured differently, for example, one is more powerful, you can also adjust the load ratio with the weight parameter so that the more powerful one handles more requests:

http {
    upstream myapp1 {
        server 192.168.0.20:8080 weight=3;
        server 192.168.0.21:8080 weight=2;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myapp1;
        }
    }
}

With the weights configured as 3:2 above, the more powerful machine takes 60% of the requests and the weaker machine takes 40%. That is, out of every five requests, three are forwarded to the host whose IP address ends in 20 and two to the host whose IP address ends in 21.
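One way to turn weights into an actual sequence of choices is “smooth” weighted round-robin, which interleaves the backends so the stronger host does not receive its share in one burst. The sketch below illustrates that technique and is not a statement about Nginx internals; the addresses are assumed to be 192.168.0.20 and 192.168.0.21, matching the hosts described as “20” and “21”.

```python
from collections import Counter


def smooth_weighted_round_robin(weights):
    """Yield backend names so that each appears in proportion to its weight."""
    total = sum(weights.values())
    current = {name: 0 for name in weights}
    while True:
        # Every backend earns credit equal to its weight on each round.
        for name, weight in weights.items():
            current[name] += weight
        # The backend with the most credit serves the request and pays `total`.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        yield chosen


# Assumed addresses and the 3:2 weights from the configuration above.
WEIGHTS = {"192.168.0.20:8080": 3, "192.168.0.21:8080": 2}

picker = smooth_weighted_round_robin(WEIGHTS)
picks = [next(picker) for _ in range(10)]  # ten incoming requests
counts = Counter(picks)

print(picks)
print(counts)
```

Over every block of five requests the counts come out exactly 3 and 2, which is the 60% and 40% split described above.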

Naturally, one might wonder, all requests still go through Nginx, can it handle it? The answer is yes, because its function is only to forward, which is a bit like Meituan Takeout. Although it receives tens of thousands of people’s orders every day, it does not need to buy, wash, cut or cook dishes by itself. It only needs to give orders to restaurants and then deliver their prepared dishes. That means the time-consuming process of cooking is left to the restaurant.

In this reverse proxy mode, the responsibility of generating web pages is also handed over to Tomcat, which is hidden behind. Generating a complex dynamic web page may require some complex calculations, querying the database, and piecing together various page components, which may be time-consuming. However, these requests are processed concurrently by both Tomcat applications, so the response speed is still guaranteed, and these are the benefits of reverse proxy.

Conclusion

This concludes the introduction to direct access, (forward) proxies, and reverse proxies. To summarize, here are the three scenarios again, each compared with the shopping example.

In the case of direct access, the browser directly accesses the server that ultimately generates the response, similar to the way we shop from the manufacturer directly, as shown in the figure below:

In the (forward) proxy scenario, the browser actively accesses the proxy server and indirectly obtains the final response from it, similar to shopping from a store whose goods are purchased from a manufacturer, as shown in the following figure:

In the case of reverse proxy, it is similar to direct access from the browser’s point of view, but its request is transparently brokered on the server side. Similar to the way we shop online from a “fake manufacturer” who claims to be a direct seller, the fake manufacturer actually redirects our orders to the real manufacturer and takes the goods from it to us, but we have no way of knowing what’s going on behind the scenes, as shown below:

On a complex network, a browser request may also pass through a forward proxy first and then a reverse proxy, as shown in the following figure:

So much for HTTP forward and reverse proxies.