A brief introduction to HTTP URLs

Background

It has been more than a year since I graduated and started working. Over that year I found that some back-end developers did not even know what URL query parameters are. I also recently ran into problems when connecting to a MongoDB instance with authentication enabled from Node and Python, so I decided to summarize what I know about URLs.

What is a URL

Definition

URL is short for Uniform Resource Locator. In layman's terms, a URL is a string that describes where an information resource on the Internet is located, and it is used by all kinds of WWW client and server programs. URLs give a unified format for describing various resources, including files, server addresses and directories.

URIs are also worth mentioning:

Every resource available on the Web (HTML documents, images, video clips, programs, and so on) is identified by a Uniform Resource Identifier (URI).

A URI generally consists of three parts:

Naming mechanism for accessing resources.
Name of the host where the resource is stored.
The name of the resource itself, represented by a path.

URLs are a subset of URIs, but in everyday development we usually only need to know about URLs.

URL format

Take http://test.com:8080/example/index.html as an example.

The URL format consists of the following three parts:

The first part is the protocol (or service mode), which in this case is the HTTP protocol.
The second part is the IP address or domain name (and sometimes the port number) of the host that holds the resource, in this case test.com:8080. Usually only the domain name is shown. The client then looks up the IP address for the domain name through DNS (Domain Name System) and connects to the server using that IP address and the port number. More on this later.
The third part is the specific location of the resource on the host, such as the directory and file name, in this example /example/index.html.

URL syntax

General syntax

<scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>

Scheme: the protocol. Common ones are HTTP (default port 80), HTTPS (443), mailto, FTP (21), RTSP, RTSPU and file.
User: the user name.
Password: the password.
Host: the host name or IP address.
Port: the port number.
Params: parameters, usually in the form key=value.
Query: the query parameters, or query string.
Frag: the fragment (exposed in the browser as window.location.hash).

This is only the general syntax. Most URLs use just part of it, and not every URL contains all of the components above.
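
As a rough illustration of how the general syntax maps onto real parsing, the sketch below feeds a made-up URL (the user name, password, path and fragment are assumptions for the example) to the standard URL class available in browsers and Node.js. Note that this parser has no separate field for params: anything after ; simply stays inside the path.

```javascript
// Parse a made-up URL that uses most components of the general syntax.
const parsed = new URL(
  "http://alice:secret@test.com:8080/docs/index.html;type=a?id=1&name=test#top"
);

const parts = {
  scheme: parsed.protocol,   // "http:"
  user: parsed.username,     // "alice"
  password: parsed.password, // "secret"
  host: parsed.hostname,     // "test.com"
  port: parsed.port,         // "8080"
  path: parsed.pathname,     // "/docs/index.html;type=a" (params stay in the path)
  query: parsed.search,      // "?id=1&name=test"
  frag: parsed.hash,         // "#top"
};

console.log(parts);
```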

At least : / @ ; ? # are reserved characters, so they cannot appear unescaped inside the other components. If a component contains reserved or other special characters, they must be percent-encoded, otherwise the URL may be parsed in unexpected ways. In practice, query parameter values are the most likely place for such characters to show up, so query string values should generally be encoded with encodeURIComponent.
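
A minimal sketch of the encoding advice above. The nickName value is an invented example that deliberately contains reserved characters; encodeURIComponent and URLSearchParams are standard in browsers and Node.js.

```javascript
// A value that contains several reserved characters (made up for the example).
const nickName = "a&b=c?d#e";

// Without encoding, "&", "=", "?" and "#" would be read as URL syntax.
const manual = "http://test.com/user?nickName=" + encodeURIComponent(nickName);

// URLSearchParams encodes values when writing and decodes them when reading.
const url = new URL("http://test.com/user");
url.searchParams.set("nickName", nickName);

console.log(manual);                           // http://test.com/user?nickName=a%26b%3Dc%3Fd%23e
console.log(url.searchParams.get("nickName")); // a&b=c?d#e
```

Either approach works. URLSearchParams is convenient when several parameters are built at once, because each value is encoded separately.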

Syntax in HTTP(S)

http://test.com:8080/user/index.html?id=1&nickName=test#/list

The parsed format in the browser is:

{
    protocol: 'http:', // protocol
    host: 'test.com:8080', // host name or domain name, with port number
    hostname: 'test.com', // host name or domain name, without port number
    port: '8080', // port number; empty when the default is used (80 for HTTP, 443 for HTTPS)
    pathname: '/user/index.html', // path
    search: '?id=1&nickName=test', // query string
    hash: '#/list', // fragment, or hash
}

Note: nothing after the # is sent from the client to the server.

Changing the string after the # in the address bar does not reload the page. It only fires a hashchange event, which is how the hash mode of many front-end routers is implemented. Changing any part other than the fragment (hash) in the address bar makes the browser reload, because a new request is sent to the server.
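
The rule above can be sketched as a comparison of two URLs. The function name differsOnlyByHash is hypothetical and this is only an approximation of what a browser decides, not its actual implementation.

```javascript
// Returns true when two URLs are identical apart from the fragment,
// i.e. moving from one to the other needs no new request to the server.
function differsOnlyByHash(from, to) {
  const a = new URL(from);
  const b = new URL(to);
  const sameDocument =
    a.origin === b.origin && a.pathname === b.pathname && a.search === b.search;
  return sameDocument && a.hash !== b.hash;
}

const base = "http://test.com:8080/user/index.html?id=1&nickName=test";
console.log(differsOnlyByHash(base + "#/list", base + "#/detail")); // true
console.log(differsOnlyByHash(base + "#/list", base + "x#/list"));  // false
```

In a browser, the first case is the one that fires hashchange; the second changes the query string and therefore triggers a new request.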

Syntax in FTP

FTP is the File Transfer Protocol, which can be used to download files from or upload files to a server.

Basic format:

ftp://<user>:<password>@<host>:<port>/<path>;<params>

Example:

ftp://ftpuser:<password>@<host>:21/path/example

Syntax in file

This protocol is most commonly seen when a local file is opened in a browser, and the file may also live on a network file system or some other file-sharing system. I won't go into detail here, since there is not much to say about it.

The mongodb protocol

This is common when a back-end service connects to a MongoDB database. Although most packages accept the connection settings as an object, they are eventually converted to a string.

Basic format:

mongodb://<user>:<password>@<host>:<port>/<path>?<query>

Example:

mongodb://test:123456@127.0.0.1:27017/novel?authSource=admin

This string means that the mongodb protocol is used, the user name is test, the password is 123456, the database name is novel, the authentication database is admin, and the connection goes to port 27017 on host 127.0.0.1.

Note: If authentication is not enabled on the MongoDB server, remove the query parameter, otherwise the connection fails. If authentication is enabled, the authSource value must be the database in which the preceding user and password are defined, otherwise authentication fails and the connection cannot be established. When I started using ThinkJS, I had not configured authSource, and the error I got was a connection timeout. At first I wondered how the remote server could be so slow. I then raised the timeout, but it still timed out and kept trying to reconnect. Only after struggling with this for a long time did it occur to me that authentication might be failing.
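
Since the connection settings end up as a string anyway, a small helper can show how the pieces fit together. This is a sketch under assumptions: buildMongoUri is a hypothetical function, not part of any MongoDB driver or of ThinkJS, and the password with reserved characters is invented to show why the credentials should be percent-encoded.

```javascript
// Hypothetical helper: builds a mongodb:// connection string and
// percent-encodes the credentials so reserved characters stay literal.
function buildMongoUri({ user, password, host, port, database, authSource }) {
  const credentials = user
    ? `${encodeURIComponent(user)}:${encodeURIComponent(password)}@`
    : "";
  // authSource is only added when authentication is in use.
  const query = authSource ? `?authSource=${encodeURIComponent(authSource)}` : "";
  return `mongodb://${credentials}${host}:${port}/${database}${query}`;
}

const withAuth = buildMongoUri({
  user: "test",
  password: "p@ss:word/1", // made-up password containing reserved characters
  host: "127.0.0.1",
  port: 27017,
  database: "novel",
  authSource: "admin",
});
const withoutAuth = buildMongoUri({ host: "127.0.0.1", port: 27017, database: "novel" });

console.log(withAuth);    // mongodb://test:p%40ss%3Aword%2F1@127.0.0.1:27017/novel?authSource=admin
console.log(withoutAuth); // mongodb://127.0.0.1:27017/novel
```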

What happens after a URL is entered in the browser

The whole process is as follows:

The domain name is resolved.
A TCP three-way handshake is initiated.
Once the TCP connection is established, the HTTP request is sent.
The server responds to the HTTP request.
The browser parses the HTML and requests the resources it references (such as JS, CSS and images).
The TCP connection is closed.
The browser renders the page for the user.
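
To make the first three steps more concrete, the sketch below derives from a URL what the client needs before it can talk to the server: the name to resolve, the TCP port to connect to, and the first lines of the HTTP request. planRequest is a hypothetical helper and no network traffic is performed; it only shows which parts of the URL feed which step.

```javascript
// Default ports for the two schemes covered in this article.
const DEFAULT_PORTS = { "http:": 80, "https:": 443 };

// Hypothetical helper: works out what the client needs for each step.
function planRequest(rawUrl) {
  const url = new URL(rawUrl);
  const port = url.port ? Number(url.port) : DEFAULT_PORTS[url.protocol];
  return {
    lookup: url.hostname, // name handed to DNS
    connectPort: port,    // TCP port used for the handshake
    // The fragment is never part of the request line.
    requestLine: `GET ${url.pathname}${url.search} HTTP/1.1`,
    hostHeader: `Host: ${url.host}`,
  };
}

const plan = planRequest("http://test.com:8080/user/index.html?id=1&nickName=test#/list");
console.log(plan.lookup);      // test.com
console.log(plan.connectPort); // 8080
console.log(plan.requestLine); // GET /user/index.html?id=1&nickName=test HTTP/1.1
console.log(plan.hostHeader);  // Host: test.com:8080
```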

Domain name resolution is fairly complicated if described in detail. In short, it can sometimes be quite time-consuming; the word "resolution" itself suggests that it takes time. DNS resolves the domain name to an IP address, and the client then connects to the server using that IP address and the port number.

Accessing the server directly by IP address saves some of that time, but it is not recommended. If you change servers, the domain name can simply be pointed at the new IP address and the data the browser has stored for that domain is kept, whereas data stored under the old IP address is not. Domain names are also much easier to remember and read than IP addresses.

Dynamic and static servers

Most of the above applies only to static servers. Dynamic servers have their own parsing rules, although they are broadly the same. The biggest difference is probably dynamic routing (front-end routers now support dynamic routing as well).

The following mainly covers what is special about dynamic servers compared with static servers:

Dynamic routing

For example, to define an interface that returns information about a user, the route can be defined as {path: '/user/:id'}.

When /user/1?name=test is accessed, the framework parses the request as:

{
    params: {
        id: 1,
    },
    query: {
        name: 'test',
    }
}

With static routes, the only option is /user?id=1&name=test. Dynamic routing has some advantages over static routing, and it also looks more elegant.

However, if the back end does not define its routes carefully and the front end sends an empty parameter, the request path becomes /user/, so the interface being accessed is /user/ rather than /user/:id. The back end cannot find a matching route and returns a 404 or a custom error.
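
A minimal sketch of how such a pattern might be matched. matchRoute is a hypothetical function written for this article, not the matcher used by ThinkJS or any other framework, and real routers handle many more cases. It also shows why an empty parameter ends up with no matching route.

```javascript
// Hypothetical matcher for patterns such as "/user/:id".
function matchRoute(pattern, rawUrl) {
  // The base is only there so that relative input like "/user/1" can be parsed.
  const url = new URL(rawUrl, "http://localhost");
  const patternParts = pattern.split("/");
  const pathParts = url.pathname.split("/");
  if (patternParts.length !== pathParts.length) return null;

  const params = {};
  for (let i = 0; i < patternParts.length; i++) {
    const expected = patternParts[i];
    const actual = pathParts[i];
    if (expected.startsWith(":")) {
      if (actual === "") return null; // an empty segment cannot fill a parameter
      params[expected.slice(1)] = decodeURIComponent(actual);
    } else if (expected !== actual) {
      return null;
    }
  }
  return { params, query: Object.fromEntries(url.searchParams) };
}

const found = matchRoute("/user/:id", "/user/1?name=test");
console.log(found.params.id, found.query.name);  // 1 test
console.log(matchRoute("/user/:id", "/user/"));  // null
```

In this sketch the id comes back as the string "1", since everything in a URL is text; whether it is converted to a number is up to the framework or the handler.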

Route rewriting

The back-end frameworks I have worked with are ThinkPHP and ThinkJS, which are quite similar. Both support route rewriting, for example appending a suffix to a defined route. If a suffix such as html is configured through ext, /user/1.html is matched by /user/:id. In companies where the front end and back end have not yet been separated, this is probably the main way pages are served.

So when you see http://test.com/user/1.html, you can no longer assume that it points to a static file on the server. It may also be a response produced by rendering a template.
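
Conceptually, the suffix is removed before the route is matched. The sketch below illustrates that idea only; stripSuffix is a hypothetical helper and is not how ThinkPHP or ThinkJS implement it.

```javascript
// Hypothetical helper: removes a configured suffix before route matching.
function stripSuffix(pathname, ext) {
  const suffix = "." + ext;
  return pathname.endsWith(suffix) ? pathname.slice(0, -suffix.length) : pathname;
}

console.log(stripSuffix("/user/1.html", "html")); // /user/1
console.log(stripSuffix("/user/1", "html"));      // /user/1
```

After this step, /user/1.html and /user/1 both reach the same dynamic route.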

Having covered dynamic servers, let's turn to static servers. I once worked with a back-end developer who took files uploaded from the front end, stored them directly in the root directory of the Linux file system, appended the file path to the domain name, and then wondered why the files could not be accessed.

The root directory of the Web server

A commonly used static server is Nginx. Static file compression, caching, proxying, redirecting based on device detection, image cropping and so on are all easy to set up with it, which has made it practically standard equipment for front-end developers.

The popular web servers are mainly Nginx, Apache and Tomcat. The latter two are mostly paired with back-end applications and are usually not exposed directly. Each of these servers has a setting for the root directory of the web server. What does the root directory do? It controls the top-level directory that clients can access. For example, if the root directory is www, clients cannot access files outside the www directory and can only access files in www and its subdirectories.
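
The effect of the root directory can be sketched as a mapping from the URL path to a file path that is never allowed to climb above the root. resolveUnderRoot is a hypothetical function and /var/www is an assumed root; real servers such as Nginx do this with their own, more thorough logic.

```javascript
// Hypothetical sketch: maps a URL path onto a file path under the web root.
// Returns null when the path would climb above the root.
function resolveUnderRoot(root, urlPath) {
  const resolved = [];
  for (const segment of decodeURIComponent(urlPath).split("/")) {
    if (segment === "" || segment === ".") continue; // nothing to add
    if (segment === "..") {
      if (resolved.length === 0) return null; // would leave the root
      resolved.pop();
    } else {
      resolved.push(segment);
    }
  }
  // Drop trailing slashes from the root, then append the remaining segments.
  return [root.replace(/\/+$/, ""), ...resolved].join("/");
}

console.log(resolveUnderRoot("/var/www", "/img/logo.png"));  // /var/www/img/logo.png
console.log(resolveUnderRoot("/var/www", "/a/../b.html"));   // /var/www/b.html
console.log(resolveUnderRoot("/var/www", "/../etc/passwd")); // null
```

This is also why a file saved in the root of the Linux file system cannot be reached through the domain name: the URL path is always interpreted relative to the web root, not to /.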
