It's Tuesday again, which means it's time for the weekly network protocol post. Today's topic is the URI. I'm on a business trip and couldn't make a new header image, so I'm reusing an old one.

I think we all know what a URL is, because we’re all exposed to it every day, but what is a URI?

So let’s look at what would happen if there were no URIs in the world.

Suppose I have uploaded some material to share with you, and there is no such thing as a URI. How would you download it?

First, I have to tell you to use FTP to connect to Naonao.com on port 8090.

Then I tell you that the login user name is Naonao and the password is Handsome.

Once logged in, you need to go to /Naonao/Source and switch to binary mode.

Finally, you download the file "How to avoid the troubles of being too handsome.mp4".

If that final file weren't so tempting, I wouldn't want to bother with such tedious steps.

With a URI, all of the steps above collapse into one: we just type ftp://Naonao:Handsome@Naonao.com:8090/Naonao/Source/How to avoid the troubles of being too handsome.mp4 into the browser, and the resource is downloaded directly.

As a friendly reminder, the URI above is not strictly valid, because the file name at the end (Chinese in the original) has not been percent-encoded. We will come back to encoding and decoding later.
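To see how one URI carries every piece of information from the manual steps, here is a minimal Python sketch using the standard library's urllib.parse. The file name handsome.mp4 is a placeholder chosen for illustration; the user name, password, host, port, and directory are the ones from the steps above.

```python
from urllib.parse import urlsplit

# "handsome.mp4" is a placeholder file name, not the one from the article.
uri = "ftp://Naonao:Handsome@Naonao.com:8090/Naonao/Source/handsome.mp4"

parts = urlsplit(uri)

print(parts.scheme)    # which protocol to use
print(parts.hostname)  # which server to connect to (urlsplit lowercases it)
print(parts.port)      # which port, as an int
print(parts.username)  # login user name
print(parts.password)  # login password
print(parts.path)      # directory and file on the server
```

Each attribute corresponds to one of the things I would otherwise have had to tell you by hand.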

What is a URI?

What problems do URIs solve

But before we get to URIs, we need to take a quick look at URLs and URNs.

URL, defined in RFC 1738 (December 1994), stands for Uniform Resource Locator. It indicates the location of a resource and is expected to provide a method for locating that resource.

URN, defined in RFC 2141 (May 1997), stands for Uniform Resource Name. URNs are expected to provide persistent, location-independent identification of resources and to make it easy to map other namespaces into the URN namespace.

The concept of a URN may be unfamiliar to some people, so here is an example that many of you have certainly run into: magnet links.

I found a magnet link for Calabash Brothers on the Internet. Take a look: doesn't it seem familiar?

magnet:?xt=urn:btih:bdab9b6759950fab3c8cbde2669bea6195491034
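For readers who want to pick the link apart, here is a small sketch with Python's urllib.parse. It relies only on the general shape of a URN, which is urn:, then a namespace identifier, then a namespace-specific string; the variable names are my own.

```python
from urllib.parse import urlsplit, parse_qs

link = "magnet:?xt=urn:btih:bdab9b6759950fab3c8cbde2669bea6195491034"

parts = urlsplit(link)
params = parse_qs(parts.query)   # the part after "?" is an ordinary query string
urn = params["xt"][0]            # the URN is the value of the "xt" parameter

# A URN has the shape urn:<namespace identifier>:<namespace-specific string>
prefix, namespace, value = urn.split(":", 2)
print(parts.scheme, prefix, namespace, value)
```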

It's fine if it doesn't look familiar; that's not today's focus. A rough idea of what a URN looks like is enough. Now, what is the definition of a URI?

URI stands for Uniform Resource Identifier, and it is used to tell resources apart. It covers both URLs and URNs, and is intended as the general term that replaces them.

In other words, every URL or URN is a URI, but a URI is not necessarily a URL (or a URN). That is, URIs are a superset of URLs and URNs.

The difference between URIs and URLs

We now know that URIs are a superset of URLs, but on the web, URLs and URIs look so much alike that we often confuse them.

The difference between a URI and a URL is the difference between an identifier and a locator. URIs focus on uniquely identifying a resource, while URLs focus on where the resource is located.

To make a simple analogy: if we describe ourselves this way, the URI is our ID number and the URL is the home address printed on our ID card. With the ID number (URI) you can certainly identify me, but with my address (URL) you will not necessarily find me.

What counts as a resource

The word "resource" covers a great many things, from pictures and documents to today's weather.

It can also be an entity that cannot be accessed through the Internet, such as a person or a company.

It can even be something abstract, like a family relationship, or whether you are a heartbreaker.

Note, however, that URIs and resources are not in one-to-one correspondence. A resource can have many URIs, but one URI corresponds to only one resource. It is like holding several bank cards: we may have many cards, but each card belongs to exactly one account holder, namely us.

In practice, an identifier is a name that distinguishes the current resource from other resources.

From the meanings of "identifier" and "resource", one goal of the URI becomes clear: to make it easy for resource providers to distinguish their own resources from everyone else's.

For example, for entities that cannot be accessed through the Internet, such as people, we can define identifiers such as Mine, Father, and Relationship as URIs. In this way, we can tell apart the resources we want to express.

The composition of the URI

Let’s take a look at the components of a URI

We will go through them with the help of the diagram.

Let's start with the three most important parts. Here is an example first, followed by the illustration.

https://naonao.com?name=naonao&age=18#page-7
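As a quick check of where each part begins and ends, the example above can be split with Python's standard urllib.parse. This is only a sketch to make the boundaries visible.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://naonao.com?name=naonao&age=18#page-7")

print(parts.scheme)      # https
print(parts.netloc)      # naonao.com  (the authority)
print(repr(parts.path))  # ''  (this example has an empty path)
print(parts.query)       # name=naonao&age=18
print(parts.fragment)    # page-7
```

Note that the ? and # delimiters themselves are not part of the query and fragment values.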

Scheme

The scheme names the protocol to be used, such as http, https, or ftp. Don't feel limited to these common ones: you can also define a custom scheme, as long as the server supports it.

A scheme may contain letters, digits, +, -, and ., and it must begin with a letter.

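Note: the scheme is separated from the rest of the URI by a colon (:). When an authority follows, as in HTTP or FTP URIs, it is introduced by //, which is why we usually see ://. Schemes without an authority, such as magnet, use the colon alone.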
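The character rule for schemes can be written down as a short check. This is a sketch based on the scheme grammar in RFC 3986 (a letter, followed by letters, digits, +, - or .); the function name is_valid_scheme is my own.

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

def is_valid_scheme(name: str) -> bool:
    """Return True if name is syntactically usable as a URI scheme."""
    return SCHEME_RE.fullmatch(name) is not None

for candidate in ["https", "svn+ssh", "1http", "my_scheme"]:
    print(candidate, is_valid_scheme(candidate))
```

The check is purely syntactic: it says nothing about whether any software actually understands the scheme.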

Query

The query is an optional set of query parameters. If present, it must start with ?.

The most common form is key=value, as in name=naonao in the example above.

But the query is not limited to this form. Its grammar allows any sequence of pchar, /, and ? characters.

You may be wondering: if the query has to start with ?, how can ? also appear inside it? And what is pchar? To understand this we would need to go through the detailed grammar in the RFC, which is not the focus of today's post.
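For the common key=value form, Python's urllib.parse can both parse and build a query string. A minimal sketch using the parameters from the example above:

```python
from urllib.parse import parse_qs, urlencode

# Parsing: every value comes back as a list, because a key may repeat.
params = parse_qs("name=naonao&age=18")
print(params)

# Building: urlencode joins the pairs with & and encodes special characters.
query = urlencode({"name": "naonao", "age": 18})
print(query)
```

Letting a library build the query avoids hand-written mistakes such as a missing & or an unencoded special character.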

Fragment

The fragment is also optional, and if present it must start with #.

In the example above, #page-7 points to a particular section within the resource.

It allows the same characters as the query.

Authority

The authority contains the user name and password, the host name, and the port number.

User names and passwords are rarely written this way anymore, because transmitting them in plain text inside a URI is not secure.

The main place this form is still seen today is FTP downloads.

So we usually just use host:port, that is, a host name plus a port number.

The host name cannot be omitted, because without it we cannot find the corresponding server. The port number, however, can be left out, in which case the default port of the scheme is used.

For example, the default port for HTTP is 80, and the default port for HTTPS is 443.
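The default-port behaviour can be sketched in a few lines of Python. The DEFAULT_PORTS table and the effective_port helper are names I made up for illustration, and the table lists only the two defaults mentioned above.

```python
from urllib.parse import urlsplit

# Only the two defaults mentioned in the text; a real table would be longer.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(uri: str):
    """Return the explicit port if one is given, else the scheme's default (or None)."""
    parts = urlsplit(uri)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("http://naonao.com/"))
print(effective_port("https://naonao.com:8443/"))
```

urlsplit reports port as None when the URI does not spell one out, which is why the fallback lookup is needed.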

Path

The host name is immediately followed by the path.

In a URI that has an authority, the path must either be empty or start with a slash (/). That slash belongs to the path, so don't mistake it for the end of the authority.

The specification defines several path forms: path-abempty, path-absolute, path-noscheme, path-rootless, and path-empty.

  • path-abempty: a path that begins with / or is empty
  • path-absolute: a path that begins with / but not with //
  • path-noscheme: a path whose first segment does not contain a colon (:)
  • path-rootless: like path-noscheme, except that the first segment may contain a colon (:)
  • path-empty: the empty path

I list all of these forms only to be faithful to the specification. Although there are many of them, using paths is actually simple: comparing the five forms above, the only restriction is on how the path begins.

As long as we do not start the path with unencoded Chinese or other special characters, the path is legal.

So for the path, we just fill it in according to the actual situation.
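Since the five path forms differ only in how they begin, a rough classifier fits in a few lines. This is a simplified sketch: classify_path is a hypothetical helper, it returns the most specific form (an empty path also satisfies path-abempty, for instance), and it does not check which characters are allowed inside a segment.

```python
def classify_path(path: str) -> str:
    """Name the most specific RFC 3986 path form, judged only by how the path begins."""
    if path == "":
        return "path-empty"
    if path.startswith("//"):
        # Only path-abempty allows an empty first segment after the leading slash.
        return "path-abempty"
    if path.startswith("/"):
        return "path-absolute"
    first_segment = path.split("/", 1)[0]
    if ":" in first_segment:
        return "path-rootless"
    return "path-noscheme"

for example in ["", "/Naonao/Source", "a:b/c", "a/b:c"]:
    print(repr(example), classify_path(example))
```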

URI encoding

Finally, it's time to fill in the hole I dug earlier.

We started with an example of how to download a resource in a world without URIs. The URI I gave there contained Chinese characters, but in fact only ASCII characters may be used in a URI.

If a URI contains anything outside ASCII, or if its data contains delimiter characters such as ?, #, /, and &, the URI will be parsed incorrectly.

To avoid this, URIs introduce an encoding mechanism.

The rules are simple. A special character from the ASCII table is replaced by % followed by its two-digit hexadecimal ASCII code: a space is escaped to %20, and ? is escaped to %3F.

Anything outside ASCII is first converted to bytes, and each byte is then written as % followed by its two hexadecimal digits.

For example, 闹闹 is escaped to %e9%97%b9%e9%97%b9, and a single 闹 is escaped to %e9%97%b9.

This is because the UTF-8 encoding of 闹 is the three bytes E9 97 B9 in hexadecimal, and putting a % in front of each byte gives the result above.
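The encoding rule can be reproduced by hand and compared with the standard library. In this sketch I assume the character in the example is 闹, whose UTF-8 bytes are E9 97 B9; percent_encode_all is a hypothetical helper written only for illustration. Python's quote emits uppercase hex digits, which are equivalent to the lowercase form.

```python
from urllib.parse import quote, unquote

def percent_encode_all(text: str) -> str:
    """Percent-encode every UTF-8 byte of text, with no exceptions (illustration only)."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))

manual = percent_encode_all("闹")        # one % group per UTF-8 byte
encoded = quote("闹闹")                  # what the standard library produces
decoded = unquote("%e9%97%b9%e9%97%b9")  # decoding accepts lowercase hex too

print(manual, encoded, decoded)
print(quote(" "), quote("?"))            # the two ASCII examples from the text
```

In real code quote is the better choice, because it leaves characters that are already safe (letters, digits, and a few symbols) untouched.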

When we type a URI into the browser's address bar, it usually works even if it contains Chinese. Behind the scenes, the browser is doing the tedious encoding and decoding for us.

This makes for a very user-friendly experience, because users are never shown something they cannot read. It is an idea worth learning from.

Final words

URIs are something you have to understand when learning network protocols, but overall they are not hard. There are just a few more concepts than usual, and once you understand them there is really not much to it.

You may ask: what is the use of learning this? I can only answer that it has no direct use, but plenty of indirect use.

For example, in back-end development, when you expose an interface and the URI you hand out is not well-formed, callers cannot locate your resources, and the problem only gets solved after a long session of Googling.

Or in front-end development, if a call is malformed, for example the query parameters of a GET request are written incorrectly, then naturally the call to the back-end interface fails.

Another example: when you take over an unfamiliar project, you can use the browser's Network panel to see which resources it uses and which pages and interfaces it depends on. But if you can't read a URI, you have to ask your colleagues, and you may get scolded for asking.

These problems can be solved by Googling or by asking colleagues, but looking things up and asking around wastes both our time and theirs.