This is the third day of my participation in the August Text Challenge.
A URL is the standardized name for a resource on the Internet. A URL points to a piece of electronic information, telling you where it is located and how to interact with it. (From HTTP: The Definitive Guide)
1. URL and URI
Speaking of URLs, we have to mention URIs. A URI (Uniform Resource Identifier) is a more general class of resource identifier, and URLs are a subset of it. URIs comprise URLs, which identify resources by location, and URNs, which identify resources by name. The relationship among the three is similar to that of my ID number, my home address, and the name on my identity card.
Although the HTTP specification uses the more general term URI, in practice HTTP applications deal almost exclusively with URLs, so let's take a look at the familiar URL.
2. Composition of a URL
We surf the Internet every day and open dozens of web pages, each of which has a unique URL. We can use this address to find the page, but what information does the address contain? How does it relate to the browser, the client, the server, and locations in the server's file system? For example, let's look at:
www.bookstack.cn/read/html-t…
- The first part of the URL (https) is the scheme. The scheme tells Web clients how to access the resource. In this example, the URL says to use the HTTPS protocol.
- The second part of the URL (www.bookstack.cn) is the location of the server. It tells the Web client where the resource is hosted.
- The third part of the URL (/read/html-tutorial/docs-url.md) is the resource path. The path indicates which specific local resource on the server is being requested.
In fact, the most important components of a URL are these three parts: scheme, host, and path. But a URL can be made up of as many as nine parts, so let's look at the table below.
Component | Description |
---|---|
scheme | The protocol used to request the resource, separated from the rest of the URL by ":". |
user | The user name that some schemes require to access a resource. |
password | The password that may follow the user name, separated from it by ":". |
host | The hostname or IP address of the server hosting the resource, separated from a preceding user name and password by "@". |
port | The port that the server is listening on. The default value is 80 for HTTP and 443 for HTTPS. |
path | The location of the resource on the server, separated from the preceding part by "/". |
params | Additional parameters required by some schemes, given as name-value pairs and separated from the rest of the URL (and from one another) by ";". |
query | Parameters passed to the server-side program to perform certain operations, such as querying a database, separated from the rest of the URL by "?". |
frag | A name for a part of the resource, separated from the rest of the URL by "#". Fragments are used only on the client side and are not sent to the server. |
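As a rough illustration, the sketch below splits a URL into these components with the standard `URL` class available in browsers and Node.js. The URL itself (host `www.example.com`, the credentials, and the port) is made up for the example. Note that the `URL` class has no separate field for params, so they stay inside `pathname`.

```javascript
// Hypothetical URL that contains every component from the table above.
const fullUrl = new URL(
  'https://alice:secret@www.example.com:8080/docs/page;type=d?lang=en#intro'
);

const parts = {
  scheme: fullUrl.protocol.replace(/:$/, ''), // protocol includes a trailing ":"
  user: fullUrl.username,
  password: fullUrl.password,
  host: fullUrl.hostname,
  port: fullUrl.port,
  path: fullUrl.pathname, // params (";type=d") remain part of the path
  query: fullUrl.search,  // includes the leading "?"
  frag: fullUrl.hash,     // includes the leading "#"
};

console.log(parts);
```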
Now let's take a closer look at the components we actually use in development.
2.1 Scheme
The scheme is the main identifier that defines how to access a given resource. It tells the application that parses the URL which protocol to use. The most common schemes are HTTP and HTTPS. Some of the schemes you may come across are:
- http
- https
- mailto
- ftp
- rtsp, rtspu
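A minimal sketch of how a program might read the scheme from a URL, again using the standard `URL` class. The hosts and addresses are placeholders, not real resources.

```javascript
// Placeholder URLs, one for each of several schemes from the list above.
const examples = [
  'http://example.com/',
  'https://example.com/',
  'mailto:someone@example.com',
  'ftp://example.com/file.txt',
  'rtsp://example.com/stream',
];

// URL.protocol includes the trailing colon, so strip it to get the bare scheme.
const schemes = examples.map((u) => new URL(u).protocol.slice(0, -1));

console.log(schemes);
```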
2.2 Hosts and Ports
To obtain a resource from the Internet, an application needs to know which machine hosts it and where on that machine to find the server that can access the resource. The host and port components of the URL provide this information. The host can be given as a hostname or an IP address. In front-end development with a framework, we are usually accessing a local IP address, because the service is started locally.
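The sketch below shows one way to work out which port a client would connect to. The helper name `effectivePort` and the local port 3000 are assumptions for illustration. It relies on the fact that `URL.port` is an empty string when the port is omitted or equals the scheme's default.

```javascript
// Default ports for the two schemes discussed in this article.
const DEFAULT_PORTS = { 'http:': '80', 'https:': '443' };

// Hypothetical helper: return the port a client would actually connect to.
function effectivePort(href) {
  const parsed = new URL(href);
  // parsed.port is '' when the URL omits the port or uses the scheme default.
  return parsed.port || DEFAULT_PORTS[parsed.protocol] || '';
}

console.log(effectivePort('https://www.bookstack.cn/read/'));
console.log(effectivePort('http://localhost:3000/'));
console.log(new URL('http://127.0.0.1:8080/').hostname);
```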
2.3 Path
The path component of the URL indicates where the resource is located on the server. A path usually looks much like a hierarchical file system path. For example:
https://www.bookstack.cn/read/html-tutorial/docs-url.md
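To make the hierarchy concrete, here is a small sketch that splits the path of the URL above into its segments. The variable names are just for illustration.

```javascript
const pageUrl = new URL('https://www.bookstack.cn/read/html-tutorial/docs-url.md');

// Split the path on "/" and drop the empty string produced by the leading slash.
const segments = pageUrl.pathname.split('/').filter(Boolean);

console.log(pageUrl.pathname);
console.log(segments);
```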
2.4 Parameters
For many schemes, a simple hostname and the path to the object are not enough. Besides the port the server is listening on and whether a user name and password are needed to access the resource, many protocols require more information to work.
To give applications the input parameters they need to interact with the server properly, URLs have a params component. This component is a list of name-value pairs, separated from the rest of the URL (and from one another) by the ";" character. They provide the application with any additional information it needs to access the resource.
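JavaScript's `URL` class does not parse params separately, so a sketch like the following could be used to pull them out of a path segment. The helper `parseSegmentParams` and the sample segment are hypothetical.

```javascript
// Hypothetical helper: split one path segment into its name and ";"-separated params.
function parseSegmentParams(segment) {
  const [name, ...pairs] = segment.split(';');
  const params = {};
  for (const pair of pairs) {
    if (pair === '') continue; // ignore stray separators such as "a;;b=1"
    const eq = pair.indexOf('=');
    if (eq === -1) {
      params[pair] = ''; // a bare name with no value
    } else {
      params[pair.slice(0, eq)] = pair.slice(eq + 1);
    }
  }
  return { name, params };
}

// Made-up segment carrying two params.
const parsedSegment = parseSegmentParams('report;type=d;lang=en');
console.log(parsedSegment);
```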
2.5 Query String
Query strings are something we use a lot in daily development. For example, when we jump from an item list to the item details, the item ID is passed to the details page, which then uses the ID to retrieve the item details. For example:
https://search.bilibili.com/all?keyword=harry%20potter&from_source=webtop_search&spm_id_from=333.851
If we search for Harry Potter on Bilibili and press Enter, we are taken to the URL above, which for the most part looks like the other URLs we have seen. Only the content to the right of the question mark (?) is new. This part is called the query component. The query component is sent to the gateway resource along with the URL path component that identifies that gateway resource. In the link above there are three parameters: keyword, from_source, and spm_id_from. As a general rule, many gateways expect the query string to be a series of name/value pairs separated by the "&" character.
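As a sketch, the query component of this search URL can be read with the standard `URLSearchParams` interface. The example assumes the space in the keyword is written as `%20`.

```javascript
const searchUrl = new URL(
  'https://search.bilibili.com/all?keyword=harry%20potter&from_source=webtop_search&spm_id_from=333.851'
);

// searchParams splits the query into name/value pairs and decodes the values.
for (const [name, value] of searchUrl.searchParams) {
  console.log(`${name} = ${value}`);
}

const keyword = searchUrl.searchParams.get('keyword');
```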
2.6 Fragment
To refer to a part of a resource, URLs support a fragment (frag) component that identifies a piece within the resource. In daily development we also call it an anchor, that is, an anchor point inside the web page. It is written as # followed by the anchor name at the end of the address, such as #anchor. When the browser loads the page, it scrolls to the location of the anchor.
HTTP servers typically deal only with whole objects, not fragments of objects, so clients do not pass fragments to the server. Once the browser has retrieved the entire resource from the server, it displays the fragment you are interested in.
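The following sketch shows the split between what stays on the client and what goes into the request. The page URL is a placeholder, and `requestTarget` is simply an illustrative name for the path-plus-query part that an HTTP client sends.

```javascript
// Placeholder page URL with an anchor.
const docUrl = new URL('https://www.example.com/guide/index.html?lang=en#install');

// What the client keeps for itself: the fragment.
const fragment = docUrl.hash;

// What an HTTP client puts on the request line: path and query only.
const requestTarget = docUrl.pathname + docUrl.search;

console.log(fragment);
console.log(requestTarget);
```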
3. URL Encoding
3.1 What Is URL Encoding
To get around the limitations of the safe character set, an encoding scheme was devised to represent unsafe characters in URLs. The encoding represents an unsafe character with an "escape" notation: a percent sign (%) followed by two hexadecimal digits that give the character's ASCII code.
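A minimal sketch of the escape notation for a single ASCII character follows. The helper `percentEncodeAscii` is hypothetical and only meant to show the "% plus two hex digits" rule; real code would normally use the built-in encoding functions.

```javascript
// Hypothetical helper: percent-encode a single ASCII character.
function percentEncodeAscii(ch) {
  const code = ch.charCodeAt(0);
  if (ch.length !== 1 || code > 0x7f) {
    throw new RangeError('expected a single ASCII character');
  }
  // Two uppercase hex digits, zero-padded, after a percent sign.
  return '%' + code.toString(16).toUpperCase().padStart(2, '0');
}

console.log(percentEncodeAscii(' '));
console.log(percentEncodeAscii('&'));
console.log(percentEncodeAscii('%'));
```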
3.2 Why URLs Are Encoded
Usually, if something needs to be encoded, it is because it is not suitable for transmission as it stands. There can be various reasons, for example the size is too large or it contains private data. A URL needs to be encoded for the following two reasons:
- The HTTP protocol requires the request line and the request headers to be ASCII, which means a URL cannot contain non-ASCII characters such as Chinese. This part of the encoding and decoding can be done with encodeURI and decodeURI, and in practice browsers and Web servers generally do it for us.
- Some content in a URL contains unsafe characters. For example, URL parameters are passed as key=value pairs separated by ampersands, such as /s?q=abc&ie=utf-8. If a value contains = or &, the server that receives the URL will parse it incorrectly, so these ambiguous & and = symbols must be escaped, that is, encoded. This part can be done with encodeURIComponent, and the decoding is done automatically by the back-end server.
However, the way a URL gets encoded automatically is not always consistent, and the server cannot be expected to handle every possible result. It is therefore safer to encode the URL in JavaScript first and then submit it to the server, so that the browser has no chance to intervene.
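As a sketch of the second reason, the example below encodes a value that contains the separators & and = before placing it in a query string. The endpoint `/s` follows the example above, the value itself is made up, and the server's decoding is simulated with `URLSearchParams`.

```javascript
// A made-up value that contains "&", "=", a space, and non-ASCII text.
const rawValue = 'a&b=c 中文';

// Only the value is encoded; the separators around it stay as they are.
const requestUrl = '/s?q=' + encodeURIComponent(rawValue) + '&ie=utf-8';

console.log(requestUrl);

// Simulate the server side: parse the query and recover the original value.
const decoded = new URLSearchParams(requestUrl.split('?')[1]).get('q');
console.log(decoded === rawValue);
```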
3.3 Encoding and Decoding URLs in JavaScript
The following three pairs of functions can encode URLs. They differ in which characters they leave unencoded, which determines the scenarios each one suits:
1. escape and unescape
Encodes all characters except ASCII letters, digits, and the punctuation marks @ * _ + - . /. Both functions are deprecated, so avoid them in new code.
2. encodeURI and decodeURI
Returns a string encoded as a valid Uniform Resource Identifier (URI). Besides ASCII letters and digits, it leaves these characters unencoded: ; , / ? : @ & = + $ - _ . ! ~ * ' ( ) #. encodeURI() is the function to use in JavaScript for encoding a whole URL.
3. encodeURIComponent and decodeURIComponent
Encodes an individual component of a URL, not the entire URL. Besides ASCII letters and digits, it leaves only these characters unencoded: - _ . ! ~ * ' ( ).
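To see the difference in practice, the sketch below runs encodeURI and encodeURIComponent on the same made-up URL.

```javascript
// Made-up URL containing a space and query separators.
const raw = 'https://example.com/a b?x=1&y=2';

// encodeURI keeps URL structure characters (: / ? & =) and escapes only the space.
const whole = encodeURI(raw);

// encodeURIComponent escapes the structure characters too, so it suits a single value.
const component = encodeURIComponent(raw);

console.log(whole);
console.log(component);
console.log(decodeURI(whole) === raw, decodeURIComponent(component) === raw);
```

In short, use encodeURI when the string is already a complete URL, and encodeURIComponent when the string is one value that will be placed inside a URL.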