This article grew out of interviews in which I was frequently asked to write a URL parser by hand. It briefly describes how to parse a URL. If you know a better parsing method or a variation of the question, you are welcome to discuss it.

Note that this article only discusses the one format listed at the beginning. Other URL formats that also conform to the standard (such as relative paths) are not covered here; see: tools.ietf.org/html/rfc398…

What the URL looks like

First, let's look at what a full URL looks like:

<scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>

If this is too abstract, let's make it concrete with an example: https://juanni:miao@www.foo.com:8080/file;foo=1;bar=2?test=3&miao=4#test

| Component | Description | Default value | Example |
| --- | --- | --- | --- |
| scheme | Protocol used to access the server and obtain the resource | None | https |
| user | User name used to access the resource | None (anonymous) | juanni |
| password | The user's password, separated from the user name by ":" | E-mail | miao |
| host | Host name or IP address of the resource server | None | www.foo.com |
| port | Port on which the resource server listens. Different schemes have different default ports (HTTP uses 80) | Scheme-specific | 8080 |
| path | Path of the resource on the server. The path is server- and scheme-dependent | None | /file |
| params | Used by some schemes to specify input parameters as key-value pairs. There can be several, separated by ";"; multiple values within one parameter are separated by "," | None | foo=1;bar=2 |
| query | There is no common format for this component; HTTP mostly uses "&" to separate multiple query items. It is separated from the rest of the URL by "?" | None | test=3&miao=4 |
| frag/fragment | Name of a small piece or part of a resource. The fragment is not passed to the server when the object is referenced; it is used internally by the client. It is separated from the rest of the URL by "#" | None | test |

Since path parameters (params) are part of the path, we group them under path, which gives:

[scheme:]//[user[:password]@]host[:port][/path][?query][#fragment]

How to get each component

Let's first extract each component, without worrying about the data inside it.

Let the browser parse it for us: URLUtils

Here is a lazy way to get href, hostname, port, and so on: URLUtils.

In a browser environment, the <a> element (HTMLAnchorElement) implements the properties defined in URLUtils, so we can get each component with the following code:

/**
 * @param {string} url
 * @returns {{protocol, username, password, hostname, port, pathname, search, hash}}
 */
function URLParser(url) {
    const a = document.createElement('a');
    a.href = url;
    return {
        protocol: a.protocol,
        username: a.username,
        password: a.password,
        hostname: a.hostname, // host may include port, hostname does not
        port: a.port,
        pathname: a.pathname,
        search: a.search,
        hash: a.hash,
    }
}

Disadvantages:

  • Depends on an interface provided by the browser host environment

Use the URL object

The <a> element approach above does not work in Node, but there is another built-in API that can do the parsing for us: URL.

/**
 * @param {string} url
 * @returns {{protocol, username, password, hostname, port, pathname, search, hash}}
 */
function URLParser(url) {
    const urlObj = new URL(url);
    return {
        protocol: urlObj.protocol,
        username: urlObj.username,
        password: urlObj.password,
        hostname: urlObj.hostname,
        port: urlObj.port,
        pathname: urlObj.pathname,
        search: urlObj.search,
        hash: urlObj.hash,
    }
}
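As a quick check, the sketch below runs the example URL from the beginning of this article through the URL constructor. The helper name tryParse and the www.example.com addresses are made up for illustration. The behaviour shown (a default port reported as an empty string, a TypeError for input that cannot be parsed, an optional base argument for relative paths) is what the built-in URL class does in current browsers and Node.

```javascript
// Parse the example URL with the built-in URL class (browsers and Node).
const parsed = new URL('https://juanni:miao@www.foo.com:8080/file;foo=1;bar=2?test=3&miao=4#test');

console.log(parsed.protocol); // "https:"
console.log(parsed.username); // "juanni"
console.log(parsed.password); // "miao"
console.log(parsed.hostname); // "www.foo.com"
console.log(parsed.port);     // "8080"
console.log(parsed.pathname); // "/file;foo=1;bar=2"
console.log(parsed.search);   // "?test=3&miao=4"
console.log(parsed.hash);     // "#test"

// A relative path needs a base URL as the second argument.
const relative = new URL('../img/logo.png?v=2', 'https://www.example.com/a/b/index.html');
console.log(relative.href);   // "https://www.example.com/a/img/logo.png?v=2"

// Hypothetical helper: the constructor throws a TypeError on input it cannot
// parse, so wrap it when the input is untrusted.
function tryParse(input, base) {
    try {
        return new URL(input, base);
    } catch (e) {
        return null;
    }
}

console.log(tryParse('/only/a/path'));                      // null (no base given)
console.log(new URL('https://www.example.com:443/').port);  // "" (default port is dropped)
```

Note that port is an empty string, not "443", when the URL uses the default port of its scheme, so code that needs a numeric port has to fall back to the scheme default itself.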

Write it by hand

If the interviewer insists on a handwritten parser, give them one:

function parseUrl(url) {
    // Groups: 1 protocol, 2 username, 3 password, 4 hostname, 5 port, 6 pathname, 7 search, 8 hash.
    // The user-info part excludes "@", "/", "?" and "#", so an "@" later in the URL is not mistaken for it.
    var pattern = RegExp("^(?:([^/?#]+))?//(?:([^:@/?#]*)(?::([^@/?#]*))?@)?(?:([^/?#:]*):?([0-9]+)?)?([^?#]*)(\\?(?:[^#]*))?(#(?:.*))?");
    var matches = url.match(pattern) || [];
    return {
        protocol: matches[1],
        username: matches[2],
        password: matches[3],
        hostname: matches[4],
        port:     matches[5],
        pathname: matches[6],
        search:   matches[7],
        hash:     matches[8]
    };
}

parseUrl("https://juanni:miao@www.foo.com:8080/file;foo=1;bar=2?test=3&miao=4#test")
// hash: "#test"
// hostname: "www.foo.com"
// password: "miao"
// pathname: "/file;foo=1;bar=2"
// port: "8080"
// protocol: "https:"
// search: "?test=3&miao=4"
// username: "juanni"

The pattern is hard to read at first, but with some regex background it can be understood with the help of two diagrams of the pattern:

[Figure: two diagrams of the regular expression]
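Another way to make such a pattern readable is to build it from commented pieces. The sketch below is a variant written for readability, not a character-for-character copy of the pattern in parseUrl: it uses named capture groups (ES2018) and anchors the end of the string. The names URL_PATTERN and parseUrlNamed are invented for this example.

```javascript
// The pattern split into one piece per URL component and joined back together.
const URL_PATTERN = new RegExp([
    '^(?<protocol>[^/?#]+)?',                                    // scheme including ":", e.g. "https:"
    '//',                                                        // literal separator
    '(?:(?<username>[^:@/?#]*)(?::(?<password>[^@/?#]*))?@)?',   // optional "user[:password]@"
    '(?<hostname>[^/?#:]*)',                                     // host name or IP address
    '(?::(?<port>[0-9]+))?',                                     // optional ":port"
    '(?<pathname>[^?#]*)',                                       // everything up to "?" or "#"
    '(?<search>\\?[^#]*)?',                                      // optional "?query"
    '(?<hash>#.*)?$',                                            // optional "#fragment"
].join(''));

function parseUrlNamed(url) {
    const match = URL_PATTERN.exec(url);
    // match.groups is keyed by group name; optional groups that did not match are undefined.
    return match ? { ...match.groups } : null;
}

const result = parseUrlNamed('https://juanni:miao@www.foo.com:8080/file;foo=1;bar=2?test=3&miao=4#test');
console.log(result.hostname); // "www.foo.com"
console.log(result.port);     // "8080"
console.log(result.pathname); // "/file;foo=1;bar=2"
```

Reading the array from top to bottom mirrors the template [scheme:]//[user[:password]@]host[:port][/path][?query][#fragment], with one piece per component.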

Parse the search (query) part

The lazy way: use URLSearchParams

/**
 * @param {string} search a string in the form of location.search
 * @returns {object}
 */
function getUrlQuery(search) {
    const searchObj = {};
    for (let [key, value] of new URLSearchParams(search)) {
        searchObj[key] = value;
    }
    return searchObj;
}

Advantages:

  • No need to call decodeURIComponent manually
  • Automatically converts + in the query to a space (decodeURIComponent alone cannot do that). For when a space is encoded as + and when as %20, see the reference.

Disadvantages:

  • Does not support forms such as array[] / obj{}
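The points above can be checked directly. The following sketch uses made-up query strings to show how URLSearchParams treats + and keys like list[], compared with decodeURIComponent on its own.

```javascript
// "+" and "%20" both decode to a space with URLSearchParams...
const params = new URLSearchParams('?q=hello+world%20again&list[]=a&list[]=b');
console.log(params.get('q'));                            // "hello world again"

// ...while decodeURIComponent alone leaves "+" untouched.
console.log(decodeURIComponent('hello+world%20again'));  // "hello+world again"

// Keys such as "list[]" get no special treatment: the brackets stay in the key.
console.log(params.get('list[]'));                       // "a" (first value only)
console.log(params.getAll('list[]'));                    // ["a", "b"]

// Copying the entries into a plain object keeps only the last value of a repeated key.
const asObject = Object.fromEntries(params);
console.log(asObject['list[]']);                         // "b"
```

So repeated keys are not lost by URLSearchParams itself (getAll returns them all), only by copying the entries into a plain object one key at a time.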

Another handwritten version (incomplete)

Requirements:

  • Entries containing illegal characters are not parsed
  • Keys of the form list[] are parsed into arrays
  • Keys of the form obj{} are parsed into objects (for now, simply with JSON.parse)

/**
 * @param {string} query a string in the form of location.search
 * @returns {object}
 */
function parseQueryString(query) {
    if (!query) {
        return {};
    }
    query = query.replace(/^\?/, '');
    const queryArr = query.split('&');
    const result = {};
    queryArr.forEach(query => {
        let [key, value] = query.split('=');
        try {
            // Note: "+" is replaced after decoding, so an encoded "%2B" also becomes a space
            value = decodeURIComponent(value || '').replace(/\+/g, ' ');
            key = decodeURIComponent(key || '').replace(/\+/g, ' ');
        } catch (e) {
            // Illegal percent-encoding: skip this entry
            console.log(e);
            return;
        }
        const type = getQueryType(key);
        switch (type) {
            case 'ARRAY':
                key = key.replace(/\[\]$/, '');
                if (!result[key]) {
                    result[key] = [value];
                } else {
                    result[key].push(value);
                }
                break;
            case 'JSON':
                key = key.replace(/\{\}$/, '');
                value = JSON.parse(value);
                // Store under the stripped key rather than a hard-coded "json" property
                result[key] = value;
                break;
            default:
                result[key] = value;
        }
    });
    return result;

    function getQueryType(key) {
        if (key.endsWith('[]')) return 'ARRAY';
        if (key.endsWith('{}')) return 'JSON';
        return 'DEFAULT';
    }
}

const testUrl =
'?name=coder&age=20&callback=https%3A%2F%2Fmiaolegemi.com%3Fname%3Dtest&list[]=a&list[]=b&json{}=%7B%22str%22%3A%22abc%22,%22num%22%3A123%7D&illegal=C%9E5%H__a100373__b4'
parseQueryString(testUrl)

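Of course, this version is not rigorous. It does not consider the following problems:

  1. How to handle repeated fields
  2. Correct handling of + (it is replaced after decoding, so an encoded %2B also becomes a space)
  3. Entries with only a key
  4. Entries with only a value
  5. Relative paths are not resolved
  6. Deeper (nested) parsing of objects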
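A few of these open points can be handled with small changes. The sketch below is one possible variant, not the only reasonable one, and the name parseQueryLoose is invented here. The choices it makes are assumptions: repeated keys are collected into an array, a key without "=" gets an empty string, an entry with an empty key is skipped, only the first "=" separates key from value, and + is turned into a space before percent-decoding. It leaves out the list[] and obj{} handling, relative paths, and nested objects.

```javascript
function parseQueryLoose(query) {
    const result = {};
    const source = (query || '').replace(/^\?/, '');
    if (!source) return result;

    // "+" must become a space before percent-decoding, so that "%2B" stays a plus.
    const decode = text => decodeURIComponent(text.replace(/\+/g, ' '));

    for (const pair of source.split('&')) {
        if (!pair) continue;                       // skip empty segments such as "a=1&&b=2"
        const index = pair.indexOf('=');           // split on the first "=" only
        const rawKey = index === -1 ? pair : pair.slice(0, index);
        const rawValue = index === -1 ? '' : pair.slice(index + 1);

        let key, value;
        try {
            key = decode(rawKey);
            value = decode(rawValue);
        } catch (e) {
            continue;                              // malformed percent-encoding: skip the pair
        }
        if (key === '') continue;                  // "=value" has no key to store it under

        if (!Object.prototype.hasOwnProperty.call(result, key)) {
            result[key] = value;                   // first occurrence: plain string
        } else if (Array.isArray(result[key])) {
            result[key].push(value);               // third and later occurrences
        } else {
            result[key] = [result[key], value];    // second occurrence: promote to an array
        }
    }
    return result;
}

console.log(parseQueryLoose('?a=1&a=2&flag&=orphan&eq=x=y&plus=1%2B1+2&bad=%E0'));
// { a: ["1", "2"], flag: "", eq: "x=y", plus: "1+1 2" }
```

Promoting a key to an array only on its second occurrence keeps the common single-value case simple, at the cost of callers having to check whether a value is a string or an array.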

Finally, I recommend an open source library, url-parse, which handles all kinds of cases well. The price is a somewhat more complex implementation, which is understandable. In an interview, what matters is to fully understand the interviewer's requirements, then answer and expand accordingly.

References

  • This time, let’s dig a little deeper – URL do you really understand?
  • path-parameter-syntax
  • URLUtils
  • URL
  • URLSearchParams