This article grew out of a task that came up repeatedly in past interviews: parse a URL by hand. It briefly describes how to parse a URL. If you know a better parsing method or a variation of the question, you are welcome to discuss it.
Note that this article only covers the format listed at the beginning. Other URL forms, including ones that conform more closely to the standard (such as relative paths), are not discussed here; see: tools.ietf.org/html/rfc398…
What the URL looks like
First, let's look at what a full URL looks like: <scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>
If this is too abstract, let's make it concrete with an example: https://juanni:miao@www.foo.com:8080/file;foo=1;bar=2?test=3&miao=4#test
component | description | default value | example |
---|---|---|---|
scheme | Protocol used to access the server and obtain the resource | None | https |
user | User name used to access the resource | None (anonymous) | juanni |
password | The user's password, separated from the user name by : | | miao |
host | Host name or IP address of the resource server | None | www.foo.com |
port | Port the resource server listens on. Each scheme has its own default port (HTTP uses 80) | Depends on the scheme | 8080 |
path | Path of the resource on the server. Its format is server- and scheme-specific | The default value | /file |
params | Input parameters given as key-value pairs, used by some schemes. There can be several, separated by ; and multiple values within one parameter are separated by , | The default value | foo=1;bar=2 |
query | This component has no common format. HTTP mostly uses & to separate multiple queries. ? separates the query from the preceding parts | None | test=3&miao=4 |
frag/fragment | Name of a piece or part of the resource. The fragment is not sent to the server when the object is referenced; it is used internally by the client. # separates the fragment from the rest | None | test |
Since the path parameters are part of the path, we group them under path, which gives the simplified form:
[scheme:]//[user[:password]@]host[:port][/path][?query][#fragment]
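Because the path parameters stay inside the path component, they have to be split off separately when they are needed. The following is a minimal sketch of that step. The helper name `splitPathParams` is hypothetical (it is not part of any API), and it assumes the parameters only trail the whole path, as in the example URL.

```javascript
// Hypothetical helper: split "/file;foo=1;bar=2" into the bare path and its parameters.
// Assumption: parameters appear only at the end of the path, not on inner segments.
function splitPathParams(pathname) {
  const [path, ...rawParams] = pathname.split(';');
  const params = {};
  for (const raw of rawParams) {
    if (!raw) continue; // ignore empty pieces such as a trailing ";"
    const index = raw.indexOf('=');
    const key = index === -1 ? raw : raw.slice(0, index);
    const value = index === -1 ? '' : raw.slice(index + 1);
    params[key] = value;
  }
  return { path, params };
}

console.log(splitPathParams('/file;foo=1;bar=2'));
// → { path: '/file', params: { foo: '1', bar: '2' } }
```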
How to get each component
Let's extract each component first, without looking at the data inside it.
Let the browser parse it for us – URLUtils
Here’s a lazy way to get href, hostname, port, and so on: URLUtils.
In the browser, the <a> element (HTMLAnchorElement) implements the properties defined in URLUtils, so we can get each component with the following code:
/**
 * @param {string} url
 * @returns {protocol, username, password, hostname, port, pathname, search, hash}
 */
function URLParser(url) {
    const a = document.createElement('a');
    a.href = url;
    return {
        protocol: a.protocol,
        username: a.username,
        password: a.password,
        hostname: a.hostname, // host may include port, hostname does not
        port: a.port,
        pathname: a.pathname,
        search: a.search,
        hash: a.hash,
    };
}
Disadvantages:
- Depends on interfaces provided by the browser host environment
Use the URL object
The <a> element approach above does not work in Node, but there is another built-in API that can do the parsing for us: URL.
/**
 * @param {string} url
 * @returns {protocol, username, password, hostname, port, pathname, search, hash}
 */
function URLParser(url) {
    const urlObj = new URL(url);
    return {
        protocol: urlObj.protocol,
        username: urlObj.username,
        password: urlObj.password,
        hostname: urlObj.hostname,
        port: urlObj.port,
        pathname: urlObj.pathname,
        search: urlObj.search,
        hash: urlObj.hash,
    };
}
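A short usage sketch for the URL-based approach. It assumes a runtime where `URL` is available as a global (modern browsers and current Node versions). The wrapper name `safeParse` is hypothetical and only serves the illustration. One thing worth knowing in an interview: `new URL` throws when it cannot parse its input, for example a relative path without a base, so callers may want to guard the call.

```javascript
// Hypothetical wrapper: returns null instead of throwing on unparsable input.
function safeParse(url, base) {
  try {
    return new URL(url, base);
  } catch (e) {
    return null; // new URL() throws a TypeError for invalid URLs
  }
}

const parsed = safeParse('https://juanni:miao@www.foo.com:8080/file;foo=1;bar=2?test=3&miao=4#test');
console.log(parsed.hostname); // "www.foo.com"
console.log(parsed.host);     // "www.foo.com:8080"
console.log(parsed.pathname); // "/file;foo=1;bar=2"
console.log(parsed.search);   // "?test=3&miao=4"

// A relative path only parses when a base URL is supplied.
console.log(safeParse('/file?test=3'));                             // null
console.log(safeParse('/file?test=3', 'https://www.foo.com').href); // "https://www.foo.com/file?test=3"
```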
Write it by hand
If the interviewer insists on a handwritten version, here is one:
function parseUrl(url) {
    // Groups: 1 protocol, 2 username, 3 password, 4 hostname, 5 port, 6 pathname, 7 search, 8 hash
    var pattern = RegExp("^(?:([^/?#]+))?//(?:([^:]*)(?::?(.*))@)?(?:([^/?#:]*):?([0-9]+)?)?([^?#]*)(\\?(?:[^#]*))?(#(?:.*))?");
    var matches = url.match(pattern) || [];
    return {
        protocol: matches[1],
        username: matches[2],
        password: matches[3],
        hostname: matches[4],
        port: matches[5],
        pathname: matches[6],
        search: matches[7],
        hash: matches[8]
    };
}
parseUrl("https://juanni:miao@www.foo.com:8080/file;foo=1;bar=2?test=3&miao=4#test")
// hash: "#test"
// hostname: "www.foo.com"
// password: "miao"
// pathname: "/file;foo=1;bar=2"
// port: "8080"
// protocol: "https:"
// search: "?test=3&miao=4"
// username: "juanni"
The pattern is hard to read at first, but with some regex background it can be understood with the help of the following two diagrams: [two diagrams of the regular expression's structure, not reproduced here]
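As a textual aid, the pattern used in parseUrl can also be split into one commented piece per component. The sketch below keeps the same sub-patterns but gives each capturing group a name. The names `URL_PATTERN` and `parseUrlNamed` are chosen here for illustration only, and named capture groups require a reasonably recent JavaScript engine.

```javascript
// The same sub-patterns as in parseUrl, one line per URL component.
const URL_PATTERN = new RegExp([
  '^(?:(?<protocol>[^/?#]+))?',                    // scheme plus ":" (e.g. "https:")
  '//',                                            // literal separator
  '(?:(?<username>[^:]*)(?::?(?<password>.*))@)?', // optional "user:password@"
  '(?:(?<hostname>[^/?#:]*):?(?<port>[0-9]+)?)?',  // host, then optional ":port"
  '(?<pathname>[^?#]*)',                           // everything up to "?" or "#"
  '(?<search>\\?(?:[^#]*))?',                      // "?" and the query
  '(?<hash>#(?:.*))?',                             // "#" and the fragment
].join(''));

function parseUrlNamed(url) {
  const match = url.match(URL_PATTERN);
  // Copy the named groups into a plain object; return {} when nothing matches.
  return match ? { ...match.groups } : {};
}

const parts = parseUrlNamed('https://juanni:miao@www.foo.com:8080/file;foo=1;bar=2?test=3&miao=4#test');
console.log(parts.hostname, parts.port, parts.pathname);
// → www.foo.com 8080 /file;foo=1;bar=2
```

Reading the array top to bottom follows the URL from left to right, which is usually easier to explain aloud than the single-line pattern.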
Parse the search (query) part
The lazy way: use URLSearchParams
/**
 * @param {string} search a string similar to location.search
 * @returns {object}
 */
function getUrlQuery(search) {
    const searchObj = {};
    for (let [key, value] of new URLSearchParams(search)) {
        searchObj[key] = value;
    }
    return searchObj;
}
Advantages:
- No need to call decodeURIComponent manually
- Automatically converts + in the query to a space, which decodeURIComponent alone cannot do (when a space is encoded as + and when as %20 is a separate topic)
Disadvantages:
- Does not support forms such as array[] or obj{}
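The points above can be checked with a few lines. This is a small sketch, assuming `URLSearchParams` is available as a global (modern browsers and current Node versions). The sample query string is made up for the demonstration.

```javascript
const raw = 'q=a+b%2Bc&list[]=x&list[]=y';
const params = new URLSearchParams(raw);

// "+" becomes a space, while the encoded "%2B" becomes a literal "+".
console.log(params.get('q'));               // "a b+c"
// decodeURIComponent alone leaves "+" untouched.
console.log(decodeURIComponent('a+b%2Bc')); // "a+b+c"

// Bracket suffixes are not interpreted: the key is literally "list[]".
console.log(params.get('list'));            // null
console.log(params.getAll('list[]'));       // ["x", "y"]

// Copying into a plain object keeps only the last value of a repeated key.
const asObject = {};
for (const [key, value] of params) asObject[key] = value;
console.log(asObject['list[]']);            // "y"
```

If repeated keys matter, `getAll` keeps every value, whereas the plain-object copy silently drops all but the last one.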
Another handwritten version (incomplete)
Requirements:
- Entries containing illegal characters are not parsed
- Keys of the form list[] are parsed into arrays
- Keys of the form obj{} are parsed into objects (for now simply with JSON.parse)
/**
 * @param {string} query a string such as location.search
 * @returns {object}
 */
function parseQueryString(query) {
    if (!query) {
        return {};
    }
    query = query.replace(/^\?/, '');
    const queryArr = query.split('&');
    const result = {};
    queryArr.forEach(item => {
        let [key, value] = item.split('=');
        try {
            // Replace "+" before decoding so that an encoded "%2B" stays a literal "+"
            value = decodeURIComponent((value || '').replace(/\+/g, ' '));
            key = decodeURIComponent((key || '').replace(/\+/g, ' '));
        } catch (e) {
            // Illegal percent-encoding: skip this entry
            console.log(e);
            return;
        }
        const type = getQueryType(key);
        switch (type) {
            case 'ARRAY':
                key = key.replace(/\[\]$/, '');
                if (!result[key]) {
                    result[key] = [value];
                } else {
                    result[key].push(value);
                }
                break;
            case 'JSON':
                key = key.replace(/\{\}$/, '');
                try {
                    result[key] = JSON.parse(value);
                } catch (e) {
                    // Malformed JSON: skip this entry
                    console.log(e);
                }
                break;
            default:
                result[key] = value;
        }
    });
    return result;

    function getQueryType(key) {
        if (key.endsWith('[]')) return 'ARRAY';
        if (key.endsWith('{}')) return 'JSON';
        return 'DEFAULT';
    }
}
const testUrl =
    '?name=coder&age=20&callback=https%3A%2F%2Fmiaolegemi.com%3Fname%3Dtest&list[]=a&list[]=b&json{}=%7B%22str%22%3A%22abc%22,%22num%22%3A123%7D&illegal=C%9E5%H__a100373__b4'
parseQueryString(testUrl)
// The "illegal" entry is skipped; list becomes ["a", "b"] and json becomes { str: "abc", num: 123 }
Of course, this version is not rigorous. It does not consider the following problems:
- How to handle repeated fields
- Whether + should always be replaced with a space
- Entries with only a key
- Entries with only a value
- Relative paths are not resolved
- Deeper parsing of nested objects
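A few of these gaps can be closed with little extra code. The following is a minimal sketch, not a complete solution: the function name `parseQueryLoose` and the chosen policies (repeated keys collected into an array, key-only entries mapped to an empty string, value-only entries dropped) are assumptions made for this illustration.

```javascript
// Hypothetical variant that handles repeated keys, key-only and value-only entries.
function parseQueryLoose(query) {
  const result = {};
  const body = query.replace(/^\?/, '');
  for (const pair of body.split('&')) {
    if (!pair) continue; // skip empty segments such as "a=1&&b=2"
    const index = pair.indexOf('=');
    // Split on the first "=" only, so raw "=" characters in the value survive.
    const rawKey = index === -1 ? pair : pair.slice(0, index);
    const rawValue = index === -1 ? '' : pair.slice(index + 1);
    let key, value;
    try {
      key = decodeURIComponent(rawKey.replace(/\+/g, ' '));
      value = decodeURIComponent(rawValue.replace(/\+/g, ' '));
    } catch (e) {
      continue; // malformed percent-encoding: skip the entry
    }
    if (key === '') continue; // value-only entry ("=x"): dropped in this sketch
    if (!Object.prototype.hasOwnProperty.call(result, key)) {
      result[key] = value; // first occurrence
    } else if (Array.isArray(result[key])) {
      result[key].push(value); // third and later occurrences
    } else {
      result[key] = [result[key], value]; // second occurrence: switch to an array
    }
  }
  return result;
}

console.log(parseQueryLoose('?a=1&a=2&flag&=x&b=c+d'));
// → { a: [ '1', '2' ], flag: '', b: 'c d' }
```

Other policies are just as defensible (for example, keeping only the last value of a repeated key), so it is worth asking the interviewer which behavior is expected.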
Finally, I recommend an open source library: url-parse, which handles all kinds of cases well. That also means its implementation is somewhat complicated, which is understandable. In an interview, make sure you fully understand the interviewer's requirements before answering, and then expand from there.
References
- This time, let’s dig a little deeper – URL do you really understand?
- path-parameter-syntax
- URLUtils
- URL
- URLSearchParams