Gopher refers to north
It is common knowledge that a URL cannot contain a literal space, but the standards do not fully agree on how a space should be represented, so different languages implement it differently.
RFC 2396 explicitly states that spaces should be encoded as %20.
The W3C standard states that spaces can be replaced with + or %20.
Lao Xu finds this confusing: if spaces are replaced with +, then + itself has to be percent-encoded, so why not simply percent-encode the space in the first place? This is only a private doubt, of course. The historical background can no longer be traced, and the result is a fact we cannot change. What we do have to deal with is whether a space should become + or %20, and whether + needs to be encoded.
Three URL encoding methods commonly used in Go
As a Gopher, the first concern is naturally how Go itself implements this, so let's first look at the similarities and differences among three URL encoding methods commonly used in Go.
url.QueryEscape
fmt.Println(url.QueryEscape("+Gopher refers to north"))
// Output: +%2BGopher% e6% 8C%87% e5% 8C%97
When encoded with url.QueryEscape, spaces are encoded as +, and + itself is encoded as %2B.
url.PathEscape
fmt.Println(url.PathEscape("+Gopher refers to north"))
// Output: %20+Gopher%E6%8C%87%E5%8C%97
When encoded with url.PathEscape, spaces are encoded as %20 and + is not encoded.
url.Values
var query = url.Values{}
query.Set("hygz"."+Gopher refers to north")
fmt.Println(query.Encode())
Hygz =+%2BGopher%E6%8C%87%E5%8C%97
When encoded with the (Values).Encode method, spaces are encoded as + and + itself is encoded as %2B. The difference between (Values).Encode and url.QueryEscape is that (Values).Encode only encodes the keys and values of the query and leaves the = and & separators alone, while url.QueryEscape also encodes = and &.
Which of these three encoding methods should we developers use? Read on to find out.
Implementations in different languages
Since Go encodes spaces and + differently depending on the function used, does the same happen in other languages? Let's take PHP and JS as examples.
URL encoding in PHP
urlencode
echo urlencode('+Gopher refers to north');
// Output: +%2BGopher% e6% 8C%87% e5% 8C%97
rawurlencode
echo rawurlencode("+Gopher refers to north");
// Output: %20%2BGopher%E6%8C%87%E5%8C%97
PHP's urlencode behaves like Go's url.QueryEscape here, while rawurlencode percent-encodes both spaces and +.
URL encoding in JS
encodeURI
encodeURI('+Gopher refers to north')
// Output: %20+Gopher%E6%8C%87%E5%8C%97
encodeURIComponent
encodeURIComponent('+Gopher refers to north')
// Output: %20%2BGopher%E6%8C%87%E5%8C%97
JS's encodeURI behaves like Go's url.PathEscape here, while encodeURIComponent percent-encodes both spaces and +.
What should we do
The url.PathEscape function is preferred
The previous paper has summarized the coding operation of Go, PHP and JS to +Gopher, and the following is a summary of the two-dimensional table of whether the corresponding decoding operation is feasible.
Encoding/decoding | url.QueryUnescape | url.PathUnescape | urldecode | rawurldecode | decodeURI | decodeURIComponent |
---|---|---|---|---|---|---|
url.QueryEscape | Y | N | Y | N | N | N |
url.PathEscape | N | Y | N | YY | Y | YY |
urlencode | Y | N | Y | N | N | N |
rawurlencode | Y | YY | Y | Y | N | Y |
encodeURI | N | Y | N | Y | Y | Y |
encodeURIComponent | Y | YY | Y | Y | N | Y |
YY and Y in the table above mean the same thing. YY simply marks the combinations that Lao Xu recommends. url.PathEscape is recommended for encoding in Go, and rawurldecode and decodeURIComponent are recommended for decoding in PHP and JS respectively.
In real-world development there will also be cases where the Gopher is the one who has to decode. In that case, talk to whoever encodes the URL and agree on the right decoding method.
Encode values
Is there a universal approach that needs no URL encoding or decoding at all? There certainly is! Take base32 as an example: its alphabet consists of A-Z and the digits 2-7, so a value that has been base32-encoded does not need URL encoding afterwards.
Finally, I sincerely hope that this article can be of some help to all readers.
The examples in this article were run with PHP 7.3.29, Go 1.16.6, and the console of Chrome 94.0.4606.71 for JS.
References
- www.rfc-editor.org/rfc/rfc2396…
- www.w3schools.com/tags/ref_ur…