Gopher refers to north

It is commonly understood that URLs cannot contain literal whitespace. How whitespace should be encoded, however, is not entirely consistent across standards, so different languages implement it differently.

RFC 2396 explicitly states that spaces should be encoded as %20.

The W3C standard states that spaces can be replaced with + or %20.

Old Xu was confused at this point: if a space is replaced with +, then + itself has to be encoded. In that case, why not just encode the space directly? Of course, this is only a doubt in Old Xu's mind. We cannot trace the historical background, and it has become a fact we cannot change. However, whether spaces should be replaced with + or %20, and whether + needs to be encoded, are questions we have to face.

Three URL encoding methods commonly used in Go

As Gophers, our first concern is naturally the implementation in Go itself, so let us first look at the similarities and differences of the three URL encoding methods commonly used in Go.

url.QueryEscape

fmt.Println(url.QueryEscape("+Gopher refers to north"))
// Output: +%2BGopher% e6% 8C%87% e5% 8C%97
Copy the code

When encoded with url.queryEscape, Spaces are encoded as +, and + itself is encoded as %2B.

url.PathEscape

fmt.Println(url.PathEscape("+Gopher refers to north"))
// Output: %20+Gopher%E6%8C%87%E5%8C%97
Copy the code

When encoded with url.pathescape, Spaces are encoded as 20% and + is not encoded.

url.Values

var query = url.Values{}
query.Set("hygz"."+Gopher refers to north")
fmt.Println(query.Encode())
Hygz =+%2BGopher%E6%8C%87%E5%8C%97
Copy the code

Encode with the (Values).encode method, the space is encoded as +, and the + itself is encoded as %2B. The difference between (Values).encode and url.queryescape is that the (Values).encode only encodes the key and value in query, while the (Values).queryescape encodes both = and &.
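The difference described above can be sketched as follows. This is a minimal illustration, not part of the original article: the helper names encodeWithValues and escapeWholeQuery and the extra a=b pair are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeWithValues builds a query string from one key/value pair.
// Only the key and the value are escaped; the "=" separator is kept.
func encodeWithValues(key, value string) string {
	q := url.Values{}
	q.Set(key, value)
	return q.Encode()
}

// escapeWholeQuery escapes an already assembled "key=value&..." string,
// so the "=" and "&" separators are escaped too.
func escapeWholeQuery(raw string) string {
	return url.QueryEscape(raw)
}

func main() {
	fmt.Println(encodeWithValues("hygz", " +Gopher"))
	// Output: hygz=+%2BGopher
	fmt.Println(escapeWholeQuery("hygz= +Gopher&a=b"))
	// Output: hygz%3D+%2BGopher%26a%3Db
}
```

In other words, url.QueryEscape is meant for a single key or value, and calling it on a complete query string destroys the separators.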

For us developers, please read on to find out which of these three coding methods should be used.

Implementations in different languages

Since Go encodes spaces and + differently depending on the function, does the same situation exist in other languages? Let us take PHP and JS as examples.

URL encoding in PHP

urlencode

echo urlencode('+Gopher refers to north');
// Output: +%2BGopher% e6% 8C%87% e5% 8C%97
Copy the code

rawurlencode

echo rawurlencode("+Gopher refers to north");
// Output: %20%2BGopher%E6%8C%87%E5%8C%97
Copy the code

PHP’s urlencode is the same as Go’s url.QueryEscape function, while Rawurlencode encodes both Spaces and +.

URL encoding in JS

encodeURI

encodeURI('+Gopher refers to north')
// Output: %20+Gopher%E6%8C%87%E5%8C%97
Copy the code

encodeURIComponent

encodeURIComponent('+Gopher refers to north')
// Output: %20%2BGopher%E6%8C%87%E5%8C%97
Copy the code

JS encodeURI is the same as Go url.pathescape, while encodeURIComponent encodes both Spaces and +.

What should we do

The url.PathEscape function is preferred

The previous paper has summarized the coding operation of Go, PHP and JS to +Gopher, and the following is a summary of the two-dimensional table of whether the corresponding decoding operation is feasible.

Encoding \ Decoding   url.QueryUnescape  url.PathUnescape  urldecode  rawurldecode  decodeURI  decodeURIComponent
url.QueryEscape       Y                  N                 Y          N             N          N
url.PathEscape        N                  Y                 N          YY            Y          YY
urlencode             Y                  N                 Y          N             N          N
rawurlencode          Y                  YY                Y          Y             N          Y
encodeURI             N                  Y                 N          Y             Y          Y
encodeURIComponent    Y                  YY                Y          Y             N          Y

YY and Y in the table above have the same meaning (decoding succeeds); Old Xu uses YY only to mark the recommended combinations. In Go, url.PathEscape is recommended for encoding, with rawurldecode and decodeURIComponent recommended for decoding in PHP and JS respectively.

In real development there will certainly be cases where the decoding has to be done in Go. In that case, it is necessary to check with whoever encodes the URL to agree on the proper way of decoding.
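Once the encoding side is known, choosing the decoder in Go could look roughly like this. decodeFrom is a hypothetical helper written only for illustration; the mapping simply follows the Go columns of the table above and is not an API of net/url.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeFrom is a hypothetical helper: it picks a Go decoder based on the
// name of the function the sender used for encoding.
func decodeFrom(encoder, s string) (string, error) {
	switch encoder {
	case "url.QueryEscape", "urlencode":
		// The sender wrote spaces as "+", so "+" must be read as a space.
		return url.QueryUnescape(s)
	case "url.PathEscape", "encodeURI", "rawurlencode", "encodeURIComponent":
		// The sender wrote spaces as "%20"; a "+" is a literal plus.
		return url.PathUnescape(s)
	default:
		return "", fmt.Errorf("unknown encoder %q", encoder)
	}
}

func main() {
	fmt.Println(decodeFrom("urlencode", "+%2BGopher"))
	fmt.Println(decodeFrom("encodeURIComponent", "%20%2BGopher"))
}
```

Both calls in main decode back to " +Gopher" with a nil error.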

Encode values

Is there a universal approach that does not require URL encoding and decoding at all? No doubt there is! Take base32 as an example: its character set is A-Z and the digits 2-7 (plus = for padding, which can be turned off). In this case, no URL encoding is required once the values have been base32 encoded.

Finally, I sincerely hope that this article can be of some help to all readers.

The examples in this article were run with PHP 7.3.29, Go 1.16.6, and JS in the Chrome 94.0.4606.71 console.

References

  • www.rfc-editor.org/rfc/rfc2396…
  • www.w3schools.com/tags/ref_ur…