The problem background
After receiving a parameter from the client, base64 decoding failed. Investigation showed that the string contained a + before the client sent it, but by the time PHP received the parameter the + had turned into a space, which broke the base64 decoding.
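To make the failure concrete, here is a minimal sketch. The two sample bytes are an assumption, chosen only because their base64 form contains +, and the str_replace() call merely stands in for what happened to the value in transit.

```php
<?php
// Sample bytes (an assumption for illustration) whose base64 form contains '+'.
$original = "\xfb\xef";
$encoded  = base64_encode($original);          // "++8="

// Simulate what the server saw: every '+' has become a space.
$received = str_replace('+', ' ', $encoded);   // "  8="

// In strict mode base64_decode() returns false when it rejects the input.
$decoded = base64_decode($received, true);

var_dump($encoded);
var_dump($decoded === $original);              // the round trip no longer succeeds
```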
The validation test
Access a test interface /internal/test
curl 'http://127.0.0.1/internal/test?a=abc+def'
Validation 1: Simple output $_GET
public function test() {
    // Dump the query parameters exactly as PHP parsed them
    var_dump($_GET);
}
Results:
array(1) {
  ["a"]=>
  string(7) "abc def"
}
Conclusion: the + becomes a space when the GET parameter is read directly from $_GET.
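The same behaviour can be reproduced without a web server. This is only a sketch: parse_str() parses a string as if it were the query string of a URL, so it is used here as a stand-in for the way $_GET gets filled.

```php
<?php
// parse_str() treats its input like a URL query string,
// so it shows the same '+' handling as $_GET.
parse_str('a=abc+def', $params);

var_dump($params);
```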
Why + becomes a space
To answer that, we first need to understand what URL encoding is.
URL encoding
An example
Take a common URL, such as a CSDN search URL (so.csdn.net/so/search/s…).
Here the URL is encoded by converting each Chinese character into a sequence of bytes, each written as % followed by two hexadecimal digits.
Why is the URL encoded?
The parameter part of a URL is made up of key=value pairs. If characters that have a special function in a URL, such as &, =, / or ?, appear inside a key or a value, the URL becomes ambiguous. For example, suppose the value of parameter q is a&b. When the string q=a&b&f=s appears, does it mean that the value of q is a&b, or that the value of q is a and there is a parameter b with an empty value?
Therefore the URL must be encoded so that the encoded characters are no longer ambiguous. The example q=a&b&f=s above is encoded as q=a%26b&f=s.
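The ambiguity can be seen directly in PHP. The sketch below assumes the same q and f parameters as the example above and uses parse_str() to stand in for the server-side parsing.

```php
<?php
// The value contains '&', which would otherwise be read as a parameter separator.
$value = 'a&b';

// Unencoded: parsed as q=a, an empty parameter b, and f=s.
parse_str('q=' . $value . '&f=s', $ambiguous);

// Encoded: '&' becomes %26, so the value survives intact.
$query = 'q=' . rawurlencode($value) . '&f=s';
parse_str($query, $clear);

echo $query, PHP_EOL;
var_dump($ambiguous, $clear);
```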
How are URLs encoded?
How URLs are encoded is defined by the RFC standards:
- RFC 1738 proposes encoding unsafe characters in a URL as % followed by two hexadecimal digits. The convention of encoding spaces as + is commonly attributed to this standard (PHP itself names it PHP_QUERY_RFC1738), although strictly speaking it comes from the application/x-www-form-urlencoded format used for HTML forms.
- RFC 2396, the updated URI specification, covers parameter encoding again. Note that spaces are encoded as %20 in this standard.
- RFC 3986, the latest update, gives more detailed recommendations on URL encoding and decoding, specifying which characters must be encoded so that the meaning of the URL does not change, and explaining why.
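PHP exposes both conventions through the encoding-type argument of http_build_query(), which makes the difference easy to compare. A small sketch, with a sample parameter chosen for illustration:

```php
<?php
$params = ['a' => 'abc def'];

// Default encoding type: spaces become '+'.
$legacy = http_build_query($params, '', '&', PHP_QUERY_RFC1738);

// RFC 3986 encoding type: spaces become %20.
$strict = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

echo $legacy, PHP_EOL;
echo $strict, PHP_EOL;
```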
Let’s go back to the problem we started with
From the above we can see why + turns into a space: PHP decodes $_GET parameters following the RFC 1738 style convention, so a literal + in the query string is decoded as a space when you read $_GET directly.
How to solve this problem
So how do we get values that behave as RFC 3986 describes, rather than following the RFC 1738 convention?
The easiest fix is to have + encoded correctly in the first place, that is, to have the client encode the URL according to RFC 3986 when it calls the interface. The + is then sent as %2b, PHP decodes %2b back to + when it receives the parameter, and the problem is solved.
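As a rough sketch of that fix, assume the client is also written in PHP and sends a base64 payload in the parameter a (both are assumptions for illustration). The server side is simulated with parse_str() rather than a real request.

```php
<?php
// Hypothetical client side: the payload and the parameter name "a" are assumptions.
$payload = base64_encode("\xfb\xef");     // "++8=", contains '+'
$url = 'http://127.0.0.1/internal/test?a=' . rawurlencode($payload);
echo $url, PHP_EOL;

// Server side, simulated by parsing the query part of the URL.
parse_str(parse_url($url, PHP_URL_QUERY), $params);
$decoded = base64_decode($params['a'], true);
var_dump($decoded === "\xfb\xef");
```

Because rawurlencode() turns both + and = into percent escapes, the base64 string arrives unchanged and decodes to the original bytes.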
The verification results
Encode the URL correctly
curl 'http://127.0.0.1/internal/test?a=abc%2bdef'
You can see the interface output
array(1) {
  ["a"]=>
  string(7) "abc+def"
}
Are there any other pitfalls in PHP?
Besides reading $_GET, PHP has two commonly used functions for handling URL parameters: urlencode and urldecode. Note that these two functions also encode and decode following the RFC 1738 style convention, as the official documentation for urlencode makes clear:
This differs from the » RFC 3986 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.
A quick test
First, encode the string abc def
$str = 'abc def';
echo urlencode($str);
The output
abc+def
Then decode the string a=abc+def
$str = 'a=abc+def';
echo urldecode($str);
The output
a=abc def
You can see that spaces are indeed encoded as +, and + is decoded as a space.
How do you solve it?
PHP also provides rawurlencode and rawurldecode, which follow RFC 3986:
rawurlencode: URL-encode according to RFC 3986
Let's run another experiment and encode the string abc def
$str = 'abc def';
echo rawurlencode($str);
The output
abc%20def
You can see that the space is encoded as %20. Next, decode the string a=abc+def
$str = 'a=abc+def';
echo rawurldecode($str);
The output
a=abc+def
You can see that the + is still a + after decoding, not a space.
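Putting the two pairs of functions side by side may help. The sketch below uses a sample string containing both a + and a space (chosen for illustration) and shows that each encoder round-trips with its own decoder, while mixing them does not.

```php
<?php
$raw = 'abc+def ghi';

$form = urlencode($raw);      // '+' -> %2B, space -> '+'
$rfc  = rawurlencode($raw);   // '+' -> %2B, space -> %20

echo $form, PHP_EOL;
echo $rfc, PHP_EOL;

// Each encoder round-trips with its own decoder.
var_dump(urldecode($form) === $raw);
var_dump(rawurldecode($rfc) === $raw);

// Mixing them does not: rawurldecode() leaves the '+' produced by urlencode() alone.
var_dump(rawurldecode($form));
```

Note that both encoders turn a literal + into %2B, so the problem only arises when an unencoded + is placed in the URL.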
Conclusion
The most standard and easiest solution to implement is therefore to have the client or front end encode URLs correctly according to RFC 3986 when calling the server interface.