I’m a code detective. My job is to track down bugs, and occasionally to create a few myself.
One day in YYYY, I was commissioned to investigate a supernatural event that had occurred in a form input box.
I. Reconstructing the case
Here’s the story. The company has a back-office management system whose main job is to configure marketing campaigns through forms. One of the form’s configuration items is the redirect link for a button on the campaign page. The operations colleagues paste a URL into the input box, and the form page generates a QR code in real time so that the link can be verified by scanning.
(The URL shown is a Juejin article link, used as an example here and below.)
But for this particular campaign, the QR code could not be opened when the operations colleagues scanned it. And that’s not even the strange part.
Operations reported the problem to the product manager, and the product manager asked QA to triage it. When the QA colleague tried to reproduce the problem, something even weirder happened.
The QA colleague copied the URL from the input box, pasted it into Feishu and sent it as a message, then copied the URL from the sent message and pasted it back into the input box. The QR code generated this time could be opened!
When I heard about the case, my first thought was to determine its scope. I spot-checked the configuration forms of several earlier campaigns, and the links in their QR codes were all fine. I was relieved that I hadn’t introduced a new bug; after all, I was the one who had just changed the code for this input box. As it happens, I had added the trim modifier to this input box’s v-model only the day before.
The other QR codes were fine, and the URL became accessible after a round trip through Feishu. Combining these two clues, I could conclude that neither the input box <input v-model.trim /> nor the QR code generation component was at fault. The supernatural event could be confined to this one campaign’s configuration, and the root of the problem had to be the URL pasted in by the operations colleague.
Yet the link processed by Feishu and the link pasted in by the operations colleague looked exactly the same; to the naked eye, at least, there was no difference between them. Could Feishu perform exorcisms too?
II. Unraveling the thread
As they say on Zhihu: first ask whether it is so, then ask why. The first thing to confirm was whether the two URLs really were the same. In our line of work we all know not to trust our eyes, and I’m severely nearsighted on top of that.
I copied the URL that Feishu had blessed, then opened Chrome DevTools on the haunted input box and retrieved the bewitched URL. (A tip: if you select an element in the Elements panel, you can get that DOM object with $0 in the Console, and $0.value gives you the contents of the input field.)
For ease of description, I’ll call the URL that works the “real link” and the other one, naturally, the “fake link.” I brought both parties into court to confront each other, and before I could even start the interrogation:
As you might expect, the two are not the same.
Two strings can be unequal because of the number of characters, the order of the characters, or both. Visually, the characters were in the same order. As for length, even averaging repeated measurements with a vernier caliper would show them to be exactly the same. But if I really measured string length with a ruler, I’d be drummed out of the industry; in coding circles, matters are settled in code. Comparing the length property showed that the fake link was one character longer than the real link.
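The comparison can be reproduced in a few lines. This is a minimal sketch under stated assumptions: realLink is a placeholder URL rather than the one from the case, and fakeLink is simulated by prepending an invisible character by hand, since the original pasted string is not available here.

```javascript
// Illustrative values only: the real URLs from the case are not reproduced here.
const realLink = 'https://juejin.cn/post/example';
// Simulate the pasted value by prepending an invisible character.
const fakeLink = '\u2063' + realLink;

// Strict equality compares the strings code unit by code unit.
console.log(realLink === fakeLink);             // false
// The two look identical on screen, yet the lengths differ by one.
console.log(fakeLink.length - realLink.length); // 1
```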
So here’s some homework: how do you find the difference between two strings? I’ll leave the algorithm to those who are up to it. If you have a better solution, don’t leave after school: see you in the comments section.
For the sake of a quick screening, I ignored performance and elegance and simply compared the two strings character by character from start to finish. Fortunately, they differed at the very first character. Unfortunately, that first character was one I couldn’t see.
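A brute-force comparison of this kind might look like the sketch below. The helper name firstDiffIndex and the sample values are assumptions made for illustration; the code actually used in the case is not shown in the source.

```javascript
// Return the index of the first position where the two strings differ,
// or -1 if they are identical. Compares UTF-16 code units, which is
// enough to locate where the strings diverge.
function firstDiffIndex(a, b) {
  const shorter = Math.min(a.length, b.length);
  for (let i = 0; i < shorter; i++) {
    if (a[i] !== b[i]) return i;
  }
  // One string is a prefix of the other, or they are equal.
  return a.length === b.length ? -1 : shorter;
}

const realLink = 'https://juejin.cn/post/example'; // placeholder URL
const fakeLink = '\u2063' + realLink;              // simulated pasted value

console.log(firstDiffIndex(realLink, fakeLink)); // 0
```

It runs in linear time and stops at the first mismatch, which is all a quick screening needs.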
How embarrassing. It has no width, so it’s definitely not a space; it affects the length of the string, so it’s definitely not empty. It is simply invisible…
My friends, anything that can turn invisible is not to be trifled with. A series of images flashed through my mind: Harry Potter under his Invisibility Cloak, the Invisible Woman of the Fantastic Four, Gollum obsessed with the One Ring, the QQ friend who set their status to invisible… Ahem, I digress. In any case, invisible beings always make me a little nervous.
To be on the safe side, I went back and verified that the two URLs differ only in that first character:
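That check can be written as a single comparison. In this sketch the two links are placeholders constructed by hand, so it only illustrates the technique:

```javascript
const realLink = 'https://juejin.cn/post/example'; // placeholder URL
const fakeLink = '\u2063' + realLink;              // simulated pasted value

// Drop the first character of the fake link and compare the remainder.
console.log(fakeLink.slice(1) === realLink); // true
```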
Phew… Thank goodness. One such menace is quite enough.
How did this transparent character end up here? I paid a return visit to my client, the operations colleague, and she said the URL had come into being through the following steps:
- Someone else sent her the URL via Feishu.
- She copied the URL, pasted it into the Feishu message box, and manually changed a few characters in the link.
- She copied the edited URL and pasted it into the input box at the crime scene.
I opened Feishu and followed the same recipe, and soon figured out the trick: after a line break in the Feishu chat input box, the second line of text gains one extra character at its beginning, which is exactly the supernatural character I had just dragged out. When multi-line text is actually sent, however, Feishu replaces the supernatural character with a line break, so the URL in the sent message is normal.
So now the story is clear: the invisible character sits in front of the protocol in the URL, and the QR code generator dutifully converts the string into a QR code regardless of whether it is a valid URL. The scanning tool parses the QR code, gets a string, takes one look and decides: this is no URL, something fishy is going on, abort. And so the page behind the URL cannot be opened.
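One way to see why a consumer might reject the decoded string is to feed it to a standard URL parser. The sketch below uses the WHATWG URL constructor available in browsers and Node.js. Whether the scanning tool in this case performs a comparable check is an assumption, and the helper name isParsableUrl and the URL are placeholders.

```javascript
// Returns true if the WHATWG URL parser accepts the string as an absolute URL.
function isParsableUrl(text) {
  try {
    new URL(text);
    return true;
  } catch (err) {
    return false;
  }
}

const realLink = 'https://juejin.cn/post/example'; // placeholder URL

console.log(isParsableUrl(realLink));            // true
// With U+2063 in front, the string no longer starts with a valid scheme.
console.log(isParsableUrl('\u2063' + realLink)); // false
```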
III. Getting to the bottom of it
I had caught the culprit, but I still couldn’t see its true face. In Chinese folklore this is called running into something unclean. Where was I supposed to find ox tears or a rhinoceros horn?
Don’t panic. Which characters a computer system can display and use depends on which character sets the system supports. Unless a character is an unregistered resident, its particulars must be on file. Since this invisible character can stir up trouble in the browser, its ID number, the code point, must be recorded in the character set.
I used codePointAt() to get the code point. The result is a decimal number, which is meant for humans. To look it up in the registry, I had to convert it to hexadecimal with toString(16), which gives 2063.
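The lookup amounts to two calls. In this sketch fakeLink is again a hand-built stand-in for the pasted value:

```javascript
const fakeLink = '\u2063https://juejin.cn/post/example'; // simulated pasted value

// codePointAt(0) returns the code point of the first character as a number.
const codePoint = fakeLink.codePointAt(0);
console.log(codePoint);              // 8291 (decimal)

// Unicode charts are indexed in hexadecimal, so convert with radix 16.
console.log(codePoint.toString(16)); // "2063"
```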
With the ID number in hand, I went to the official Unicode website, the authority in charge of registering characters, entered 2063, and found the invisible character’s registration record.
Officially, U+2063 is INVISIBLE SEPARATOR, also known as the invisible comma, which is used in mathematical notation to mark adjacent symbols as forming a list.
Until now we had relied on v-model.trim to strip extra characters from the input box, but evidently 2063 slipped past the trim modifier and lodged itself stubbornly at the front of the URL. So trim is no panacea. That left me feeling insecure: I needed to know what it can stop and what it cannot.
The most accurate account of the trim modifier’s internal logic is undoubtedly in the Vue source code, so I looked up the section that handles the modifier. As it turns out, Vue calls JavaScript’s native trim() directly, without much extra processing or wrapping, presumably to keep the modifier semantically consistent with the native method.
That being the case, the natural next step is to see how JavaScript defines trim().
I found the definition of the trim method in the ECMAScript specification:
Pull up a radish and the mud comes with it. This TrimString is an unfamiliar face, isn’t it? It turns out the specification defines a number of internal operations to make its algorithms easier to describe. They exist only within the specification; implementations of the ECMAScript standard, such as JavaScript engines, are not required to expose them. Let’s take a look at the definition of TrimString:
Feeling sleepy already? TrimString takes two parameters: the string to process and the position to trim. The position can be the start of the string, the end, or both, and trim() asks for both. The engine removes the “white space” at the corresponding position and returns the resulting string.
As far as we know, trim() itself never takes an argument, so why does the specification give the internal TrimString a position parameter? The answer lies in trimStart() and trimEnd(), which also call TrimString with their own position values.
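To make the relationship concrete, here is a rough JavaScript imitation of the abstract operation. The function name trimString and the string values 'start', 'end' and 'start+end' are choices made for this sketch. It delegates to the native methods rather than reproducing the specification steps, so treat it as an illustration only.

```javascript
// A rough imitation of the spec's TrimString(string, where).
function trimString(string, where) {
  if (where === 'start') return string.trimStart();
  if (where === 'end') return string.trimEnd();
  // 'start+end' is the position that trim() asks for.
  return string.trimStart().trimEnd();
}

const padded = '  https://juejin.cn  ';

console.log(JSON.stringify(trimString(padded, 'start')));       // "https://juejin.cn  "
console.log(JSON.stringify(trimString(padded, 'end')));         // "  https://juejin.cn"
console.log(trimString(padded, 'start+end') === padded.trim()); // true
```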
As mentioned, the internal logic removes white space, and that is the real reason I went digging through the specification: what is a Happy Planet, er, I mean, what is white space?
Below the TrimString definition is the following:
The “white space” referred to in the specification comprises WhiteSpace and LineTerminator:
Sure enough, code point 2063 is not covered by trim()’s warranty.
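A quick experiment confirms this. The sketch below tests a handful of code points from the WhiteSpace and LineTerminator groups alongside two zero-width characters. The selection of samples is an assumption made for illustration, not an exhaustive list from the specification.

```javascript
// A character counts as "trimmable" if trim() reduces it to the empty string.
const isTrimmed = (ch) => ch.trim() === '';

const samples = {
  'U+0020 SPACE': '\u0020',
  'U+00A0 NO-BREAK SPACE': '\u00A0',
  'U+3000 IDEOGRAPHIC SPACE': '\u3000',
  'U+FEFF ZERO WIDTH NO-BREAK SPACE': '\uFEFF',
  'U+000A LINE FEED': '\n',
  'U+2028 LINE SEPARATOR': '\u2028',
  'U+200B ZERO WIDTH SPACE': '\u200B',
  'U+2063 INVISIBLE SEPARATOR': '\u2063',
};

for (const [name, ch] of Object.entries(samples)) {
  console.log(name, isTrimmed(ch));
}
// The first six print true; U+200B and U+2063 print false.
```

Note that U+200B, despite its name, is a format character rather than a space separator, so trim() leaves it alone as well.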
IV. Slaying the demon
So how do we get rid of this invisible character? The problem is not complicated, but there is no silver bullet either. Like putting an elephant into a fridge, it has to be broken down into steps.
First, focus on the URL, the root of the paranormal event. To remove \u2063 from the beginning of the URL, match it with a regular expression and replace it.
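A minimal sketch of that replacement follows. The helper name stripLeadingInvisible and the sample URL are hypothetical:

```javascript
// Remove one or more U+2063 characters from the start of the string only.
const stripLeadingInvisible = (url) => url.replace(/^\u2063+/, '');

const fakeLink = '\u2063https://juejin.cn/post/example'; // simulated pasted value

console.log(stripLeadingInvisible(fakeLink)); // https://juejin.cn/post/example
```

The ^ anchor limits the match to the beginning of the string; dropping the anchor and adding the g flag would remove the character wherever it appears.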
Next, widen the view to the input box. I can use form validation to check what the user enters, and if I detect any rogue element at the beginning of the URL, I’ll flash a red card in the interface. Taking iView form validation as an example:
validateRules: {
  url: [{
    validator: (rule, value, callback) => {
      // Accept only values that start with http:// or https://
      const reg = /^https?:\/\//
      if (!reg.test(value)) {
        callback(new Error('Links should start with HTTP or HTTPS'))
        return
      }
      callback()
    },
    trigger: 'change'
  }]
}
But that just kicks the ball back to the user. When the operations colleague pastes in the haunted URL, form validation can indeed report that the input is not a valid URL, but it cannot say what is wrong with it. Clearly, the better solution is for the front-end code to dispose of \u2063 itself, so that the user directly gets a QR code that opens when scanned.
To do that, process the value bound to v-model (with computed or with watch, that is the question, and also an interview question), and use the processed URL both for the QR code generation component and as the request parameter when the form is submitted.
<template>
  <div>
    <input type="url" v-model.trim="link">
    <vue-qrcode v-if="computedLink" :value="computedLink"></vue-qrcode>
    <vue-qrcode v-if="watchedLink" :value="watchedLink"></vue-qrcode>
  </div>
</template>
<script>
export default {
  data () {
    return {
      link: '', // the link bound to the input box
      watchedLink: '' // the link as processed by watch
    }
  },
  computed: {
    // Option 1: derive the cleaned link with a computed property
    computedLink () {
      return this.link.replace(/\u2063/g, '')
    }
  },
  watch: {
    // Option 2: keep a cleaned copy in sync with a watcher
    link (val) {
      this.watchedLink = val.replace(/\u2063/g, '')
    }
  }
}
</script>
This approach looks fine and covers the current input box. But I’m after more than fixing a single input box. Because the processing logic is coupled to the business page, reusing it in other input boxes and on other pages becomes cumbersome. Moreover, leaving the problematic URL in the input box is unreasonable: logically it plants a hidden hazard for later iteration and maintenance, and emotionally I just don’t like it.
Putting these factors together, a general direction emerges: I want to confine the change to the input box, and the goal is for the box to output a legitimate URL no matter what the user enters. That means listening for the input event, processing the entered value, and binding the processed value to the value attribute. Hm? That sounds familiar. It’s another interview question: hand-writing v-model. It’s nice to be reminded of the glory days of building rockets while tightening screws.
I wrapped the input box in a component, kept the processing logic inside it, and used v-model to pass values in and out. (If historical data already contains such haunted URLs, though, they need extra processing before being filled back into the form.)
Custom component:
<template>
  <input @input="handleInput" :value="trimedValue">
</template>
<script>
export default {
  props: {
    value: {
      type: String,
      default: ''
    }
  },
  computed: {
    trimedValue () {
      return this.deepTrim(this.value)
    }
  },
  methods: {
    handleInput (e) {
      // Emit the cleaned value so that v-model on the parent receives it
      this.$emit('input', this.deepTrim(e.target.value))
    },
    deepTrim (str) {
      return str.trim().replace(/\u2063/g, '')
    }
  }
}
</script>
Calling component:
<trim-input v-model="link"></trim-input>
At this point, can we close the case? Not so fast. My sense of insecurity still has nowhere to rest: by the cockroach rule, if I caught one \u2063 this time, there must be an unknown number of similar characters lurking in the dark.
So where can I collect all the invisible characters and summon the dragon? Crawl through the Unicode character set one by one? Possible, but unnecessary.
On the all-powerful GitHub, I found a VS Code plugin whose developer included a family portrait of these invisible characters in the documentation. Of all the results I found, it is the most comprehensive. Is it complete? Not necessarily. Who knows what odd things the Unicode Consortium will keep adding to the character set? I’m not looking for a once-and-for-all fix, only to stay alert and adapt as needed.
With that, the complete processing logic looks like this:
deepTrim (str) {
  // Same characters as the plugin's list, written as a character class with ranges
  return str.trim().replace(/[\xAD\uFEFF\uFFF9\uFFFA\u0001-\u0007\u000E-\u001E\u007F-\u0083\u0086-\u009E\u200B-\u200E\u202A-\u202D\u2060-\u2063\u206A-\u206D]/g, '')
}
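Another option, sketched here as an assumption rather than something verified in this project, is to lean on Unicode property escapes, which JavaScript regular expressions support when the u flag is set. The hypothetical stripInvisible below removes every code point in the general categories Cf (format) and Cc (control) instead of a hand-maintained list. That is broader than the list above: it also strips tabs, line breaks, and the zero-width joiners used inside emoji sequences, so it only suits fields such as URLs where none of those are expected.

```javascript
// Remove surrounding white space, then every format (Cf) and control (Cc)
// character anywhere in the string. Requires an engine that supports
// Unicode property escapes (the u flag with \p{...}).
function stripInvisible(str) {
  return str.trim().replace(/[\p{Cf}\p{Cc}]/gu, '');
}

const dirty = '\u2063https://juejin.cn/\u200Bpost\u00AD/example'; // placeholder URL
console.log(stripInvisible(dirty)); // https://juejin.cn/post/example
```

Because the categories are resolved by the engine’s Unicode data, format characters added in later Unicode versions would be covered once the runtime is updated.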
V. Case closed
After the code passed verification and went live, I discussed the case with my colleagues and got many different ideas from them. Some said the clipboard content could be modified so that the URL is processed in advance; others said this kind of invisible character can be used in the Hybrid page
And so the supernatural event of the invisible character has been properly resolved. The operations colleagues live happily ever after without worrying about URLs, and I carry on battling wits with PM and QA…