If you are not sure what zero-width characters can be used for, you can try this demo first

What is a zero-width character?

Zero-width characters are invisible, non-printing characters. They appear in pages mainly to control how the surrounding text is laid out. Here are some common zero-width characters, their Unicode code points, and their original purposes:

  1. Zero-width space (U+200B): marks a position inside a long word where a line break is allowed
  2. Zero-width no-break space (U+FEFF): prevents a line break at a particular position
  3. Zero-width joiner (U+200D): used in Arabic and Hindi text to join characters that would not otherwise be joined
  4. Zero-width non-joiner (U+200C): used in Arabic, German, Hindi, etc., to prevent the joining (ligature) effect between characters that would otherwise be joined
  5. Left-to-right mark (U+200E): used in multilingual text with mixed directions (e.g., left-to-right English mixed with right-to-left Hebrew) to indicate that the adjacent text is laid out left to right
  6. Right-to-left mark (U+200F): used in multilingual text with mixed directions to indicate that the adjacent text is laid out right to left
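
Because none of these characters can be seen, it helps to have a way to make them visible while experimenting. The following is a minimal sketch, not part of the original article; the helper name `revealZeroWidth` and the short tags (`ZWSP`, `ZWNJ`, ...) are made up for illustration. It replaces each character from the list above with a readable tag.

```javascript
// Short tags for the zero-width characters listed above, keyed by code point.
// The tag names are illustrative abbreviations chosen for this sketch.
const ZERO_WIDTH_NAMES = new Map([
  [0x200b, 'ZWSP'],   // zero-width space
  [0x200c, 'ZWNJ'],   // zero-width non-joiner
  [0x200d, 'ZWJ'],    // zero-width joiner
  [0x200e, 'LRM'],    // left-to-right mark
  [0x200f, 'RLM'],    // right-to-left mark
  [0xfeff, 'ZWNBSP'], // zero-width no-break space
]);

// Replace every listed zero-width character with a visible tag such as <ZWSP>.
// `revealZeroWidth` is a hypothetical helper used only in this sketch.
function revealZeroWidth(text) {
  return Array.from(text)
    .map((ch) => {
      const name = ZERO_WIDTH_NAMES.get(ch.codePointAt(0));
      return name ? `<${name}>` : ch;
    })
    .join('');
}

const sample = 'zero\u200Bwidth';
console.log(sample.length);           // 10: nine visible letters plus one invisible character
console.log(revealZeroWidth(sample)); // zero<ZWSP>width
```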

What can zero-width characters do?

1. Passing hidden information

Because zero-width characters are invisible, we can use them to insert hidden text into any web page that does not filter them out. Here is a simple JavaScript example that uses zero-width characters to encrypt and decrypt text (strictly speaking this is an encoding, since no key is involved):

Encryption
// For the sake of brevity and readability, the following code ignores performance considerations

const text = '123😀';

// Array.from lets us correctly read characters that occupy two UTF-16 code units, such as 😀
const textArray = Array.from(text);

// Use codePointAt to read the Unicode code point of each character
// Use toString(2) to convert each code point to binary (besides binary, a larger base could be used to shorten the encrypted message)
const binarify = textArray.map(c => c.codePointAt(0).toString(2));

// binarify is now ["110001", "110010", "110011", "11111011000000000"]. Next we map "1", "0" and the delimiter to the corresponding zero-width characters

// We use a zero-width joiner for 1, a zero-width non-joiner for 0, and a zero-width space as the delimiter
// Typed literally, each of these would look like the empty string '' but is actually a string of length 1, so escapes are used here
const encoded = binarify.map(c => Array.from(c).map(b => b === '1' ? '\u200D' : '\u200C').join('')).join('\u200B');

// encoded now contains the invisible encrypted string


Note: when encrypting with zero-width characters, try not to insert the invisible text at the very beginning or end of the visible text, so that it is not left out when the text is selected and copied.
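
One way to follow this note is to place the invisible payload right after the first visible character, so that visible text surrounds it on both sides. The sketch below is only an illustration of that placement; `hideInside` is a hypothetical helper name, and the payload is assumed to consist solely of zero-width characters, as produced by the encryption step above.

```javascript
// Insert an already-encoded invisible payload after the first visible
// character of the cover text, so visible text surrounds it on both sides.
// `hideInside` is a hypothetical helper used only in this sketch.
function hideInside(coverText, invisiblePayload) {
  const chars = Array.from(coverText);
  if (chars.length < 2) {
    throw new Error('cover text needs at least two characters');
  }
  return chars[0] + invisiblePayload + chars.slice(1).join('');
}

// Example payload: the bits "101", with ZWJ for 1 and ZWNJ for 0.
const payload = '\u200D\u200C\u200D';
const carrier = hideInside('Hello, world', payload);

console.log(carrier.length);                  // 15: 12 visible + 3 invisible characters
console.log(carrier.startsWith('H\u200D'));   // true
console.log(carrier.endsWith('ello, world')); // true
```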

Decryption
// Continues from encoded above
// Split the encrypted text into one group per character on the delimiter (zero-width space)
const split = encoded.split('\u200B');

// Convert each group back to a binary string
const binary = split.map(c => Array.from(c).map(z => z === '\u200D' ? '1' : '0').join(''));

// binary is now back to the original ["110001", "110010", "110011", "11111011000000000"]

// Parse each binary string as an integer and use String.fromCodePoint to get the original text
const decoded = binary.map(b => String.fromCodePoint(parseInt(b, 2))).join('');

// At this point the value of decoded is "123😀"
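
In practice the invisible string sits inside ordinary visible text, so the decoder first has to pull the zero-width characters out. The following sketch wraps the two steps above into a pair of functions under the same assumptions (ZWJ for 1, ZWNJ for 0, zero-width space as delimiter). The names `encodeHidden` and `decodeHidden` are hypothetical, and the sketch assumes the text carries a single hidden message.

```javascript
const ONE = '\u200D';       // zero-width joiner stands for bit 1
const ZERO = '\u200C';      // zero-width non-joiner stands for bit 0
const DELIMITER = '\u200B'; // zero-width space separates characters

// Turn visible text into an invisible string (same scheme as above).
function encodeHidden(text) {
  return Array.from(text)
    .map((c) => Array.from(c.codePointAt(0).toString(2))
      .map((b) => (b === '1' ? ONE : ZERO))
      .join(''))
    .join(DELIMITER);
}

// Drop everything except the three characters of the scheme, then decode.
function decodeHidden(mixedText) {
  const hidden = mixedText.replace(/[^\u200B\u200C\u200D]/gu, '');
  return hidden
    .split(DELIMITER)
    .filter((group) => group.length > 0) // ignore empty groups
    .map((group) => Array.from(group).map((z) => (z === ONE ? '1' : '0')).join(''))
    .map((bits) => String.fromCodePoint(parseInt(bits, 2)))
    .join('');
}

const message = 'H' + encodeHidden('123😀') + 'ello';
console.log(decodeHidden(message)); // 123😀
console.log(decodeHidden('Hello')); // prints an empty line: nothing is hidden
```

Note that any zero-width joiner already present in the visible text (for example inside an emoji sequence) would be read as payload, so a real tool would need some framing around the hidden message.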

Applications
  1. Invisible watermarking

    With zero-width characters we can add invisible watermarks to internal documents. For internal documents that visitors can only read after logging in, we can insert the visitor's information, encrypted as zero-width characters, throughout the document. If a visitor then simply copies and pastes the document to share it anonymously on public media, the invisible watermark embedded in the text makes it easy to find out who distributed it.

  2. Encrypted information sharing

    With zero-width characters we can share any information on any website. Censorship and filtering of sensitive information play a vital role in today's Internet communities, but zero-width characters can pass through both of these layers as if they were not there. Compared with encrypting a message in plain sight using a lookup table, zero-width character encryption takes concealment on the Internet to a new level. With a simple browser plug-in that recognizes and decrypts zero-width characters, any website can become a playground for information sharing.
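
The invisible watermarking idea above can be sketched as follows. This is only an illustration under stated assumptions, not the method of any particular product: the helper names `encodeId`, `watermark`, and `recoverId` are hypothetical, the identifier `visitor-42` is made up, and one copy of the mark is placed after the first character of every line so that a copied fragment is likely to contain a complete copy.

```javascript
// Encode an identifier as zero-width characters: ZWJ = 1, ZWNJ = 0,
// with a zero-width space terminating each character.
function encodeId(id) {
  return Array.from(id)
    .map((c) => c.codePointAt(0).toString(2)
      .replace(/1/g, '\u200D')
      .replace(/0/g, '\u200C') + '\u200B')
    .join('');
}

// Put one copy of the mark after the first character of every line
// that has at least two characters.
function watermark(documentText, visitorId) {
  const mark = encodeId(visitorId);
  return documentText
    .split('\n')
    .map((line) => {
      const chars = Array.from(line);
      return chars.length < 2 ? line : chars[0] + mark + chars.slice(1).join('');
    })
    .join('\n');
}

// Decode the first run of zero-width characters found in a leaked fragment.
// A run that was cut in the middle by the copy would decode incorrectly.
function recoverId(leakedText) {
  const run = leakedText.match(/[\u200B\u200C\u200D]+/);
  if (!run) return null;
  return run[0]
    .split('\u200B')
    .filter((group) => group.length > 0)
    .map((group) => String.fromCodePoint(
      parseInt(group.replace(/\u200D/g, '1').replace(/\u200C/g, '0'), 2)))
    .join('');
}

const internalDoc = 'Quarterly report\nRevenue grew.\nEnd of file';
const marked = watermark(internalDoc, 'visitor-42');
const leakedLine = marked.split('\n')[1]; // someone pastes only the second line
console.log(recoverId(leakedLine));       // visitor-42
```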

2. Evading word matching

// Use zero-width characters to break up a sensitive word
const censored = 'sensitive words';

let censor = censored.replace(/sensitive words/g, ''); // ''

// Use zero-width spaces to separate the characters of the string
const uncensored = Array.from(censored).join('\u200B');

censor = uncensored.replace(/sensitive words/g, ''); // no match: the result still displays as 'sensitive words'

Applications
  1. Evading sensitive-word filters

    We can easily evade sensitive-word filtering by using zero-width characters. Automatic filtering of sensitive words is an important tool for keeping order in Internet communities: by adding sensitive words to a database and matching text against them, a large number of prohibited words can be blocked. Evading the filter with homophones or pinyin makes the message harder to read, whereas zero-width characters evade the filter while delivering the words to the reader intact, which greatly improves communication between the sender and the receiver.

Examples and summary

To help you understand and use zero-width characters, I have provided a demo and a tool library. The library offers some common methods for applying zero-width characters (encryption, decryption, evading matching…). Whether zero-width characters on a page are a good thing or a bad thing depends on how they are used. If you do not want these zero-width characters in your pages, you can filter them out completely, but this can cause typography problems in some languages. So be careful with these invisible characters.
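
Filtering them out completely can be as small as one regular expression. The sketch below is a minimal example, not the API of the tool library mentioned above; `stripZeroWidth` is a hypothetical name, and the pattern covers only the six characters listed at the start of this article. As noted, removing joiners and direction marks can change how some scripts render, so this is best applied to text used for matching rather than text shown to readers.

```javascript
// Matches the zero-width characters named in this article:
// U+200B to U+200F (ZWSP, ZWNJ, ZWJ, LRM, RLM) and U+FEFF.
const ZERO_WIDTH_PATTERN = /[\u200B-\u200F\uFEFF]/g;

// Remove every listed zero-width character from the text.
// `stripZeroWidth` is a hypothetical helper used only in this sketch.
function stripZeroWidth(text) {
  return text.replace(ZERO_WIDTH_PATTERN, '');
}

// A word broken up with zero-width spaces escapes a plain match,
// but is caught again once the text has been stripped.
const hiddenWord = Array.from('sensitive words').join('\u200B');
console.log(/sensitive words/.test(hiddenWord));                 // false
console.log(/sensitive words/.test(stripZeroWidth(hiddenWord))); // true
```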

I’ll leave you a little Easter egg at the end

“I‏‏‏‎‏‏‏‏‏‎‏‏‏‎‎‏‏‏‏‎‏‏‏‏‎‎‎‎‏‏‏‎‏‎‎‏‎t’s not who I am underneath, but what I do that defines me.” – Bruce Wayne

Reference

Be careful what you copy: Invisibly inserting usernames into text with Zero-Width Characters by umpox