TL;DR

  • Preface
  • Analysis
  • Background knowledge
  • Code walkthrough

Preface

In the digital era, almost every part of life runs over the network. Whenever private information is involved, there are always people with malicious intent who fabricate facts and spread rumors, and a rumor repeated often enough gets accepted as truth, just as in the idiom "three men make a tiger".

So we need a technique that identifies who published a piece of private text, so that we can find the real culprit.

Analysis

Start from the text itself: text on a page can be copied, edited, and reposted freely. So we need to embed information that identifies the operator (the user who copied the text) into the copied text itself, without that information being displayed.

In other words, we need to quietly attach information representing the operator to the text being copied.

First, recall that in a computer every variable, whatever language defines it, is ultimately stored in memory in binary form, and each 0 and 1 is just a placeholder for information. So once the user information is converted to binary, we can represent each 0 and each 1 with a character that is never displayed, which hides the information while keeping it in the text.

The watermarking process therefore breaks down into the following steps.

Creating the watermark

  • Convert the user information to binary
  • Map the binary to invisible characters
  • Embed the hidden information in the text

Decoding the watermark

  • Extract the watermark characters
  • Convert them back to binary
  • Decode the binary into the user information
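The six steps above can be condensed into one compact round trip. This is a minimal sketch rather than the article's own code: addWatermark and readWatermark are hypothetical names, and the sketch assumes the same character mapping used in the code walkthrough (bit 1 becomes U+200B, bit 0 becomes U+200C, a character boundary becomes U+200D, and U+FEFF glues every symbol together).

```javascript
// Assumed mapping (same scheme as the code walkthrough):
// bit 1 -> U+200B, bit 0 -> U+200C, character boundary -> U+200D, glue -> U+FEFF
const ONE = '\u200B', ZERO = '\u200C', SEP = '\u200D', GLUE = '\uFEFF';

// Creating: user info -> binary -> zero-width characters -> appended to the text
function addWatermark(text, userInfo) {
  const binary = userInfo.split('')
    .map(ch => ch.charCodeAt(0).toString(2).padStart(8, '0'))
    .join(' ');
  const hidden = binary.split('')
    .map(bit => (bit === '1' ? ONE : bit === '0' ? ZERO : SEP))
    .join(GLUE);
  return text + hidden;
}

// Decoding: extract zero-width characters -> binary -> original user info
function readWatermark(text) {
  const marks = text.match(/[\u200B\u200C\u200D\uFEFF]/g) || [];
  const binary = marks.join('').split(GLUE)
    .map(ch => (ch === ONE ? '1' : ch === ZERO ? '0' : ' '))
    .join('');
  return binary.split(' ')
    .filter(bits => bits.length > 0)
    .map(bits => String.fromCharCode(parseInt(bits, 2)))
    .join('');
}
```

Usage: readWatermark(addWatermark('I am a lovely man', 'Bob')) returns 'Bob', while the visible part of the watermarked string is still 'I am a lovely man'. Because readWatermark only looks at zero-width characters, it also works when the copied text is pasted between other content.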

Background knowledge

Unicode

What is Unicode

Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
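To make "a unique number for every character" concrete, the sketch below prints the numbers JavaScript reports for a few arbitrary sample characters. Note that charCodeAt, which the watermark code relies on later, returns UTF-16 code units, so a character outside the Basic Multilingual Plane such as 😀 (U+1F600) occupies two of them.

```javascript
// Print the UTF-16 code unit of a few sample characters in decimal and U+ notation.
const samples = ['a', 'A', '中'];
for (const ch of samples) {
  const code = ch.charCodeAt(0);
  console.log(ch, code, 'U+' + code.toString(16).toUpperCase().padStart(4, '0'));
}
// a 97 U+0061
// A 65 U+0041
// 中 20013 U+4E2D

// A character outside the Basic Multilingual Plane needs two UTF-16 code units.
const emoji = '\u{1F600}';
console.log(emoji.length);                      // 2
console.log(emoji.codePointAt(0).toString(16)); // 1f600
console.log(emoji.charCodeAt(0).toString(16));  // d83d (high surrogate)
```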

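Zero-width space, zero-width joiner and zero-width non-joiner

Some Unicode code points are never drawn on the page but still serve a purpose. For example, \n, which marks a line break, is U+000A (LINE FEED) in Unicode.

The zero-width space (U+200B), zero-width non-joiner (U+200C) and zero-width joiner (U+200D) are similar: none of them is displayed, yet each has a real function. The zero-width space marks a point where a line may break, the zero-width non-joiner prevents two adjacent characters from being joined (for example into a ligature), and the zero-width joiner asks for them to be joined (it is used, for instance, to build emoji sequences). The watermark code below also uses U+FEFF, the zero-width no-break space.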
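Since these characters cannot be seen on the page, the easiest way to inspect them is to print their code points. The names below are the official Unicode character names; the sample strings are made up for illustration.

```javascript
// The four zero-width characters used by the watermark code, with their Unicode names.
const zeroWidthChars = [
  ['\u200B', 'ZERO WIDTH SPACE'],
  ['\u200C', 'ZERO WIDTH NON-JOINER'],
  ['\u200D', 'ZERO WIDTH JOINER'],
  ['\uFEFF', 'ZERO WIDTH NO-BREAK SPACE'],
];

for (const [ch, name] of zeroWidthChars) {
  const hex = ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0');
  console.log(`U+${hex} ${name}`);
}

// The line feed mentioned above is also a non-printing code point.
console.log('\n'.charCodeAt(0) === 0x0A); // true

// A zero-width character is not rendered, but it is still part of the string.
const plain = 'hello';
const marked = 'hel\u200Blo';
console.log(marked === plain); // false
console.log(marked.length);    // 6
```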

Code walkthrough

Creating the watermark

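Converting the user information to binary

// Pad a binary string with leading zeros up to 8 digits
const zeroPad = num => '00000000'.slice(String(num).length) + num;

// Convert each character to its binary char code; groups are separated by spaces
const textToBinary = username => (
  username.split('').map(char =>
    zeroPad(char.charCodeAt(0).toString(2))).join(' '));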

A few notes. As described above, only the user information needs to be converted to binary, and the obvious way to do that is to call char.charCodeAt(0).toString(2) on each character.

However, toString(2) never emits leading zeros, so a character whose binary form starts with 0 comes out shorter than 8 digits. The helper const zeroPad = num => '00000000'.slice(String(num).length) + num; pads the result back to 8 digits.

For example, the binary form of a is 1100001, only 7 digits, so it is padded to 01100001 to make decoding easier. Characters above U+00FF (such as Chinese characters) already need more than 8 digits and are left unpadded; decoding still works because the digits of each character are separated from the next.
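As a sketch of the padding behaviour, the version below swaps the zeroPad helper for the built-in String.prototype.padStart (ES2017). toBinary is a hypothetical name; it produces the same output as textToBinary because both only add zeros up to a width of 8.

```javascript
// Same conversion as textToBinary, using padStart instead of zeroPad.
const toBinary = text =>
  text.split('')
    .map(ch => ch.charCodeAt(0).toString(2).padStart(8, '0'))
    .join(' ');

console.log('a'.charCodeAt(0).toString(2)); // 1100001 (7 digits, no leading zero)
console.log(toBinary('ab'));                // 01100001 01100010
console.log(toBinary('中'));                // 100111000101101 (15 digits, left as-is)
```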

Hiding the information


// Map each binary digit (and each separating space) to an invisible character
const binaryToZeroWidth = binary => (
  binary.split('').map((binaryNum) => {
    const num = parseInt(binaryNum, 10);
    if (num === 1) {
      return '\u200B'; // zero-width space: bit 1
    } else if (num === 0) {
      return '\u200C'; // zero-width non-joiner: bit 0
    }
    return '\u200D'; // zero-width joiner: boundary between two characters
  }).join('\uFEFF') // zero-width no-break space: glue between symbols
);
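Because the output of binaryToZeroWidth is invisible, printing its code points is the easiest way to check it. The sketch below re-declares the function with the same mapping as the listing above and encodes the made-up input '01 10'.

```javascript
// Same mapping as binaryToZeroWidth: '1' -> U+200B, '0' -> U+200C,
// ' ' (character boundary) -> U+200D, joined with U+FEFF.
const binaryToZeroWidth = binary =>
  binary.split('').map(bit => {
    if (bit === '1') return '\u200B';
    if (bit === '0') return '\u200C';
    return '\u200D';
  }).join('\uFEFF');

const hidden = binaryToZeroWidth('01 10');
// Print each code unit in hex so the invisible output can be read.
console.log([...hidden].map(c => c.charCodeAt(0).toString(16)).join(' '));
// 200c feff 200b feff 200d feff 200b feff 200c
console.log(hidden.length); // 9
```

A note on cost: each ASCII character of user information becomes 8 bits plus a separator, and every symbol is glued with U+FEFF, so n ASCII characters turn into 18n - 3 zero-width characters.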

Embedding the user information in the text

const encryptionText = 'I am a lovely man' + binaryToZeroWidth(textToBinary('The North and the South'));
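To check that embedding leaves the visible text unchanged, the sketch below builds encryptionText as in the line above and then strips every zero-width character again. Compact versions of textToBinary and binaryToZeroWidth are re-declared so the block runs on its own; they follow the same mapping as the listings above.

```javascript
// Compact re-declarations with the same behaviour as the listings above.
const textToBinary = s =>
  s.split('').map(ch => ch.charCodeAt(0).toString(2).padStart(8, '0')).join(' ');
const binaryToZeroWidth = b =>
  b.split('').map(bit => (bit === '1' ? '\u200B' : bit === '0' ? '\u200C' : '\u200D')).join('\uFEFF');

const original = 'I am a lovely man';
const encryptionText = original + binaryToZeroWidth(textToBinary('The North and the South'));

// Removing all zero-width characters gives back exactly the visible text.
const visibleOnly = encryptionText.replace(/[\u200B\u200C\u200D\uFEFF]/g, '');
console.log(visibleOnly === original); // true
console.log(encryptionText.length);    // 428 (17 visible + 411 hidden)
```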

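Decoding the watermark

Extracting the watermark information

// The four zero-width characters used by the encoder
const zeroWidthChar = [
  '\u200B', // zero-width space: bit 1
  '\u200C', // zero-width non-joiner: bit 0
  '\u200D', // zero-width joiner: boundary between characters
  '\uFEFF', // zero-width no-break space: glue between symbols
];
// Matches every occurrence of those characters
const zeroWidthCharReg = /[\u200B\u200C\u200D\uFEFF]/g;

const getMark = string => {
  const charArr = string.match(zeroWidthCharReg) || [];
  // Drop the U+FEFF glue, then split at each zero-width joiner: one group per hidden character
  const binaryArr = charArr
    .filter(c => c !== zeroWidthChar[3])
    .join('')
    .split(zeroWidthChar[2]);
  return binaryArr
    .filter(group => group.length > 0)
    .map(group => {
      const binaryString = group.split('').map(b => (b === zeroWidthChar[0] ? '1' : '0')).join('');
      return String.fromCharCode(parseInt(binaryString, 2));
    })
    .join('');
};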

Converting back to binary

// Convert the zero-width string back into binary digits
const zeroWidthToBinary = string => (
  string.split('\uFEFF').map((char) => { // split on the zero-width no-break space glue
    if (char === '\u200B') { // zero-width space
      return '1';
    } else if (char === '\u200C') { // zero-width non-joiner
      return '0';
    }
    return ' '; // zero-width joiner: boundary between characters
  }).join(''));

This is simply the inverse of binaryToZeroWidth: the zero-width string extracted from the copied text goes in, and the space-separated binary comes out.
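Because zeroWidthToBinary is the inverse of binaryToZeroWidth, a round trip should reproduce the binary string exactly. The sketch re-declares both functions in compact form; the input is the binary for "ab" as produced by textToBinary.

```javascript
const binaryToZeroWidth = binary =>
  binary.split('')
    .map(bit => (bit === '1' ? '\u200B' : bit === '0' ? '\u200C' : '\u200D'))
    .join('\uFEFF');

const zeroWidthToBinary = string =>
  string.split('\uFEFF')
    .map(ch => (ch === '\u200B' ? '1' : ch === '\u200C' ? '0' : ' '))
    .join('');

const binary = '01100001 01100010'; // "ab" from textToBinary
const roundTrip = zeroWidthToBinary(binaryToZeroWidth(binary));
console.log(roundTrip === binary); // true
```

zeroWidthToBinary expects only the zero-width characters. If the visible text is passed in as well, the first group contains ordinary letters, is decoded as a separator, and the first bit of the watermark is lost, which is why the watermark is extracted first.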

Decoding the binary into text

// Turn each space-separated binary group back into its character
const binaryToText = string => (
  string.split(' ').map(num =>
    String.fromCharCode(parseInt(num, 2))).join(''));
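A quick way to check binaryToText is to confirm that it undoes textToBinary. The samples are arbitrary; they include Chinese characters and an emoji because split('') works on UTF-16 code units, so the emoji is encoded as two surrogate halves and String.fromCharCode reassembles them in order.

```javascript
const textToBinary = text =>
  text.split('').map(ch => ch.charCodeAt(0).toString(2).padStart(8, '0')).join(' ');
const binaryToText = string =>
  string.split(' ').map(num => String.fromCharCode(parseInt(num, 2))).join('');

const samples = ['The North and the South', '南北', 'hi \u{1F600}'];
for (const sample of samples) {
  console.log(binaryToText(textToBinary(sample)) === sample); // true
}
```

Spaces inside the user information do not clash with the separators: a space character is itself encoded as the digits 00100000, so only the joining spaces between groups are treated as boundaries.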

Tips

Although zero-width characters such as the zero-width space and the zero-width joiner are not displayed, they are still real characters in the string, and querying the string's length reveals them: '\u200B'.length === 1 // true
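The same property can be used to detect or remove a watermark. The helpers below are hypothetical (hasZeroWidth and stripZeroWidth are not part of the original code). They also show that String.prototype.trim() is not enough to clean text, because U+200B is not treated as whitespace while U+FEFF is.

```javascript
// Hypothetical helpers for checking and cleaning text.
const ZERO_WIDTH_RE = /[\u200B\u200C\u200D\uFEFF]/g;

// A non-global regex avoids the lastIndex state that /g keeps between test() calls.
const hasZeroWidth = text => /[\u200B\u200C\u200D\uFEFF]/.test(text);
const stripZeroWidth = text => text.replace(ZERO_WIDTH_RE, '');

const sample = 'hello\u200B\uFEFF\u200C';
console.log(sample.length);          // 8
console.log(hasZeroWidth(sample));   // true
console.log(stripZeroWidth(sample)); // hello
console.log('\u200B'.trim().length); // 1 (trim() keeps U+200B)
console.log('\uFEFF'.trim().length); // 0 (trim() treats U+FEFF as whitespace)
```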