TL;DR
- Preface
- Analysis
- Introduction to relevant knowledge points
- Code walkthrough
Preface
In the digital era, much of daily life runs over the network. Whenever private information is involved, someone with malicious intent may copy it, distort the facts, and spread rumors. A rumor repeated often enough starts to be taken as truth, as in the idiom "three men make a tiger".
So we need a technique that can identify who published a piece of private text, in order to find the real culprit.
Analysis
Let us go back to the source: the text itself, which can be copied off the page and changed at will. We therefore need to embed information that identifies the operator (the person copying the text) into the text, without that information being displayed.
In other words, we need to quietly attach information identifying the operator to the text being copied.
First, let's be clear: in a computer, any variable defined in any programming language is ultimately stored in memory in binary form. The zeros and ones are simply placeholders that carry the information. So if we express the user information in binary, we can replace each binary digit with a specific invisible character, which lets us hide the information and recover it later.
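As a small illustration of the point about binary storage, the sketch below looks at the number behind a single character, prints its binary digits, and converts them back. The variable names are made up for this example only.

```javascript
// Illustrative only: inspect the number stored behind one character.
const ch = 'a';
const codeUnit = ch.charCodeAt(0);  // UTF-16 code unit of the character
const bits = codeUnit.toString(2);  // binary digits as a string, no leading zeros

console.log(codeUnit); // 97
console.log(bits);     // 1100001

// Going back: parse the binary digits and rebuild the character.
const roundTrip = String.fromCharCode(parseInt(bits, 2));
console.log(roundTrip); // a
```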
With that in mind, the watermarking process can be broken down roughly as follows:
Creating the watermark
- Convert the user information to binary
- Hide the binary information in invisible characters
- Combine the hidden user information with the text
Decoding the watermark
- Extract the watermark information
- Convert it back to binary
- Decode the binary into the original information
Introduction to relevant knowledge points
Unicode
What is Unicode
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
Zero-width space, zero-width joiner, and zero-width non-joiner
Some Unicode code points are not displayed on a page but are still meaningful in certain scenarios. For example, the \n used for a line break is U+000A in Unicode.
The zero-width space (U+200B), zero-width non-joiner (U+200C), and zero-width joiner (U+200D) are similar: they are not displayed on a page, but they still have a practical meaning. The same goes for the zero-width no-break space (U+FEFF).
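To make this concrete, here is a minimal sketch showing that zero-width characters are part of a string even though they do not render. The sample strings and variable names are assumptions for illustration.

```javascript
// Illustrative only: zero-width characters are invisible but still stored in the string.
const visible = 'hello';
const marked = 'hel\u200B\u200C\u200Dlo'; // U+200B, U+200C, U+200D inserted in the middle

console.log(marked.length);      // 8
console.log(visible === marked); // false

// List the code points of the invisible characters found in the string.
const hiddenCodePoints = [...marked]
  .filter(c => /[\u200B-\u200D\uFEFF]/.test(c))
  .map(c => 'U+' + c.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));

console.log(hiddenCodePoints); // [ 'U+200B', 'U+200C', 'U+200D' ]
```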
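Code walkthrough
Creating the watermark
Converting the user information to binary
// Left-pad a binary string with zeros to at least 8 digits
const zeroPad = num => '00000000'.slice(String(num).length) + num;
// Convert each character to its padded binary form, separated by spaces
const textToBinary = username => (
  username.split('').map(char =>
    zeroPad(char.charCodeAt(0).toString(2))).join(' '));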
A few points to explain: as mentioned above, we only need to convert the user information to binary. The obvious approach is to call char.charCodeAt(0).toString(2) on each character of the text.
However, toString(2) does not output leading zeros, so different characters would produce binary strings of different lengths. That is what zeroPad is for: const zeroPad = num => '00000000'.slice(String(num).length) + num;
For example, the binary of a is 1100001, which is only 7 digits. It is therefore padded to 01100001 so that each character takes at least 8 digits, which makes decoding easier.
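The padding step can be checked in isolation. The sketch below runs the zeroPad helper on the binary form of a and compares it with the built-in String.prototype.padStart, which gives the same result here. The variables raw and wide are names chosen for this example.

```javascript
// The article's helper: left-pad a binary string with zeros to 8 digits.
const zeroPad = num => '00000000'.slice(String(num).length) + num;

const raw = 'a'.charCodeAt(0).toString(2); // '1100001', only 7 digits

console.log(zeroPad(raw));         // 01100001
console.log(raw.padStart(8, '0')); // 01100001

// A character whose binary form is longer than 8 digits is left unchanged
// by both approaches, so nothing is truncated.
const wide = '中'.charCodeAt(0).toString(2);
console.log(zeroPad(wide) === wide);             // true
console.log(wide.padStart(8, '0') === wide);     // true
```

Either form works for this purpose; padStart is simply the built-in way to express the same padding.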
Hiding the binary information
const binaryToZeroWidth = binary => (
  binary.split('').map((binaryNum) => {
    const num = parseInt(binaryNum, 10);
    if (num === 1) {
      return '\u200B'; // zero-width space
    } else if (num === 0) {
      return '\u200C'; // zero-width non-joiner
    }
    return '\u200D'; // zero-width joiner (stands for the space between characters)
  }).join('\uFEFF') // zero-width no-break space
);
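Because the output of this step is invisible, it helps to print it as escape sequences. The sketch below assumes the binary string is processed one character at a time, and showEscapes is a hypothetical helper added only to make the result readable.

```javascript
// Map each binary digit (or space) to a zero-width character, joined by U+FEFF.
// Assumption: the input is split character by character.
const binaryToZeroWidth = binary => (
  binary.split('').map((binaryNum) => {
    const num = parseInt(binaryNum, 10);
    if (num === 1) return '\u200B'; // zero-width space
    if (num === 0) return '\u200C'; // zero-width non-joiner
    return '\u200D';                // zero-width joiner, used for the space
  }).join('\uFEFF')                 // zero-width no-break space as separator
);

// Hypothetical helper: render every character as a \uXXXX escape.
const showEscapes = s => [...s]
  .map(c => '\\u' + c.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'))
  .join('');

const hidden = binaryToZeroWidth('10 01');
console.log(showEscapes(hidden));
// \u200B\uFEFF\u200C\uFEFF\u200D\uFEFF\u200C\uFEFF\u200B
console.log(hidden.length); // 9
```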
Combining the user information with the text
// The user information is converted to binary first, then hidden and appended to the visible text
const encryptionText = 'I am a lovely man' + binaryToZeroWidth(textToBinary('The North and the South'));
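Decoding the watermark
Extracting the watermark information
const zeroWidthChar = [
  '\u200B', // zero-width space, encodes 1
  '\u200C', // zero-width non-joiner, encodes 0
  '\u200D', // zero-width joiner, encodes the space between characters
  '\uFEFF', // zero-width no-break space, separator between symbols
];
// Matches every zero-width character used by the watermark
const zeroWidthCharReg = /[\u200B\u200C\u200D\uFEFF]/g;
// string is the watermarked text
const charArr = string.match(zeroWidthCharReg) || [];
// Drop the U+FEFF separators, then split on the zero-width joiner, which separates the encoded characters
const binaryArr = charArr.filter(c => c !== zeroWidthChar[3]).join('').split(zeroWidthChar[2]);
const mark = binaryArr.map(binary => {
  // Map U+200B to 1 and U+200C to 0, mirroring binaryToZeroWidth
  const binaryString = binary.split('').map(b => (b === zeroWidthChar[0] ? '1' : '0')).join('');
  const utf16 = parseInt(binaryString, 2);
  return String.fromCharCode(utf16);
}).join('');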
Converting the zero-width characters back to binary
const zeroWidthToBinary = string => (
  string.split('\uFEFF').map((char) => { // zero-width no-break space
    if (char === '\u200B') { // zero-width space
      return '1';
    } else if (char === '\u200C') { // zero-width non-joiner
      return '0';
    }
    return ' '; // the space that separates the encoded characters
  }).join(''));
This is simply the reverse of the hiding step.
Decoding the binary into text
const binaryToText = string => (
  string.split(' ').map(num =>
    String.fromCharCode(parseInt(num, 2))).join(''));
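Putting the pieces together, a round trip might look like the sketch below. It restates the four helpers so that it runs on its own; addWatermark and readWatermark are hypothetical wrappers, and the user name alice is sample data, none of which come from the original code.

```javascript
const zeroPad = num => '00000000'.slice(String(num).length) + num;

// Text -> space-separated binary strings
const textToBinary = text =>
  text.split('').map(ch => zeroPad(ch.charCodeAt(0).toString(2))).join(' ');

// Binary -> zero-width characters, joined by U+FEFF
const binaryToZeroWidth = binary =>
  binary.split('').map((bit) => {
    if (bit === '1') return '\u200B';
    if (bit === '0') return '\u200C';
    return '\u200D';
  }).join('\uFEFF');

// Zero-width characters -> binary
const zeroWidthToBinary = hidden =>
  hidden.split('\uFEFF').map((ch) => {
    if (ch === '\u200B') return '1';
    if (ch === '\u200C') return '0';
    return ' ';
  }).join('');

// Binary -> text
const binaryToText = binary =>
  binary.split(' ').map(num => String.fromCharCode(parseInt(num, 2))).join('');

// Hypothetical wrapper: append the hidden user name to the visible text.
const addWatermark = (text, user) => text + binaryToZeroWidth(textToBinary(user));

// Hypothetical wrapper: collect the zero-width characters and decode them.
const readWatermark = (text) => {
  const hidden = (text.match(/[\u200B-\u200D\uFEFF]/g) || []).join('');
  return hidden ? binaryToText(zeroWidthToBinary(hidden)) : '';
};

const marked = addWatermark('I am a lovely man', 'alice');
console.log(marked.length > 'I am a lovely man'.length); // true
console.log(readWatermark(marked));                      // alice
console.log(readWatermark('no watermark here'));         // prints an empty line
```

Note that the decoder only receives the zero-width characters, not the whole text; passing the visible text through zeroWidthToBinary as well would corrupt the first symbol.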
Tips
Although the zero-width space and the zero-width joiner are not displayed, they still count when code queries the length of the string: each one has a length of 1, not 0. '\u200B'.length === 1 // true
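One consequence of this tip is that a watermark can be detected, or stripped, by looking for those characters. The sketch below is only an illustration with made-up sample data; the character class covers the four zero-width characters used in this article.

```javascript
// Illustrative only: zero-width characters add to the string length,
// so they can be detected and removed with a regular expression.
const marked = 'secret\u200B\uFEFF\u200C';

console.log(marked.length); // 9

const hasHiddenChars = /[\u200B-\u200D\uFEFF]/.test(marked);
console.log(hasHiddenChars); // true

const stripped = marked.replace(/[\u200B-\u200D\uFEFF]/g, '');
console.log(stripped);        // secret
console.log(stripped.length); // 6
```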