If technology is guilty, man is guilty. — Vo 镃 Kishold
We’ve all had the experience of browsing for electric toothbrushes on one site, only to see similar products advertised in the corners of other websites afterwards. It lets us feel how thoroughly big data surrounds us, and it also raises an uncomfortable question: do we still have any privacy? We’ll come back to that question later; first, let’s look at why this happens.
First, the browser fingerprint
Yes, you read that right. Just as people have fingerprints, Web clients have fingerprints of their own. By collecting and analyzing a set of client characteristics, a website can uniquely identify a client, then lock onto it, track it, and learn about the user’s behavior and private data, in order to target advertising precisely, prevent promotion abuse (“wool pulling”), and so on.
1. Basic fingerprints
A basic fingerprint is made up of characteristics that any browser exposes: hardware type (Apple), operating system (Mac OS), user agent, system fonts, language, screen resolution, browser plug-ins (Flash, Silverlight, Java, etc.), browser extensions, browser settings (Do Not Track, etc.), time zone offset (the browser’s GMT offset), and many others. This information is comparable to a person’s height and age: many clients share the same values, so the probability of collision is high and it can only be used as auxiliary identification.
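As a rough illustration of why basic attributes collide, the TypeScript sketch below joins a handful of them into a single key. The BasicAttributes shape, the basicFingerprintKey function, and all sample values are hypothetical and introduced only for illustration. In a browser the values would typically be read from properties such as navigator.userAgent, navigator.language, screen.width, screen.height, and new Date().getTimezoneOffset(); here they are passed in as plain data so that the sketch runs anywhere.

```typescript
// Hypothetical shape holding a few of the basic attributes listed above.
interface BasicAttributes {
  userAgent: string;
  language: string;
  screenWidth: number;
  screenHeight: number;
  timezoneOffsetMinutes: number;
}

// Join the attributes in a fixed order so that equal inputs give equal keys.
function basicFingerprintKey(attrs: BasicAttributes): string {
  return [
    attrs.userAgent,
    attrs.language,
    attrs.screenWidth + "x" + attrs.screenHeight,
    String(attrs.timezoneOffsetMinutes),
  ].join("|");
}

// Two different users with the same laptop model and browser (made-up values).
const alice: BasicAttributes = {
  userAgent: "ExampleBrowser/1.0 (Macintosh)",
  language: "en-US",
  screenWidth: 1440,
  screenHeight: 900,
  timezoneOffsetMinutes: -480,
};
const bob: BasicAttributes = { ...alice };
// A third user whose only difference is the time zone.
const carol: BasicAttributes = { ...alice, timezoneOffsetMinutes: 0 };

console.log(basicFingerprintKey(alice) === basicFingerprintKey(bob));   // true: a collision
console.log(basicFingerprintKey(alice) === basicFingerprintKey(carol)); // false
```

Alice and Bob are different people, yet their keys are identical, which is why a basic fingerprint on its own can only narrow down the candidates.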
2. Behavioral fingerprints
Take e-commerce websites as an example: which products we browse, how long we stay on a page, which categories we often buy, which product sizes we choose, and so on can all be recorded in a local cookie as a behavioral fingerprint. This helps advertisers place their advertisements precisely, and it also helps e-commerce websites train recommendation models that make accurate recommendations for you. So “big data may know more about you than you know about yourself” isn’t just talk.
3. Advanced fingerprints
Basic fingerprints are like a person’s appearance, height, weight, and gender: they are too coarse to tell one browser from another. Advanced fingerprints, by contrast, identify a browser with a precision closer to that of DNA (don’t be afraid, there is a limit). Here we will focus on three advanced fingerprints that are in widespread use today: the Canvas fingerprint, the WebGL fingerprint, and the AudioContext fingerprint.
1) Canvas fingerprint
Canvas is an HTML5 element for dynamic drawing, which can be used to generate and even process complex images. In July 2014, ProPublica reported that the then-new Canvas fingerprint tracking was in use on many websites, “from the White House to YouPorn”, which shows its standing as an advanced fingerprint. A Canvas fingerprint is generated roughly as follows:
- Draw a specified pattern on the canvas.
- Call the canvas.toDataURL() method to obtain the base64 encoding of the image content.
- Take the CRC check code from it as the unique identifier (a PNG image is divided into chunks, and each chunk ends with a 32-bit CRC check code).
Many blogs introduce the principle behind the Canvas fingerprint in basically one sentence: the same drawing operation on an HTML5 Canvas element produces different image content on different operating systems and browsers. At the image-format level, different browsers use different graphics processing engines, different image export options, different default compression levels, and so on. At the pixel level, each operating system uses different settings and algorithms for anti-aliasing and sub-pixel rendering. So even with the same drawing operation, the CRC check code of the resulting image data is not the same.
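The step that pulls the check code out of the encoded image can be sketched in TypeScript as follows. This is only a minimal sketch under stated assumptions: decodeBase64 and canvasFingerprintFromDataUrl are hypothetical helper names, the input is the string that canvas.toDataURL() would return in a browser, and the image data chunk is assumed to be immediately followed by the 12-byte closing IEND chunk, so that the 4 bytes just before those final 12 bytes are the CRC of the last image data chunk.

```typescript
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Small base64 decoder so that the sketch does not depend on browser globals.
function decodeBase64(encoded: string): number[] {
  const bytes: number[] = [];
  let buffer = 0;
  let bitCount = 0;
  for (let i = 0; i < encoded.length; i++) {
    const value = BASE64_ALPHABET.indexOf(encoded.charAt(i));
    if (value < 0) {
      continue; // skip "=" padding and any other non-alphabet character
    }
    buffer = ((buffer << 6) | value) & 0xffff;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      bytes.push((buffer >> bitCount) & 0xff);
    }
  }
  return bytes;
}

// Return the 4 CRC bytes that precede the closing IEND chunk, as 8 hex digits.
function canvasFingerprintFromDataUrl(dataUrl: string): string {
  // Everything after the first comma is the base64 payload.
  const payload = dataUrl.slice(dataUrl.indexOf(",") + 1);
  const bytes = decodeBase64(payload);
  // IEND occupies the last 12 bytes (length, type, CRC); the previous chunk's
  // CRC sits in the 4 bytes before it (assumption stated in the lead-in).
  const crcBytes = bytes.slice(-16, -12);
  return crcBytes.map((b) => (b + 0x100).toString(16).slice(1)).join("");
}
```

In a page, one would presumably call canvasFingerprintFromDataUrl(canvas.toDataURL()) after drawing the pattern. Hashing the whole data URL would serve the same purpose; reading the CRC simply reuses a checksum that the PNG encoder has already computed.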
But it still doesn’t seem intuitive, right?
Let’s take a little space to explain it in plain language, with two examples:
The fonts we see in our browsers look the same to the naked eye because they have been fine-tuned (a process known as hinting): special instructions guide the rendering algorithm when it draws text on the screen. This is needed because characters on a computer are not stored as images but as mathematical formulas.
To draw text on the screen, the computer has to render those formulas into an actual image made of pixels. That image is generated according to the screen resolution, pixel size, and other parameters, so the results may differ. Different operating systems and browsers use different fine-tuning algorithms here: the same letter M looks identical to the naked eye, but when we zoom in to the pixel level we see subtle differences, as shown below.
Another example is anti-aliasing (an edge-softening technique). Anti-aliasing smooths the transition between two sharply contrasting regions so that the boundary between them does not look jagged. For a black-and-white image, gray pixels of different shades are added along the edges as a buffer to achieve a smooth effect. Different graphics card drivers perform anti-aliasing differently, so the results differ. These differences are also indistinguishable to the naked eye, but if you compare the pixels carefully you will see slight differences in color. As before, a computer can easily recognize differences that are invisible to the naked eye. That’s why different computers produce different fingerprints.
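To make this concrete, here is a small TypeScript sketch. It computes the standard CRC-32 checksum (the variant PNG uses for its chunks) over two rows of grayscale pixels that differ by a single, visually negligible gray level. The pixel values and the names rendererA and rendererB are made up for illustration and do not come from any real driver.

```typescript
// Standard CRC-32 (reflected polynomial 0xEDB88320), computed bit by bit.
function crc32(bytes: number[]): number {
  let crc = 0xffffffff;
  bytes.forEach((byte) => {
    crc ^= byte & 0xff;
    for (let bit = 0; bit < 8; bit++) {
      // Shift right; XOR in the polynomial when the bit shifted out was 1.
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  });
  return (crc ^ 0xffffffff) >>> 0;
}

// One row of 8-bit grayscale pixels across a black-to-white edge, as two
// hypothetical renderers might anti-alias it. Only the middle value differs.
const rendererA = [0, 0, 64, 128, 191, 255, 255];
const rendererB = [0, 0, 64, 127, 191, 255, 255];

console.log(crc32(rendererA).toString(16));
console.log(crc32(rendererB).toString(16));
```

A gray level of 127 next to 128 is invisible to a person, but the two checksums printed above are different, and that difference is all a fingerprinting script needs.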
2) WebGL fingerprint
Once you understand the Canvas fingerprint, the WebGL fingerprint is easier to understand. The principle is basically the same, except that the content of the rendered 3D image is combined with some WebGL attribute values (such as the vendor and model of the graphics card used for rendering, the compression level, etc.) and spliced into one long string. That string is then hashed into a shorter string that carries less information but still retains the differences between devices, and this serves as the WebGL fingerprint.
3) AudioContext fingerprint
HTML5 provides the Web Audio API for JavaScript programming, so that developers can operate directly on raw audio stream data in code and generate, process, and reconstruct it at will, for example by enhancing the timbre, changing the pitch, or splitting the audio. It could even be called a Web version of Adobe Audition. There are two methods of generating an audio fingerprint, and they work roughly as follows:
- Generate a specific audio stream, apply a series of operations to it, and compute a SHA value as the fingerprint. The audio is cleared before it is output to the audio device, so the user does not notice.
- Generate a specific audio stream, apply dynamic compression directly, and compute an MD5 hash to obtain the audio fingerprint.
Audio fingerprints differ between devices and browsers because subtle differences in the hardware and software of the host and the browser lead to differences in audio signal processing. The same browser on the same machine generates the same audio output, while different machines or browsers generate different output.
As we can see, all three advanced fingerprints take advantage of hardware or software differences: some generate images and some generate audio, and a hash of the result is then computed as the fingerprint.
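The generate, process, and hash pipeline can be imitated in plain TypeScript, as in the sketch below. It is only an analogy under several assumptions. A real page would use Web Audio API objects such as OfflineAudioContext, OscillatorNode, and DynamicsCompressorNode, and the differences between machines would come from their signal processing. Here generateTone, compress, and hashSamples are hypothetical stand-ins written with ordinary arithmetic, all parameter values are made up, and a 32-bit FNV-1a hash is swapped in for SHA or MD5 purely to keep the example free of dependencies. It is therefore not expected to tell real devices apart.

```typescript
// Stand-in for an oscillator: a sine wave sampled at a fixed rate.
function generateTone(sampleCount: number, frequencyHz: number, sampleRate: number): number[] {
  const samples: number[] = [];
  for (let i = 0; i < sampleCount; i++) {
    samples.push(Math.sin((2 * Math.PI * frequencyHz * i) / sampleRate));
  }
  return samples;
}

// Stand-in for dynamic compression: scale down whatever exceeds the threshold.
function compress(samples: number[], threshold: number, ratio: number): number[] {
  return samples.map((sample) => {
    const magnitude = Math.abs(sample);
    if (magnitude <= threshold) {
      return sample;
    }
    const squeezed = threshold + (magnitude - threshold) / ratio;
    return sample < 0 ? -squeezed : squeezed;
  });
}

// 32-bit FNV-1a over the decimal text of every sample, as 8 hex digits.
function hashSamples(samples: number[]): string {
  const text = samples.join(",");
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 modulo 2^32, written as shifts.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return (hash + 0x100000000).toString(16).slice(1);
}

// The samples are never played; they are only processed and hashed.
const processedSamples = compress(generateTone(5000, 10000, 44100), 0.5, 12);
const audioFingerprint = hashSamples(processedSamples);
console.log(audioFingerprint);
```

The samples are hashed at full floating-point precision on purpose: the whole technique relies on tiny numerical differences, so rounding them away first would defeat it.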
4. Integrated fingerprints
In the Internet world, fingerprint collisions are everywhere. This is especially true of mass-produced machines such as Macs: machines of the same model from the same batch, running the same browser, are likely to produce the same advanced fingerprint, so there is a certain repetition rate. In that case, all of the fingerprints above need to be combined and analyzed to compute a final integrated fingerprint that is used for the judgment. This can greatly reduce the collision rate and greatly improve the accuracy of unique client identification.
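One simple way to combine the parts, sketched in TypeScript below, is to join them in a fixed order and hash the result. This is an assumption made for illustration rather than a description of how any particular tracker works: FingerprintParts, fnv1aHex, and integratedFingerprint are hypothetical names, the sample values are made up, and the 32-bit FNV-1a hash merely stands in for whatever hash a real system would use.

```typescript
// Hypothetical bundle of the individual fingerprints described above.
interface FingerprintParts {
  basic: string;
  canvas: string;
  webgl: string;
  audio: string;
}

// 32-bit FNV-1a hash of a string, returned as 8 hex digits.
function fnv1aHex(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 modulo 2^32, written as shifts.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return (hash + 0x100000000).toString(16).slice(1);
}

// Join the parts in a fixed order with a separator, then hash the result.
function integratedFingerprint(parts: FingerprintParts): string {
  return fnv1aHex([parts.basic, parts.canvas, parts.webgl, parts.audio].join("|"));
}

// Two same-model machines: identical advanced fingerprints, but one basic
// attribute (the time zone offset) differs. All values are made up.
const machineA: FingerprintParts = {
  basic: "en-US|1440x900|-480",
  canvas: "43524321",
  webgl: "9f3a1c07",
  audio: "5d2e88b1",
};
const machineB: FingerprintParts = { ...machineA, basic: "en-US|1440x900|-420" };

console.log(machineA.canvas === machineB.canvas);                                // true: advanced parts collide
console.log(integratedFingerprint(machineA) !== integratedFingerprint(machineB)); // true: combined values differ
```

Real systems would likely weight or fuzzy-match the parts instead of hashing them exactly, since a single changed attribute (a browser update, for instance) alters an exact hash completely.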
Second, how browser fingerprints should be used
Now that we have introduced the browser fingerprint, the precise advertising mentioned at the beginning should make more sense. Next, let’s talk about our concerns about browser fingerprints.
- Users mind very much when their personal information is collected without their knowledge. As more and more people value privacy, major companies, browser vendors, and even phone makers have introduced restrictions and conventions: what we understand as personal privacy, such as a person’s name, the PC’s MAC address, location, and phone number, cannot be collected by third parties without the user’s authorization. The fingerprints mentioned above are basically all non-personal information.
- With the user’s authorization, browser fingerprints can provide users with precisely targeted advertising and recommendations.
- For some businesses, browser fingerprints can serve as an important indicator of device uniqueness, enabling them to prevent promotion abuse (“wool pulling”), bind a single account to a single device, and so on, and thereby reduce the company’s losses.
So, from my personal point of view, any technology can be put to the wrong use or used in the wrong scenario; the key is whether we establish and follow rules that the public accepts.
We can reject things like browser fingerprinting by setting browser permissions, or we can accept them and enjoy the convenience such technology brings to our lives.
Companies and manufacturers can collect users’ non-private information to obtain a unique device identifier, and with users’ authorization they can also collect information to build better recommendation models and provide people with more attentive service.
Such a harmonious vision rests on the word “degree”. We can see that this is being corrected step by step (browser fingerprints are able to capture less and less information without authorization), so there’s reason to believe that the privacy question will eventually be settled in a way that is good for everyone.
This is my understanding of browser fingerprinting. If anything here is inaccurate, corrections are welcome.