Let’s start with two scenarios you might encounter on a daily basis:

Scenario 1: You browse a product on a website and read its details, but you do not place an order or even log in. Two days later, you visit other websites on the same computer and find many ads for similar products.

Scenario 2: On a blog, you run more than one alternate account (a “water army”). These accounts exist to inflate a post’s popularity, to steer public opinion, or simply to trade traffic. Even if you clear the local cache and cookies when switching accounts, restart the router, and operate through a VPN, and even if you think you are being careful enough and acting as authentically as possible, the administrators may still work out that the same person is behind all the accounts and take action against you.

If you encounter a scenario like the one above, it’s time to consider whether browser fingerprints are at work.

What is a browser fingerprint

A browser fingerprint is a way of tracking a web browser through the configuration and settings information that is visible to a website. Like the fingerprint on your hand, it can single out an individual, although at this stage what it identifies is the browser rather than the person.

Fingerprints on the human hand are unique because the ridges and valleys of the skin form a pattern that differs from person to person.

The same goes for a browser fingerprint: take the information that makes a browser identifiable, run some computation over it, and the resulting value is the browser fingerprint. The identifying information can be the User-Agent (UA), time zone, geographic location, the language in use, and so on. Which pieces of information you choose determines how accurate the fingerprint is.

A browser fingerprint by itself has little value to a website; what is really valuable is the user information that the fingerprint corresponds to. For a webmaster, collecting a user’s browser fingerprint and recording the user’s actions is worthwhile, especially in scenarios where there is no user identity. For example, on a content distribution website, user A likes to browse anime and manga (ACG) content, and this interest is recorded against the browser fingerprint. Next time, the site can push ACG content to user A without the user logging in. It is also a way of delivering content at a time when PCs are so ubiquitous.

For users, linking online behavior to a browser fingerprint is something of a privacy violation, especially when the fingerprint is tied to real user information. Fortunately, the privacy intrusion is fairly limited, and a site that abuses behavioral data will burn through its users’ goodwill.

Browser Fingerprint Background

Browser fingerprint tracking technology is now at generation 2.5.

  • The first generation is stateful: it relies mainly on the user’s cookies and evercookies, and the user has to log in before useful information can be obtained.
  • The second generation introduced the concept of the browser fingerprint. Collecting more feature values from the browser, such as the UA and browser plug-in information, makes users easier to tell apart.
  • The third generation focuses on the person: it collects users’ behaviors and habits to build feature values or even models of the user, which enables true tracking. This is relatively complicated to implement and is still being explored.

We are currently at generation 2.5 because the open problem is cross-browser fingerprint recognition. The results achieved in this area are discussed later.

Fingerprint acquisition

Entropy is the average amount of information contained in each received message. The higher the entropy, the more information can be transmitted, and the lower the entropy, the less information can be transmitted.

A browser fingerprint combines many pieces of browser feature information, and each feature value carries a different amount of entropy.
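As a rough illustration of the entropy definition, the sketch below computes the Shannon entropy of a single feature from how often each of its values was observed. The function name `shannonEntropy` and the visitor counts are made up for illustration and do not come from any real data set.

```javascript
// Shannon entropy (in bits) of one feature, given how many browsers showed each value.
function shannonEntropy(counts) {
  const total = counts.reduce((sum, c) => sum + c, 0);
  let bits = 0;
  for (const c of counts) {
    if (c === 0) continue; // a value that never occurs contributes nothing
    const p = c / total;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// Hypothetical sample: 4 time zones seen equally often among 400 visitors.
const uniformBits = shannonEntropy([100, 100, 100, 100]);
// Hypothetical sample: almost every visitor shares the same time zone.
const skewedBits = shannonEntropy([397, 1, 1, 1]);
console.log(uniformBits, skewedBits);
```

A feature whose values are spread evenly tells visitors apart much better than one where nearly everyone shares the same value, which is why the skewed sample yields far fewer bits.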

Click here to view your browser fingerprint ID and basic information.

Browser fingerprints can be divided into ordinary and advanced fingerprints. Ordinary fingerprints are the parts that are easy to discover and modify, such as HTTP headers:

{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    "Host": "httpbin.org",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36"
  }
}

You can click on this site to view your HTTP header information.

The browser’s Accept-Language header indicates the language of the browser. Its value may come from the language of your current operating system or from the browser’s language setting. It is not always relied on in practice, because some sites simply ignore this header and choose the page language from the user’s IP address.
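To make the header format concrete, here is a minimal sketch of how a server might turn an Accept-Language value into an ordered list of preferences. `parseAcceptLanguage` is a hypothetical helper written for this example, not part of any library, and it only handles the common `tag;q=weight` form.

```javascript
// Parse an Accept-Language header value into languages ordered by preference.
function parseAcceptLanguage(header) {
  return header
    .split(",")
    .map((part) => {
      const [tag, ...params] = part.trim().split(";");
      // A missing q parameter means the default weight of 1.
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      const q = qParam ? parseFloat(qParam.slice(2)) : 1;
      return { tag: tag.trim(), q };
    })
    .filter((entry) => entry.tag !== "" && !Number.isNaN(entry.q))
    .sort((a, b) => b.q - a.q); // highest weight first
}

console.log(parseAcceptLanguage("zh-CN,zh;q=0.9,en;q=0.8"));
```

The exact list and ordering of languages is itself a small fingerprinting signal, since it reflects how the user has configured the system or the browser.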

User-Agent contains browser and operating system information; for example, I am currently using macOS and Chrome 77. If the UA in the header is deliberately forged, the page can still read the real UA through navigator.userAgent.

Other basic information, such as the IP address, physical address, and geographic location, can also be obtained.


Besides HTTP headers, browser feature information can be obtained in other ways. This document lists some possible feature values:

  • The user agent string of the browser
  • The HTTP ACCEPT header sent by the browser
  • Screen resolution and color depth
  • The time zone the system is set to
  • Browser extensions/plug-ins installed in the browser, such as Quicktime, Flash, Java or Acrobat, and the versions of these plug-ins
  • Fonts installed on the computer, as reported by Flash or Java
  • Whether the browser executes JavaScript
  • Whether the browser accepts cookies and “super cookies”
  • The hash of the image generated for the Canvas fingerprint
  • The hash of the image generated for the WebGL fingerprint
  • Whether the browser is set to Do Not Track
  • System platform (e.g. Win32, Linux x86)
  • System language (e.g. zh-CN, en-US)
  • Whether the browser supports a touch screen
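Several of the values in this list can be read directly from standard browser objects. The sketch below gathers a handful of them; `collectFeatures` and its parameter layout are hypothetical choices made so the function can also be exercised outside a browser, and the sample values are invented.

```javascript
// Collect a few of the listed feature values from browser-like objects.
// The objects are passed in so the function can also be run outside a browser.
function collectFeatures(nav, scr, timeZone) {
  return {
    userAgent: nav.userAgent,
    language: nav.language,
    platform: nav.platform,
    doNotTrack: nav.doNotTrack ?? null,
    cookieEnabled: nav.cookieEnabled,
    touch: (nav.maxTouchPoints || 0) > 0,
    screen: `${scr.width}x${scr.height}x${scr.colorDepth}`,
    timeZone,
  };
}

// In a browser this would be called roughly as:
// collectFeatures(navigator, screen, Intl.DateTimeFormat().resolvedOptions().timeZone)
const sample = collectFeatures(
  { userAgent: "ExampleBrowser/1.0", language: "en-US", platform: "MacIntel",
    doNotTrack: "1", cookieEnabled: true, maxTouchPoints: 0 },
  { width: 1920, height: 1080, colorDepth: 24 },
  "Asia/Shanghai"
);
console.log(sample);
```

Plug-ins, fonts, and the Canvas and WebGL hashes need more work than a property read, so they are left out of this sketch.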

Once you have these values, you can run some calculations to obtain the entropy of the browser fingerprint and a UUID for the browser.
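One simple way to turn a set of feature values into a single identifier is to serialize them in a fixed order and hash the result. The sketch below uses 32-bit FNV-1a purely as a stand-in hash; it is not necessarily what any fingerprinting library uses, and the names `fnv1a` and `fingerprintId` are hypothetical.

```javascript
// 32-bit FNV-1a hash of a string, returned as 8 hex digits.
// It works on UTF-16 code units rather than bytes, which is enough for a sketch.
function fnv1a(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash.toString(16).padStart(8, "0");
}

// Serialize features with sorted keys so the same values always give the same ID.
function fingerprintId(features) {
  const canonical = Object.keys(features)
    .sort()
    .map((key) => `${key}=${String(features[key])}`)
    .join("|");
  return fnv1a(canonical);
}

// Invented feature values, for illustration only.
console.log(fingerprintId({ timeZone: "Asia/Shanghai", language: "zh-CN", screen: "1920x1080x24" }));
```

A 32-bit hash collides far too easily for real use; a production system would want a longer digest, but the serialize-then-hash structure stays the same.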

The following figure shows the information entropy, repetition probability and specific values of several eigenvalues:

Combining the fingerprint information above greatly reduces the collision rate and improves the accuracy of the client UUID. Features are also weighted, with higher-entropy feature values carrying more weight:

Variable           Entropy (bits)
user agent         10.0
plugins            15.4
fonts              13.9
video              4.83
supercookies       6.09
timezone           3.04
cookies enabled    0.353
Ordinary fingerprints alone are still not unique enough; after all, there are plenty of macOS users in Shenzhen. Advanced fingerprints narrow things down much further, almost to the point of identifying a single browser.

Canvas fingerprint

Canvas is the dynamic drawing element in HTML5; it can also be used to generate or process images. Even when Canvas draws the same element, the result differs between systems because font rendering engines, anti-aliasing, sub-pixel rendering, and other algorithms differ. So when Canvas turns the same text into an image, the output differs too. The process is as follows:

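function getCanvasFingerprint() {
    // Expects the page to contain a canvas element with id "anchor-uuid".
    var canvas = document.getElementById("anchor-uuid");
    var context = canvas.getContext("2d");
    context.font = "18pt Arial";
    context.textBaseline = "top";
    // Draw the text at (2, 2); rendering differs slightly between systems.
    context.fillText("Hello, user.", 2, 2);
    // Serialize the rendered pixels; the resulting string can then be hashed.
    return canvas.toDataURL("image/jpeg");
}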

Render some text on the canvas and convert it with toDataURL; the value stays the same even with private mode turned on.


AudioContext fingerprint

The AudioContext fingerprint works like the Canvas one: hardware and software differences produce slightly different audio output, which is then hashed to serve as an identifier. The audio is never actually played in the browser; only the processed data that would be played is needed. See audiofingerprint.openwpm.com/
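A minimal sketch of the idea, assuming a browser that provides `OfflineAudioContext` with promise-based `startRendering()`: render a short tone without playing it and reduce the samples to one number. The oscillator settings and the helper names `sumAbs` and `audioFingerprint` are arbitrary choices for this example, not taken from the site above.

```javascript
// Sum of absolute sample values; tiny rendering differences change this number.
function sumAbs(samples) {
  let total = 0;
  for (let i = 0; i < samples.length; i++) total += Math.abs(samples[i]);
  return total;
}

// Render a short tone offline (nothing is played) and reduce it to one value.
// OfflineCtx is expected to be window.OfflineAudioContext in a browser.
async function audioFingerprint(OfflineCtx) {
  const context = new OfflineCtx(1, 44100, 44100); // mono, 1 second at 44.1 kHz
  const oscillator = context.createOscillator();
  oscillator.type = "triangle";
  oscillator.frequency.value = 10000;
  const compressor = context.createDynamicsCompressor();
  oscillator.connect(compressor);
  compressor.connect(context.destination);
  oscillator.start(0);
  const buffer = await context.startRendering();
  return sumAbs(buffer.getChannelData(0)).toString();
}

// Browser-only usage; skipped where OfflineAudioContext does not exist.
if (typeof OfflineAudioContext !== "undefined") {
  audioFingerprint(OfflineAudioContext).then((value) => console.log(value));
}
```

The resulting string can be hashed or combined with other feature values in the same way as the Canvas data URL.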

WebRTC

WebRTC (Web Real-Time Communication) lets browsers exchange audio and video in real time. It provides three main APIs that let JavaScript obtain and exchange audio and video data in real time: MediaStream, RTCPeerConnection, and RTCDataChannel. To establish a connection, WebRTC has to expose the user’s real IP address (for NAT traversal), so the RTCPeerConnection API allows JavaScript to read the user’s IP address directly.
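The sketch below shows the general shape of this technique: open an RTCPeerConnection, let ICE gathering run, and read the address field from each candidate line. The helper names are hypothetical and the candidate string in the example uses a documentation-only IP address. With no STUN server configured, as here, only local candidates are gathered, and many current browsers replace local addresses with mDNS names ending in `.local`, so treat this as an illustration rather than a reliable way to get an IP.

```javascript
// Extract the address field from an ICE candidate line.
// Format: candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function addressFromCandidate(candidateLine) {
  const fields = candidateLine.trim().split(/\s+/);
  return fields.length > 5 ? fields[4] : null;
}

// Browser-only: gather ICE candidates and report each distinct address once.
function collectAddresses(onAddress) {
  const seen = new Set();
  const pc = new RTCPeerConnection({ iceServers: [] });
  pc.createDataChannel("probe"); // a channel is needed so ICE gathering starts
  pc.onicecandidate = (event) => {
    if (!event.candidate) return; // null marks the end of gathering
    const address = addressFromCandidate(event.candidate.candidate);
    if (address && !seen.has(address)) {
      seen.add(address);
      onAddress(address);
    }
  };
  pc.createOffer().then((offer) => pc.setLocalDescription(offer));
}

console.log(addressFromCandidate(
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 0.0.0.0 rport 0"
));
```

In a browser, `collectAddresses((addr) => console.log(addr))` would log whatever addresses the browser is willing to expose.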

Cross-browser fingerprint

All the browser fingerprints mentioned so far are obtained within a single browser. Many feature values are unstable across browsers, however: the UA and the Canvas fingerprint, for example, can differ between browsers on the same device. The same fingerprint algorithm therefore does not carry over to a different browser (by which I mean a different browser on the same device).

A cross-browser fingerprint is a stable browser feature that achieves the same or approximate value across browsers.

Cross-browser fingerprinting has also been studied.

There is a table like this in this paper:

It lists the information entropy and stability of feature values for single-browser and cross-browser use; the cross-browser stability of the Canvas fingerprint is only 8.17%.

Conventional feature values struggle to stay highly stable while still carrying enough information.

The features that do better in this respect include Task(a)~Task(r), List of Fonts (JS), TimeZone, and CPU Virtual Cores.

Task(a)~Task(r) are graphics-card rendering tasks. Task(a) Texture, for example, tests the texture functionality of a regular fragment shader by rendering a texture whose pixels have randomly chosen RGB values. To map the texture onto every point of the model, the fragment shader has to interpolate between texture points, and the interpolation algorithm differs between graphics cards. The more the texture varies, the more pronounced the difference, and recording this difference helps tell graphics cards apart.

List of Fonts (JS) captures which fonts the page can use. There used to be two ways to get the supported fonts, Flash and JS, and Flash is now out of the picture. The JS approach renders text HTML elements in different fonts and measures their size, which reveals both which fonts are supported and how they are drawn, and helps distinguish one device from another.
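The measuring trick can be sketched as follows: render a test string in a candidate font with a generic fallback, and compare its width against the fallback alone. `detectFonts` and `domMeasure` are hypothetical names, and the test string and font size are arbitrary. A single fallback misses fonts that happen to have the same width as the fallback, so real implementations typically compare against several.

```javascript
// Decide which candidate fonts are installed by comparing rendered text widths.
// measure(fontFamily) must return the width of a fixed test string in that font.
function detectFonts(candidates, measure, fallback = "monospace") {
  const baseline = measure(fallback);
  // If a font is missing, the browser falls back and the width equals the baseline.
  return candidates.filter((font) => measure(`"${font}", ${fallback}`) !== baseline);
}

// Browser-only measurer that uses a hidden span.
function domMeasure(fontFamily) {
  const span = document.createElement("span");
  span.style.cssText = "position:absolute;visibility:hidden;font-size:72px";
  span.style.fontFamily = fontFamily;
  span.textContent = "mmmmmmmmmmlli";
  document.body.appendChild(span);
  const width = span.offsetWidth;
  span.remove();
  return width;
}

// Skipped where there is no DOM.
if (typeof document !== "undefined") {
  console.log(detectFonts(["Arial", "Helvetica", "Comic Sans MS"], domMeasure));
}
```

Passing the measuring function in as a parameter keeps the comparison logic separate from the DOM, so it can be checked without a browser.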

The TimeZone should be the same on the same device.

CPU Virtual Cores is the number of CPU cores. The simplest way to get it is navigator.hardwareConcurrency.

This API is not supported in older browsers, but a polyfill is available. Roughly, it uses Web Workers to time a workload: once the amount of parallel computation reaches the hardware’s maximum concurrency, the number of cores can be inferred (a little hardcore).
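Reading the value is a one-liner, but a fingerprinting script has to cope with browsers that do not expose it. A minimal sketch, with `getCoreCount` as a hypothetical helper; the Web Worker timing fallback is not shown:

```javascript
// Return the logical core count if the browser exposes it, otherwise null.
function getCoreCount(nav) {
  const cores = nav && nav.hardwareConcurrency;
  return Number.isInteger(cores) && cores > 0 ? cores : null;
}

// Use the global navigator when one exists.
const currentNavigator = typeof navigator !== "undefined" ? navigator : undefined;
console.log(getCoreCount(currentNavigator));
```

Returning null instead of guessing keeps a missing value distinguishable from a real one when the result is fed into a fingerprint.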

How to prevent

Unless you have enough expertise or change your browser information very frequently, a website can almost always pick out a user (and the user’s behavior) through the browser fingerprint. But that is not all bad.

  • The privacy exposure is very partial: it only reveals part of the user’s browsing behavior.
  • The value is limited: the recorded behavior is not tied to an actual account or a specific person.
  • It has beneficial uses: browser fingerprints can help isolate abusive and black-market users and prevent vote manipulation and other malicious behavior.

Even so, there are a few things you can do to defend against browser fingerprinting.

Do Not Track

You can send a “DNT” (Do Not Track) flag in the HTTP headers: a value of 1 means “do not track my web behavior”, and 0 means tracking is allowed. Even without a cookie, this flag tells the server that I do not want to be tracked or have my behavior recorded.
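On the server side, honoring the flag comes down to checking one header. A minimal sketch, assuming the headers arrive as a plain object with lower-cased names (as in Node’s `req.headers`); `wantsNoTracking` is a hypothetical helper name:

```javascript
// Server side: decide whether a request asks not to be tracked.
// headers is a plain object with lower-cased header names, as in Node's req.headers.
function wantsNoTracking(headers) {
  return headers["dnt"] === "1";
}

console.log(wantsNoTracking({ dnt: "1" })); // request opts out of tracking
console.log(wantsNoTracking({ dnt: "0" })); // request allows tracking
console.log(wantsNoTracking({}));           // no preference expressed
```

A site that respects the signal would skip its fingerprinting and behavior logging whenever this returns true.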

The bad news is that most sites currently don’t follow this convention and completely ignore the “Do Not Track” signal.

EFF offers a tool called Privacy Badger, a browser add-on that blocks ads but whitelists the ads of companies that adhere to this convention, giving more companies an incentive to comply with “Do Not Track” so that their ads are displayed in full.

Personally, I think this is a good approach. If users adopt this tool, websites will have to weigh the interests of both sides before tracking user behavior, which reduces the risk of user privacy being leaked.

More information about Privacy Badger can be viewed here.

Tor Browser

From what we have learned about browser fingerprints, it is clear that the more distinguishing features your browser exposes, the easier it is to track. On the other hand, if you deliberately hide or alter certain browser features, then congratulations: your browser may now have a unique fingerprint that sets you apart from other users without any calculation at all.

The effective approach is therefore to make your feature values as common as possible. For example, if the most popular combination on the market is Windows 10 + Chrome, changing your UA to that combination is an effective step. At the same time, try to keep websites from obtaining feature values with very high information entropy, such as the Canvas fingerprint.

The Tor Browser has done a lot of work here to keep these features from being used to track Tor users. In response to Panopticlick and other fingerprinting experiments, it now includes patches that prevent font fingerprinting (by limiting the fonts that websites can use) and Canvas fingerprinting (by detecting reads of HTML5 Canvas objects and requiring user approval). For code such as the Canvas fingerprint function above, Tor pops up the following warning:

You can also configure the Tor browser to actively block JavaScript.

Taken together, these measures make the Tor Browser a powerful defense against fingerprinting. But this safety comes at the expense of speed: accessing pages through the Tor Browser is much slower than with popular browsers. Anyone interested should give Tor Browser a try.

Disable JavaScript

This is a blunter approach. Disabling JavaScript outright is a strong defense against browser fingerprint tracking, but it leaves a large number of pages unusable.

Unfortunately, even with JS disabled, a site can still use CSS to gather browser information, for example:

@media (device-width: 1080px) {
  body {
    background: url("https://example.org/1080.png");
  }
}

By checking the server’s request log for the image 1080.png, you can see which users have 1080px-wide screens. Mozilla Firefox even used to support a CSS query that could directly reveal the Windows version and the Windows theme. This has since been fixed.

Resources

  • (Cross-)Browser Fingerprinting via OS and Hardware Level Features
  • 2.5 Fingerprint tracking technology – Cross-browser fingerprint recognition
  • navigator.hardwareConcurrency
  • Panopticlick