Background

With the rapid development of the Internet, Web services have grown quickly, and the people who use them are exposed to network security risks. According to the “White Paper on China’s Network Security Industry” released by the China Academy of Information and Communications Technology, China’s network security industry is growing rapidly, and the industry scale is expected to reach 170.2 billion yuan in 2020.

Web security accounts for a share of that industry, and Web sites have become an important battlefield for network attack and defense.

Browser fingerprints play an increasingly important role in active defense for Web security. This article introduces browser fingerprints from three angles: an overview of browser fingerprints, the principle of fingerprint recognition, and application scenarios.

Overview of Browser Fingerprints

Proposed in 2010 by Peter Eckersley, chief scientist at the Electronic Frontier Foundation (EFF), browser fingerprints use properties that browsers transmit freely to generate an identifier string, much like a human fingerprint. So what are the common browser fingerprint factors? The table below lists them, and its columns have the following meanings:

  • Fingerprint factor: a publicly exposed attribute of the browser. For example, userAgent can be read from navigator.userAgent, and colorDepth can be read from screen.colorDepth.
  • Stability: the value of the fingerprint factor does not change when the browser is refreshed. For example, colorDepth is the color depth of the screen; it is 24 in Chrome and stays the same after a refresh, so it is a stable fingerprint factor.
  • Independence: the value of the fingerprint factor does not change when different browsers are used on the same device. For example, devicePixelRatio is the ratio of the physical pixel resolution of the display device to the CSS pixel resolution. Its value is the same in Google Chrome, Firefox, Edge, Internet Explorer, Opera, and Safari on the same device, so devicePixelRatio is an independent fingerprint factor.

| Fingerprint factor | Stability | Independence |
| --- | --- | --- |
| userAgent | Stable | No |
| language | Stable | Yes (most of the time) |
| colorDepth | Stable | Yes |
| deviceMemory | Stable | Yes |
| pixelRatio | Stable | Yes |
| hardwareConcurrency | Stable (but not supported by Internet Explorer) | Yes (but not supported by Internet Explorer) |
| screenResolution | Stable | Yes |
| availableScreenResolution | Stable | Yes |
| timezoneOffset | Stable | Yes |
| timezone | Stable | Yes |
| sessionStorage | Stable | No |
| localStorage | Stable | No |
| indexedDb | Stable | No |
| addBehavior | Stable | No |
| openDatabase | Stable | No |
| cpuClass | Stable | Yes |
| platform | Stable | Yes (most of the time) |
| doNotTrack | Stable | No |
| plugins | To be determined | No |
| canvas | Stable (most of the time) | No (verified) |
| webgl | Stable (most of the time) | No (verified) |
| adBlock | Stable (depending on time of use) | No |
| hasLiedLanguages | Stable | Yes |
| hasLiedResolution | Stable | Yes |
| hasLiedOs | Stable | Yes |
| hasLiedBrowser | Stable | Yes |
| touchSupport | Stable | Yes |
| fonts | Stable (most of the time) | No (most of the time) |
| audio | Unknown | No |

Table data source: [Browser independent components](github.com/fingerprint…) & Stable components

We can read the values of these fingerprint factors with JavaScript code. A browser fingerprint built from stable and independent fingerprint factors makes device identification possible. That is, the browser fingerprint is used to derive a fingerprint of the hardware device behind the browser, such as a personal PC.

You might wonder what these fingerprint factors can be used for and how they are associated with a device. Let’s take a look at the principles behind browser fingerprint recognition.
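As a rough illustration of how factor values could be combined into an identifier string, here is a minimal Python sketch. It assumes the values have already been collected in the browser with JavaScript and sent to a server. The factor selection, the name device_fingerprint, and the use of SHA-256 are assumptions made for this example, not the method of any particular library.

```python
import hashlib
import json

# Assumed selection: factors listed as both stable and independent in the table.
STABLE_INDEPENDENT_FACTORS = (
    "colorDepth",
    "pixelRatio",
    "screenResolution",
    "timezoneOffset",
    "timezone",
    "touchSupport",
)


def device_fingerprint(factors: dict) -> str:
    """Hash the selected factor values into a hex identifier string."""
    selected = {name: factors.get(name) for name in STABLE_INDEPENDENT_FACTORS}
    # sort_keys gives a deterministic serialization for identical values.
    canonical = json.dumps(selected, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values reported by two different browsers on the same device.
browser_a = {
    "userAgent": "browser-A",
    "colorDepth": 24,
    "pixelRatio": 2,
    "screenResolution": [1920, 1080],
    "timezoneOffset": -480,
    "timezone": "Asia/Shanghai",
    "touchSupport": [0, False, False],
}
browser_b = dict(browser_a, userAgent="browser-B")

# userAgent is not independent, so it is left out and both browsers agree.
print(device_fingerprint(browser_a) == device_fingerprint(browser_b))
```

Because only stable and independent factors enter the hash, two browsers on the same device map to the same identifier, which is what lets a browser fingerprint stand in for a device fingerprint.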

Recognition principle

Information entropy

We have introduced the common browser fingerprint factors, but how do we measure how much these freely transmitted factors reveal? Readers familiar with information theory know that information is measured by entropy: the higher the entropy, the more information is conveyed, and the lower the entropy, the less. Information entropy can therefore serve as the measure of how identifying a browser fingerprint is. For a discrete random variable X, the entropy H(X) is:


$$H(X) = -\sum_x p(x) \log_2 p(x)$$

Entropy of a single fingerprint factor

The formula above uses a base-2 logarithm, so entropy is measured in bits. Simply put, it quantifies the amount of information carried by a fingerprint factor. Take the fingerprint factor doNotTrack as an example. Its value falls into one of two cases:

  • Enabled is recorded as 1, and not enabled is recorded as 0.
  • Assume that site visit statistics show doNotTrack enabled for 10% of users and not set for the remaining 90%. The entropy of the doNotTrack fingerprint factor is then:


$$H(X) = -\frac{1}{10}\log_2\frac{1}{10} - \frac{9}{10}\log_2\frac{9}{10} \approx 0.469\ \text{bit}$$
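The doNotTrack figure can be checked numerically. The following Python sketch computes the entropy of a discrete distribution from its probabilities; the function name shannon_entropy is chosen for this example only.

```python
import math


def shannon_entropy(probabilities):
    """Return H(X) = -sum(p * log2(p)) in bits, skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# doNotTrack: 10% of visitors have it enabled, 90% have it unset.
do_not_track_entropy = shannon_entropy([0.1, 0.9])
print(round(do_not_track_entropy, 3))  # 0.469
```

A factor whose values are spread evenly carries more information: a 50/50 split gives exactly 1 bit, while a factor that is identical for every visitor gives 0 bits and is useless for identification.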

Note: for a refresher, see “Fundamentals of information theory” in the reference links.

Probability of multiple fingerprint factors

Knowing how to calculate the information entropy of a single fingerprint factor, we can go on to combine the entropy of several fingerprint factors of a browser. In Peter Eckersley’s paper How Unique Is Your Web Browser?, the browser fingerprint is generated from eight fingerprint factors, including userAgent, fonts, screenResolution, and plugins.

The statistics show that these 8 fingerprint factors carry at least 18.1 bits of information entropy. Self-information is calculated as follows:


$$I(e) = -\log_2 p(x)$$

From this we can derive the probability that the same fingerprint appears:


$$p(x) = 2^{-I(e)}$$

The calculation shows that, for a browser picked at random, only about one in 286,777 other browsers shares the same browser fingerprint. Browser fingerprints are therefore highly identifying. As more fingerprint factors are added, the probability that two browsers share a fingerprint keeps falling, and identification accuracy keeps rising.
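The conversion between entropy in bits and the "one in N" figure can be sketched in Python as follows. The function names are chosen for this example. Note that the rounded value of 18.1 bits gives roughly one in 281,000, while one in 286,777 corresponds to about 18.13 bits, so the quoted 18.1 should be read as a rounded figure.

```python
import math


def self_information(probability: float) -> float:
    """I = -log2(p): bits of information revealed by an event of probability p."""
    return -math.log2(probability)


def match_probability(bits: float) -> float:
    """p = 2^(-I): probability that another browser has the same fingerprint."""
    return 2.0 ** (-bits)


def one_in(bits: float) -> float:
    """Express the match probability as 'one in N' browsers."""
    return 1.0 / match_probability(bits)


print(f"18.1 bits -> one in {one_in(18.1):,.0f} browsers")
print(f"one in 286,777 -> {self_information(1 / 286777):.2f} bits")
```

Each extra bit of entropy halves the probability of a collision, which is why adding fingerprint factors sharpens identification so quickly.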

Fingerprint duplication problem

Cloud servers and virtual hosts are now widespread, and any number of identical virtual systems and devices can be cloned at any time, much like Ghost system images. In their initial state these clones are very likely to have the same fingerprint, so how do we tell apart devices that share the same “factory settings”?

The fingerprint recognition described above relies on static rules: a device is identified only once. In reality, a device’s fingerprint changes over time and is closely tied to the user’s habits. For example, if you install a browser plug-in today, your browser fingerprint changes. An association mechanism that combines static rules with dynamic tracking can therefore follow the evolution of a device fingerprint over the long term. Interested readers can consult the results of FP-Stalker: Tracking Browser Fingerprint Evolutions, which reports an average tracking time of 54.48 days per browser. Reclassifying the fingerprint data set during tracking effectively avoids the duplication problem.
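To make the idea of dynamic tracking concrete, here is a minimal Python sketch that links a new fingerprint to a previously seen one when only a few factors have changed. It is a simplified rule of thumb written for this article, not the FP-Stalker algorithm; the names differing_factors and link_fingerprint and the max_changes threshold are assumptions.

```python
def differing_factors(old: dict, new: dict) -> set:
    """Return the names of factors whose values differ between two fingerprints."""
    names = old.keys() | new.keys()
    return {name for name in names if old.get(name) != new.get(name)}


def link_fingerprint(new: dict, known: dict, max_changes: int = 1):
    """Return the id of the closest known fingerprint, or None if none is close.

    known maps a device id to the factor values last seen for that device.
    """
    best_id, best_changes = None, max_changes + 1
    for device_id, old in known.items():
        changes = len(differing_factors(old, new))
        if changes < best_changes:
            best_id, best_changes = device_id, changes
    return best_id


known = {
    "device-1": {"colorDepth": 24, "timezone": "Asia/Shanghai", "plugins": ("pdf",)},
    "device-2": {"colorDepth": 30, "timezone": "Europe/Paris", "plugins": ()},
}

# The user of device-1 installed a plug-in, so one factor changed.
evolved = {"colorDepth": 24, "timezone": "Asia/Shanghai", "plugins": ("pdf", "adblock")}
print(link_fingerprint(evolved, known))  # device-1
```

When a match is found, the stored factor values would be updated to the new ones, so that cloned devices that start out identical drift apart as their users' habits diverge.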

Summary

To sum up, we can calculate the entropy of each browser fingerprint factor and then use self-information to calculate the probability that a fingerprint occurs, so that a device can be inferred from a fingerprint value and its probability. This is the principle behind browser fingerprint generation. Next, let’s look at where browser fingerprints are applied.

Application scenarios

Common application scenarios of browser fingerprints are as follows:

Active defense against malicious crawlers

Web sites today face all kinds of automated crawlers that collect data or abuse services, for example vote rigging, comment spam, and scraping of private data. The main defenses at present include passive detection and defense, IP address detection and blocking, and active defense based on browser fingerprints.
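One way fingerprint-based defense might complement IP blocking is to count requests per browser fingerprint instead of per IP address, so that a crawler rotating through proxies is still recognized. The Python sketch below is a deliberately simplified illustration; the class name FingerprintRateLimiter and its fixed request budget (with no time window) are assumptions for this example.

```python
from collections import Counter


class FingerprintRateLimiter:
    """Allow at most max_requests per fingerprint (no time window, for brevity)."""

    def __init__(self, max_requests: int):
        self.max_requests = max_requests
        self.counts = Counter()

    def allow(self, fingerprint: str) -> bool:
        """Record one request and report whether it is within the budget."""
        self.counts[fingerprint] += 1
        return self.counts[fingerprint] <= self.max_requests


limiter = FingerprintRateLimiter(max_requests=2)
# The same fingerprint arrives three times, possibly from different IP addresses.
decisions = [limiter.allow("fingerprint-a") for _ in range(3)]
print(decisions)  # [True, True, False]
```

A production system would add a sliding time window and combine this signal with the other defenses listed above.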

Scanner recognition and interception

Besides Web crawlers, there are professional Web vulnerability scanners such as AWVS, AppScan, and Xray. While a scanner actively identifies Web vulnerabilities, it also creates potential security problems for online Web services. Because a scanner’s browser fingerprint is distinctive, the scanner can be identified and intercepted when it scans a designated site.

Tracing the source of Web attacks

All kinds of scripts run in our browsers, so browser scripts offer a clear advantage for tracing the source of Web attacks. By extracting and correlating browser fingerprint features, attackers can be traced back to their source.

Personalized ad delivery

When you browse or search for a product online, do related ads often show up afterwards? Chances are you are being tracked, because your screen resolution, time zone, and emoji set are all exposed online. You can be tracked even in private browsing mode; if you are interested, see the reference links.

These are some of the application scenarios of browser fingerprints. As they show, browser fingerprints can be used for ad delivery, and they can also serve as a criterion for defensive identification.

Conclusion

This article introduced browser fingerprints from three angles: an overview, the recognition principle, and application scenarios. It covered the common browser fingerprint factors, analyzed the relationship between a browser fingerprint and a device, and briefly described some application scenarios. In the next article, we will design a simple fingerprint tracking model to see what a browser fingerprint can do in practice. Stay tuned.

Reference links

  • xprize
  • fingerprintjs
  • Fundamentals of information theory
  • How do trackers work?
  • New findings based on half a million browser fingerprints
  • How Unique Is Your Web Browser?
  • Learn how identifiable you are on the Internet
  • [Beauty and the Beast: Diverting modern web browsers to build unique browser fingerprints](HAL.inria.fr/HAL-0128547…)