Apple recently made a major update to its Safari browser that completely blocks third-party cookies, which means advertisers and websites can no longer use them to track you by default. Microsoft and Mozilla have also taken measures to block third-party cookies, but because these browsers have a small market share, the impact on the market has been limited.

From 2017 to the end of 2019, Google was fined more than 9.3 billion euros in total, including one fine for violating users’ data privacy. Under intense pressure, the Chrome team recently announced that, to improve user privacy and security, third-party cookies will be disabled completely within the next two years.

If third-party cookies cannot be written at all, front-end data reading and writing patterns, and even the entire advertising industry, will be hugely affected.

What cookies are for


As we all know, HTTP is a stateless protocol. If you send multiple requests to the server from the same client, the server will not know that these requests come from the same client.

This statelessness is one reason HTTP is so widely used. With a stateful protocol, you would have to stay connected to the server the whole time, so if the connection dropped unexpectedly, the entire session would be lost and you would have to start over. A stateless protocol makes the session independent of the connection, so even if the connection breaks, the session state is not seriously harmed, and no connection has to be kept open to maintain the session.

Statelessness is no problem when HTTP is used to fetch static files. But to serve a large number of users well, the server needs to know which user each request comes from. For example, when you shop on Taobao, you only need to log in once; when you later place an order, the server already knows you are logged in and does not ask you to log in again.

So a small piece of the browser’s storage is used to hold some “state” about the current user, and every HTTP request carries that state with it, letting the server know who you are. Cookies solve this problem by giving stateless HTTP a little bit of memory.
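The sketch below makes that round trip concrete in JavaScript. It uses Node-style headers but never starts a real server. The names `sid`, `sessions`, and `handleRequest` are hypothetical. In a real Node server, the same logic would run inside an `http.createServer` callback: it would read `req.headers.cookie` and set the `Set-Cookie` response header.

```javascript
// Minimal sketch: a cookie lets a stateless server recognise a returning client.
// sid, sessions and handleRequest are hypothetical names for illustration.

const sessions = new Map(); // server-side state, keyed by session id

// Parse a request header like "theme=dark; sid=abc" into { theme: "dark", sid: "abc" }.
function parseCookies(header = "") {
  const cookies = {};
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip empty or malformed fragments
    cookies[part.slice(0, eq).trim()] = decodeURIComponent(part.slice(eq + 1).trim());
  }
  return cookies;
}

// Handle one request, given its headers; returns the state and headers to send back.
function handleRequest(requestHeaders) {
  const { sid } = parseCookies(requestHeaders.cookie);
  if (sid && sessions.has(sid)) {
    // The browser echoed the cookie back, so this is a client we have seen before.
    const session = sessions.get(sid);
    session.visits += 1;
    return { sid, visits: session.visits, responseHeaders: {} };
  }
  // Unknown client: create state and ask the browser to remember the id.
  const newSid = Math.random().toString(36).slice(2);
  sessions.set(newSid, { visits: 1 });
  return {
    sid: newSid,
    visits: 1,
    responseHeaders: { "Set-Cookie": `sid=${newSid}; Path=/; HttpOnly` },
  };
}

// Usage: the second request carries the cookie set by the first response.
const first = handleRequest({});
const second = handleRequest({ cookie: `sid=${first.sid}` });
console.log(first.visits, second.visits); // 1 2
```

Without the cookie, the second request would look exactly like the first, and the server would create yet another session.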

However, as soon as cookies appeared, they became a sharp tool for major advertising and shopping websites to pry into users’ privacy, using third-party cookies to continuously collect your data. So what are third-party cookies?

Third party cookies

Open Tmall.com and check its cookies in the browser console. Not all of them belong to the .tmall.com domain; many belong to other domains. All cookies outside the current domain are third-party cookies. You may never visit those domains yourself, yet they quietly identify you through these cookies and send your personal information back.


Cookies under .tmall.com are all first-party cookies, so why are third-party cookies needed at all? After logging in on Tmall, open Taobao.com and you will find you are already logged in. This is because Taobao and Tmall are both Alibaba products and share a unified login service. Your login information is also stored under the domain of that login service, so from the point of view of Tmall or Taobao, the cookie stored there is a third-party cookie.
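The sketch below shows how the cookies listed in the console could be split into first-party and third-party ones. It simplifies by assuming that a "site" is the last two labels of the host name (tmall.com, mmstat.com). Real browsers use the Public Suffix List instead, so this shortcut is wrong for hosts such as www.example.com.cn. The cookie names and the login domain are hypothetical.

```javascript
// Simplified sketch: label cookies as first- or third-party relative to the page.
// Assumption: a "site" is approximated by the last two labels of the host name.
// Browsers really use the Public Suffix List, which this shortcut ignores.

function siteOf(host) {
  // ".tmall.com" and "www.tmall.com" both become "tmall.com"
  return host.replace(/^\./, "").split(".").slice(-2).join(".");
}

function classifyCookies(pageHost, cookies) {
  const pageSite = siteOf(pageHost);
  return cookies.map(({ name, domain }) => ({
    name,
    domain,
    party: siteOf(domain) === pageSite ? "first-party" : "third-party",
  }));
}

// Hypothetical cookies, similar to what DevTools might list on www.tmall.com.
const labelled = classifyCookies("www.tmall.com", [
  { name: "session", domain: ".tmall.com" },
  { name: "login_token", domain: ".login.example.com" },
  { name: "tracker_id", domain: ".mmstat.com" },
]);
for (const c of labelled) console.log(`${c.name} (${c.domain}): ${c.party}`);
```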

In Safari, which blocks third-party cookies completely, only the cookies under .tmall.com remain.


Now, even after logging in on Tmall, you have to log in again on Taobao; the shared login no longer works. Shared login is not the only use, either: third-party cookies are very widely used in web development. Let’s look at a few specific scenarios:

What third-party cookies are used for

Front-end log management


Most websites use third-party SDKs for front-end error or performance monitoring. These SDKs upload what they monitor to their own servers through some endpoint. They usually need to identify each user for troubleshooting or UV (unique visitor) statistics, so when you visit the site they may set a cookie that all subsequent log requests carry.

Because these SDKs are generic monitoring services, they have their own domain names, such as log.com. The cookies they set under their own domain while you are on mysite.com are therefore third-party cookies.

Advertising and marketing genius – Facebook Pixel


In e-commerce, traffic, conversion rate, and sales are what a business cares about most. This is where Facebook Pixel comes in. It is simply a snippet of JavaScript that tracks ad conversions, improves audience targeting, and maximizes return on ad spend.


This code fires when a visitor lands on a page containing the Facebook Pixel. For example, when an item is viewed or added to the shopping cart, Facebook Pixel sends requests to Facebook to record those actions, which can then be used for further tracking and optimization.

For a practical example, visit the overseas e-commerce site Brava Fabrics and you will find a bunch of third-party cookies written under facebook.com.


My guess is that the fr cookie is the one used to identify me. After clicking through a few pages, I found several requests to Facebook in the Network panel. Here is one of them:

https://www.facebook.com/tr/?id=382444918612794&ev=PageView&dl=https%3A%2F%2Fbravafabrics.com%2Fcollections%2Fa-moment-of-bliss&rl=https%3A%2F%2Fbravafabrics.com%2F&if=false&ts=1586868288778&sw=1680&sh=1050&ud[ct]=eb045d78d273107348b0300c01d29b7552d622abbc6faf81b3ec55359aa9950c&ud[country]=eb045d78d273107348b0300c01d29b7552d622abbc6faf81b3ec55359aa9950c&v=2.9.15&r=stable&ec=0&o=30&fbp=fb.1.1586867082370.951509876&it=1586868284974&coo=false&rqm=GET

If you look at the details, you will find the following main parameters:

dl: https://bravafabrics.com/collections/a-moment-of-bliss
rl: https://bravafabrics.com/

So Facebook now knows that I went from https://bravafabrics.com/ to https://bravafabrics.com/collections/a-moment-of-bliss. At the same time, this request carries the cookie fr=09wX7ui8MrvCh2SIa..BdNoGz.F.F6r.0.0.Belanb.AWXCDx.
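These parameters can be decoded with the standard `URL` and `URLSearchParams` APIs, as the sketch below shows. It uses a shortened copy of the request above that keeps only a few parameters. Reading `dl` as the current page and `rl` as the referrer follows the observation above. Treating `ts` as a millisecond timestamp is an assumption based on its value.

```javascript
// Sketch: decode the tracking parameters of a pixel request like the one above.
// Only a few parameters are kept; field meanings are inferred, not official.

function describePixelHit(rawUrl) {
  const params = new URL(rawUrl).searchParams; // values come back percent-decoded
  return {
    pixelId: params.get("id"),
    event: params.get("ev"),
    page: params.get("dl"),     // the page being viewed
    referrer: params.get("rl"), // the page the visitor came from
    // Assumption: ts is a Unix timestamp in milliseconds
    time: new Date(Number(params.get("ts"))).toISOString(),
  };
}

const hit = describePixelHit(
  "https://www.facebook.com/tr/?id=382444918612794&ev=PageView" +
    "&dl=https%3A%2F%2Fbravafabrics.com%2Fcollections%2Fa-moment-of-bliss" +
    "&rl=https%3A%2F%2Fbravafabrics.com%2F&ts=1586868288778"
);
console.log(hit.event, hit.referrer, "->", hit.page);
```

Each such request gives the receiving server a page-to-page step of your browsing, and the attached cookie tells it whose step it was.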


When you log in to Facebook, it associates these cookies with your Facebook ID. It can then analyze behavior such as:

  • Someone made a purchase on your site.
  • Someone signed up for a trial or otherwise identified themselves as a potential customer on your site.
  • Someone entered their payment information during a purchase on your site.
  • Someone added a product to the shopping cart on your site.
  • Someone chose a particular version of a product, such as a certain color.
  • Someone started checkout but did not complete payment.
  • ...

All you need to do is copy the Facebook Pixel JavaScript snippet onto your page. All of this rests on one tiny cookie that lets Facebook associate these actions with its account system.

The ubiquitous MMstat

Let’s take another example from China. When we search for something on a domestic search engine or video site and then open a shopping site, we often receive all kinds of recommendations related to our interests; this has become commonplace. Third-party cookies help collect your age, gender, browsing history, and so on to infer your interests and preferences, and then deliver precisely targeted recommendations.

For example, when we browse Baidu, Youku, Tmall, and other websites, we can see several cookies under the .mmstat.com domain.


As you perform a series of actions on Baidu, Youku, Taobao, and so on, mmstat.com quietly sends your personal information back through these third-party cookies. My personal guess is that mmstat.com belongs to Alimama, Alibaba’s big-data marketing platform. On its home page, the platform claims to be a data gold mine that knows consumers better and has built an identity recognition system covering 500 million users. Every search and every purchase you make makes it more accurate, and the next recommendation you get is even more precise.
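The mechanism can be illustrated with a hypothetical sketch of what a tracker’s server might do. The tracker’s cookie lives on its own domain, so the same ID arrives from every site that embeds the tracker. The Referer of each request then tells it which site and page you were on. The log format, field names, and URLs below are invented for illustration, and nothing here describes mmstat.com’s actual implementation.

```javascript
// Hypothetical sketch of a third-party tracker's server-side view.
// Each hit = the tracker's own cookie id (identical on every site, because the
// cookie lives on the tracker's domain) + the Referer of the embedding page.

function buildProfiles(hits) {
  const profiles = new Map(); // tracker cookie id -> list of observed actions
  for (const { trackerId, referer, action } of hits) {
    if (!profiles.has(trackerId)) profiles.set(trackerId, []);
    profiles.get(trackerId).push(`${new URL(referer).hostname}: ${action}`);
  }
  return profiles;
}

// Invented example hits: one browser (u123) seen on three different sites.
const profiles = buildProfiles([
  { trackerId: "u123", referer: "https://www.baidu.com/s?wd=cat+food", action: "search" },
  { trackerId: "u123", referer: "https://v.youku.com/pets", action: "watch video" },
  { trackerId: "u123", referer: "https://www.tmall.com/", action: "visit" },
  { trackerId: "u456", referer: "https://www.tmall.com/", action: "visit" },
]);
console.log(profiles.get("u123"));
```

Once third-party cookies are blocked, each site would hand the tracker a different ID, and the three entries for u123 could no longer be joined.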


Of course, third-party cookies are just one of many ways to learn your preferences, but they are the easiest and cheapest.

Browser policies

Recent changes to the Cookie policies of major browsers mean that a complete ban on third-party cookies is not far off:

Firefox, Safari — Disabled by default

In Safari 13.1 and Firefox 79, third-party cookies are disabled by default, but because these browsers have a small market share, the overall impact has been limited. Alibaba’s login information is kept in a third-party cookie, so Taobao’s first response was even to pop up a dialog asking users to enable third-party cookies manually.


However, this gives most users a poor experience. The current solution appears to be storing cookies under the current domain first, which explains the test result above: you have to log in to Taobao and Tmall separately.

But third-party cookies do a lot more than that, so be sure to make changes before Chrome fully disables them.

Chrome – SameSite cookies

Fortunately, because third-party cookies matter so much to Google’s advertising business, Google did not disable them immediately. Instead, it has been tightening smaller policies that restrict third-party cookies, such as SameSite.

SameSite is a cookie attribute that Chrome added in version 51. It tells the browser not to send a cookie along with cross-site requests. The main goal is to reduce the risk of cross-origin information leakage; it also mitigates CSRF attacks to some extent.


The SameSite attribute, which controls whether cookies are sent with cross-site requests, takes one of three values:

Strict

Strict is the strictest setting. The browser never sends the cookie to the target site in any cross-site browsing context, even when the user follows an ordinary link. This blocks CSRF attacks that rely on cross-site requests. The downside is poor usability, because even ordinary cross-site GET navigations go out without the cookie.

For an ordinary site, this means that if a logged-in user follows a link to it from a company discussion forum or an email, the site will not receive the cookie, and the user has to log in again.

On the other hand, a site that handles transactions probably does not want its transaction pages reachable from external links at all, so Strict fits that scenario best.

Lax

The Lax value, the default in Chrome 80+, offers a reasonable balance between security and usability for sites that want users arriving from external links to keep their existing session. On cross-site requests, Lax sends the cookie only for top-level navigations that use a safe HTTP method such as GET. Requests with unsafe methods such as POST do not carry it. Neither do cross-site subresource requests, such as images, iframes, or requests issued by JavaScript through fetch or XMLHttpRequest.


For example, suppose a user on site A clicks a link to site B (a GET navigation), and site B’s cookie uses SameSite=Lax. The user then arrives at site B still logged in. In contrast, if the user submits a form from site A to site B (a POST request), the request is sent without site B’s cookie, so site B treats it as coming from a logged-out user.

None

The browser sends the cookie with both same-site and cross-site requests. Attribute values are case-insensitive.
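The three values can be summarized as a small decision function. This is a simplified model written for illustration. It treats a missing attribute as Lax, which is the Chrome 80+ behavior described in the next section, and it ignores browser-specific edge cases.

```javascript
// Simplified model of when a cookie is attached to a request, based on its
// SameSite value. Illustration only: real browsers have more edge cases.

const SAFE_METHODS = new Set(["GET", "HEAD", "OPTIONS", "TRACE"]);

// sameSiteAttr: "Strict" | "Lax" | "None" | undefined (values are case-insensitive)
// request: { crossSite: boolean, method: string, topLevelNavigation: boolean }
function shouldSendCookie(sameSiteAttr, request) {
  if (!request.crossSite) return true; // same-site requests always carry the cookie
  // Assumption: a missing attribute is treated as Lax, as in Chrome 80+.
  const mode = (sameSiteAttr || "Lax").toLowerCase();
  if (mode === "none") return true;    // sent cross-site too (Chrome 80+ also requires Secure)
  if (mode === "strict") return false; // never sent cross-site
  // Lax: only top-level navigations with a safe method, such as following a link.
  return request.topLevelNavigation && SAFE_METHODS.has(request.method.toUpperCase());
}

const linkClick = { crossSite: true, method: "GET", topLevelNavigation: true };
const formPost = { crossSite: true, method: "POST", topLevelNavigation: true };
const imageOrFetch = { crossSite: true, method: "GET", topLevelNavigation: false };

for (const attr of ["Strict", "Lax", "None"]) {
  console.log(
    attr,
    shouldSendCookie(attr, linkClick),
    shouldSendCookie(attr, formPost),
    shouldSendCookie(attr, imageOrFetch)
  );
}
// Strict false false false
// Lax true false false
// None true true true
```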

Policy updates

In older browsers, if the SameSite attribute is not set, or the browser does not support it, the cookie behaves as if SameSite were None and is included in every request, including cross-site requests.

However, in Chrome 80+, the default is SameSite=Lax. A cookie without a SameSite attribute is treated as if it had SameSite=Lax. If you want a cookie to be sent on both same-site and cross-site requests, you have to set SameSite=None explicitly. Cookies with SameSite=None must also carry the Secure attribute, so they are only sent over HTTPS. As a result, by default, cross-site requests that scripts make to collect user information no longer carry third-party cookies.

Still, this change alone doesn’t make a big difference. It is more of a signal to websites, because all you need to do is explicitly set SameSite=None on the cookies you want sent cross-site.
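A server that needs a cookie to keep flowing on cross-site requests might build the header as sketched below. The cookie name `visitor_id`, its value, and the one-year `Max-Age` are hypothetical choices. In a Node server, the resulting string would be passed to `res.setHeader("Set-Cookie", header)`.

```javascript
// Sketch: build a Set-Cookie header for a cookie that should still be sent on
// cross-site requests in Chrome 80+. Name, value and lifetime are hypothetical.

function crossSiteCookie(name, value, maxAgeSeconds) {
  return [
    `${name}=${encodeURIComponent(value)}`,
    "Path=/",
    `Max-Age=${maxAgeSeconds}`,
    "SameSite=None", // explicitly opt in to cross-site sending
    "Secure",        // required together with SameSite=None: HTTPS only
  ].join("; ");
}

const header = crossSiteCookie("visitor_id", "abc123", 60 * 60 * 24 * 365);
console.log(header);
// visitor_id=abc123; Path=/; Max-Age=31536000; SameSite=None; Secure
```

Leaving out `Secure` is the common mistake here. Chrome 80+ rejects a `SameSite=None` cookie that lacks it.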



What would be truly alarming is if sites could no longer set SameSite=None themselves and the choice were left to users. That would make third-party cookies genuinely disabled by default.

Chrome has also announced that the upcoming Chrome 83 will block third-party cookies in Incognito mode, and that third-party cookies will be disabled completely in 2022. By then, it won’t matter whether you can set SameSite=None, because third-party cookies can no longer be written at all.

When third-party cookies are banned altogether

Now, let’s imagine what happens when the browser disables third-party cookies and we don’t make any changes:

Front-end logging anomalies

One day you may suddenly find that your UV (unique visitors) has skyrocketed while your PV (page views) has not changed. A likely cause is that the third-party cookie your SDK relies on has been blocked.

The SDK can no longer store its third-party cookie, so it generates a new identifier every time the page is refreshed. The back end sees these as requests from different users and counts each one toward UV. When troubleshooting, you also can’t link one user’s requests together, which makes problems much harder to track down.
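The arithmetic behind this symptom is simple. PV counts every log line, while UV counts distinct visitor IDs. In the sketch below, the log records are hypothetical, and `visitorId` stands for whatever ID the SDK would normally keep in its cookie.

```javascript
// Sketch: PV counts log lines, UV counts distinct visitor ids.
// The records are hypothetical; visitorId stands for the id the SDK keeps in a cookie.
function countPvUv(logs) {
  return { pv: logs.length, uv: new Set(logs.map((l) => l.visitorId)).size };
}

// One person refreshes the page three times and the cookie is kept.
const cookieKept = [{ visitorId: "v1" }, { visitorId: "v1" }, { visitorId: "v1" }];
// Same three refreshes, but the cookie cannot be stored, so a new id is minted each time.
const cookieBlocked = [{ visitorId: "v1" }, { visitorId: "v2" }, { visitorId: "v3" }];

console.log(countPvUv(cookieKept));    // { pv: 3, uv: 1 }
console.log(countPvUv(cookieBlocked)); // { pv: 3, uv: 3 }
```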

Smart ad recommendations disappear

As mentioned above, advertising services can infer your preferences from your age, gender, and behavior, and push precisely targeted ads to you. Advertisers who rely on third-party cookies for tracking will no longer be able to learn your preferences and recommend ads you are interested in.

At that point, advertisers can only target ads based on the context you are in, for example recommending pet products while you visit a pet website.

Advertisers also use cookies to count how many times you have seen an ad. If you have seen the same ad many times without converting, they stop showing it to you, and once you have bought the product, you no longer see its ad. Without this frequency control, you might stare for days at an ad you never click, or keep seeing ads for a product you already bought.
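Frequency capping can be sketched as a per-cookie counter on the ad server. The cap, the IDs, and the in-memory maps are hypothetical; a production ad server would keep this state in shared storage. The key depends entirely on a stable cookie ID, so if every request arrives with a fresh ID, the cap never triggers.

```javascript
// Hypothetical frequency-capping sketch: per-cookie counters on the ad server.
// Stops serving an ad after `cap` views, or once the user has converted.
function createFrequencyCapper(cap) {
  const views = new Map();     // "cookieId:adId" -> number of impressions
  const converted = new Set(); // "cookieId:adId" keys that already converted
  const key = (cookieId, adId) => `${cookieId}:${adId}`;
  return {
    shouldServe(cookieId, adId) {
      if (converted.has(key(cookieId, adId))) return false; // already bought
      return (views.get(key(cookieId, adId)) || 0) < cap;
    },
    recordView(cookieId, adId) {
      const k = key(cookieId, adId);
      views.set(k, (views.get(k) || 0) + 1);
    },
    recordConversion(cookieId, adId) {
      converted.add(key(cookieId, adId));
    },
  };
}

// Usage: with a cap of 3, five ad requests from the same cookie get 3 impressions.
const capper = createFrequencyCapper(3);
let served = 0;
for (let i = 0; i < 5; i++) {
  if (capper.shouldServe("cookie-42", "ad-7")) {
    capper.recordView("cookie-42", "ad-7");
    served++;
  }
}
console.log(served); // 3
```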

Unable to track conversion rate


When you view an ad, the ad service places a cookie in your browser recording that you have seen it. If you later convert (buy, download, and so on), advertisers use that cookie to attribute the conversion to the ad. This lets them measure each placement’s effect and optimize their strategy. If conversions can no longer be tracked, it becomes hard to decide where to place ads.
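Conversion tracking essentially joins ad views and conversions on the same cookie ID. The sketch below assumes a hypothetical 7-day lookback window. A viewer counts as converted if a conversion with the same ID follows their first view within that window. A conversion whose cookie matches no view cannot be attributed at all, which is what happens once the third-party cookie is blocked.

```javascript
// Hypothetical sketch: attribute conversions to ad views by matching cookie ids
// within a lookback window, then compute the conversion rate among viewers.
const DAY_MS = 24 * 60 * 60 * 1000;

function conversionRate(impressions, conversions, windowMs = 7 * DAY_MS) {
  const firstSeen = new Map(); // cookie id -> time of first ad view
  for (const { cookieId, time } of impressions) {
    if (!firstSeen.has(cookieId) || time < firstSeen.get(cookieId)) firstSeen.set(cookieId, time);
  }
  const convertedIds = new Set();
  for (const { cookieId, time } of conversions) {
    const seen = firstSeen.get(cookieId);
    // Only count conversions that follow a view within the window.
    if (seen !== undefined && time >= seen && time - seen <= windowMs) convertedIds.add(cookieId);
  }
  return firstSeen.size === 0 ? 0 : convertedIds.size / firstSeen.size;
}

const impressions = [
  { cookieId: "a", time: 0 },
  { cookieId: "b", time: 0 },
  { cookieId: "c", time: 0 },
  { cookieId: "d", time: 0 },
];
const conversions = [
  { cookieId: "a", time: 2 * DAY_MS },  // within the window: counted
  { cookieId: "b", time: 30 * DAY_MS }, // too late: not counted
  { cookieId: "x", time: 1 * DAY_MS },  // unknown id (e.g. cookie lost): cannot be attributed
];
console.log(conversionRate(impressions, conversions)); // 0.25
```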

Of course, all this assumes you make no changes at all. There is still more than a year before third-party cookies are completely banned, which should give you plenty of time to respond.

Is it good or bad?

Although this may be bad news for you as an advertiser, completely disabling third-party cookies is definitely good for us as users. Our information can no longer be tracked so easily, and our private data is less likely to leak.

But is it really that simple? Greedy advertisers will never give up tracking your information. First, they already have plenty of information about you, and third-party cookies are only one of many ways to obtain it, just the most convenient one. For profit, they will find alternatives:

Using first-party cookies instead of third-party cookies

When we include a third-party SDK such as Google Analytics, we trust it to collect and track information within permitted limits. Such SDKs can therefore still identify users with first-party cookies.

For example, gtag.js and analytics.js set the following cookies to store user identity information.


These are not third-party cookies, however, but first-party cookies under the site’s own domain. Take the cookies on Twitter as an example.


The _ga and _gid cookies are set under Twitter’s own domain.

With a normal Set-Cookie response header, Google Analytics cannot set cookies on the twitter.com domain. Likewise, the log collection requests it sends cannot carry cookies from the twitter.com domain.

Looking into the SDK code, I found that it sets the cookie with JavaScript.


Also, the log collection request does not carry any cookies. Instead, it passes this information as request parameters.


This approach reproduces what a third-party cookie did for identifying users and fully replaces it. In short, disabling third-party cookies has little impact on this kind of SDK; it just requires a slight change of approach.
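The first-party pattern works in two steps, sketched below. Script on the page reads or creates an ID in a cookie on the site’s own domain, then sends that ID as a query parameter. The `_ga`-like value format (`GA1.2.<random>.<timestamp>`), the `cid` parameter name, and the collector URL are assumptions for illustration, not a description of Google Analytics internals. The cookie string is passed in as an argument so the sketch runs outside a browser. In a browser it would come from `document.cookie`.

```javascript
// Sketch of first-party identification: read or create an id in a cookie on the
// site's own domain, then send it as a query parameter instead of a cookie.
// The "_ga"-like format, the "cid" name and the collector URL are assumptions.

function readCookie(cookieString, name) {
  for (const part of cookieString.split(";")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === name) return rest.join("=");
  }
  return null;
}

// Returns { clientId, setCookie }; setCookie is null when the id already exists.
function getOrCreateClientId(cookieString, nowMs) {
  const existing = readCookie(cookieString, "_ga");
  if (existing) {
    // e.g. "GA1.2.1234567890.1586868288" -> "1234567890.1586868288"
    return { clientId: existing.split(".").slice(2).join("."), setCookie: null };
  }
  const random = Math.floor(Math.random() * 2147483647);
  const clientId = `${random}.${Math.floor(nowMs / 1000)}`;
  // In a browser, this string would be assigned to document.cookie (first-party).
  return { clientId, setCookie: `_ga=GA1.2.${clientId}; Path=/; Max-Age=63072000` };
}

function buildCollectUrl(clientId, pageUrl) {
  const url = new URL("https://collector.example.com/collect"); // hypothetical endpoint
  url.searchParams.set("cid", clientId); // the id travels as a parameter, not a cookie
  url.searchParams.set("dl", pageUrl);
  return url.toString();
}

const { clientId } = getOrCreateClientId("lang=en; _ga=GA1.2.1234567890.1586868288", Date.now());
console.log(buildCollectUrl(clientId, "https://twitter.com/home"));
```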

Of course, since Safari and Firefox block third-party cookies, some advertising and marketing services, such as Facebook Pixel, now offer the option of using first-party cookies instead.


Allowing it to read your first-party cookies means it can access more of your data, which increases the risk of your users’ information leaking. First-party cookies are also less flexible than third-party cookies and can be quite limiting in some scenarios.

Browser fingerprint

The main purpose of a third-party cookie is to identify you, so that the tracker knows who you are on your next visit. If a technique could capture your unique identity directly, there would be no need to store cookies at all. That technique is the “browser fingerprint”.


A browser fingerprint is a way of tracking a web browser through the configuration and settings information visible to a website. Like the fingerprints on our hands, each of us has a nearly unique combination of configurations.

If you take out a single configuration, there may be many people who have the same configuration as you, such as the following:


  • System version:
    • My system version is Mac OS X 10_14_6
    • About 11.91% of people have the same configuration as me
    • That is, about 1 in every 8 people
  • Chrome version:
    • The browser I use is Chrome, version 81.0.4044.92
    • About 0.08% of people have the same configuration as me
    • That is, about 1 in every 1250 people
  • UTC+8 time:
    • My UTC+8 time is 2020.4.15 23:00:00
    • About 2.30% of people have the same configuration as me
    • That is, about 1 in every 43 people

Individually, none of these configurations is unique to you, but what about in combination? Taking just these three together, the probability that someone shares all of them with you drops sharply. And these are only simple features: the system version and browser version, for example, can both be read from the navigator.userAgent property alone.
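Here is a worked version of that intuition, using the three shares listed above. A value shared by a fraction p of users carries -log2(p) bits of identifying information. Under the simplifying assumption that the attributes are independent, the bits add up and the shares multiply. Real attributes are correlated, so treat the result as a rough estimate.

```javascript
// Worked arithmetic for the three attributes above, using the shares reported
// by the fingerprint test page. Assumption: attributes are independent.

const shares = {
  os: 0.1191,      // Mac OS X 10_14_6: about 1 in 8
  browser: 0.0008, // Chrome 81.0.4044.92: about 1 in 1250
  timezone: 0.023, // UTC+8: about 1 in 43
};

// A value shared by a fraction p of users carries -log2(p) bits of information.
const bits = Object.values(shares).map((p) => -Math.log2(p));
const totalBits = bits.reduce((sum, b) => sum + b, 0);
// Under independence, the chance of matching all three is the product of the shares.
const combinedShare = Object.values(shares).reduce((prod, p) => prod * p, 1);

console.log(bits.map((b) => b.toFixed(2)).join(", ")); // 3.07, 10.29, 5.44
console.log(totalBits.toFixed(1));                     // 18.8
console.log(Math.round(1 / combinedShare));            // roughly 456,000
```

In other words, these three attributes alone would narrow you down to roughly one person in 456,000, if they really were independent.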

There are many more such attributes, coming from HTTP headers, JavaScript attributes, browser plug-ins, and so on.

HTTP Header


The HTTP headers above contain many distinguishing features, and for each of them the probability of matching my configuration is low. However, this information belongs to the basic browser fingerprint. It is easy to find and easy to change, so it can be tampered with easily. Values such as User-Agent and language are considered more accurate, and harder to tamper with, when read through JavaScript from the navigator object. Here are some other common JavaScript attributes:

JavaScript attributes


These include configurations that are easy to read with JavaScript:

  • Screen width: screen width
  • Screen height: screen height
  • Cookies enabled: whether cookies are allowed
  • Content language: language information
  • List of fonts: font information
  • Timezone: time zone information
  • Navigator properties: properties of the navigator object
  • ...

This information is very easy to obtain, but with so little of it, the resulting fingerprint has a higher chance of collision. In fact, JavaScript can obtain far more, including the following indicators with very low repetition rates:

Canvas fingerprint

The HTML5 canvas element draws 2D graphics on web pages. When drawing, the browser calls the operating system’s graphics interfaces. So even when the same content is drawn with canvas, the result differs across systems and browsers, because they use different graphics engines, image export options, default compression levels, anti-aliasing, and sub-pixel rendering algorithms.


The process works like this: render some text on the canvas, convert it with toDataURL, and the result is your canvas fingerprint:

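    // Assumes the page has a <canvas id="canvas-fingerprint"></canvas> element
    function getCanvasFingerprint() {
      const canvas = document.getElementById("canvas-fingerprint");
      const context = canvas.getContext("2d");

      context.font = "18pt Arial";
      context.textBaseline = "top";
      // Text rendering varies with OS, fonts, GPU, anti-aliasing, etc.
      context.fillText("canvas-fingerprint-test", 2, 2);

      // The encoded image (usually hashed afterwards) serves as the fingerprint
      return canvas.toDataURL("image/jpeg");
    }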

As can be seen from the figure above, the probability of someone having the same canvas fingerprint as me is below 0.01%, which makes it a very important indicator in browser fingerprinting.

WebGL

WebGL is a JavaScript browser API for rendering 3D images on web pages. Websites can use WebGL to fingerprint your device:


  • WebGL Report – the complete WebGL browser report table can be read and tested. In some cases it is converted to a hash for faster analysis.
  • WebGL Image – hidden 3D images are rendered and converted to hashes. Because the end result depends on the hardware doing the computation, this method produces distinct values for different combinations of devices and drivers.

WebRTC


WebRTC (Web Real-Time Communication) lets the browser exchange audio and video in real time and is typically used by web applications that need a fast, direct connection. Even if you use a proxy, a site may be able to obtain your real public and local IP addresses through WebRTC, and it can also be used to enumerate your media devices. WebRTC can expose:

  • Public IP address
  • Local IP address
  • The number of media devices and their hashes

CSS

Even if JavaScript is disabled, websites can use pure CSS to retrieve information like this:

    @media (device-width: 1920px) {
      body {
        background: url("https://example.org/1920.png");
      }
    }

By counting requests for the 1920.png image in the server logs, you can tell which users have a screen (device) width of 1920px.

Calculating the UUID

By combining these indicators, you can compute a unique UUID for each user, which is the “browser fingerprint”. Of course, the indicators cannot simply be added together. In some scenarios, many users share the same value for an indicator, and each indicator carries a different amount of information. Each indicator generally has its own “information entropy”:

Entropy is the average amount of information contained in each received message. The higher the entropy, the more information can be transmitted, and the lower the entropy, the less information can be transmitted.
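Formally, the entropy of an attribute is H = -Σ p·log2(p), summed over its possible values, where p is the share of users with that value. The sketch below estimates H from a sample of observed values. The sample data are made up purely to show the calculation.

```javascript
// Shannon entropy H = -sum(p * log2 p), estimated from observed attribute values.
// The sample arrays below are made up to illustrate the calculation.
function entropyBits(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);
  let h = 0;
  for (const c of counts.values()) {
    const p = c / values.length; // share of users with this value
    h -= p * Math.log2(p);
  }
  return h;
}

// An attribute everyone shares carries no information...
console.log(entropyBits(["on", "on", "on", "on"]));         // 0
// ...while four equally common values carry log2(4) = 2 bits.
console.log(entropyBits(["1920", "1440", "1366", "2560"])); // 2
```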

When computing the UUID, indicators with higher information entropy generally get larger weights, which greatly reduces the collision rate and improves the UUID’s accuracy.
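The final combining step can be sketched as hashing a canonical string of all collected attributes. FNV-1a (32-bit) is used here only because it is short; libraries may use other hashes. This plain version gives every attribute equal weight and does not implement the entropy weighting described above. The attribute values are hypothetical examples.

```javascript
// Sketch: collapse collected attributes into one identifier. FNV-1a (32-bit) is
// chosen for brevity; it hashes UTF-16 code units, which is fine for a sketch.
// Every attribute counts equally here: no entropy weighting is implemented.

function fnv1a32(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits
  }
  return hash >>> 0;
}

function fingerprintId(attributes) {
  // Sort keys so the same attributes always produce the same string.
  const canonical = Object.keys(attributes)
    .sort()
    .map((key) => `${key}=${attributes[key]}`)
    .join("|");
  return fnv1a32(canonical).toString(16).padStart(8, "0");
}

// Hypothetical values; in a browser they would come from navigator, screen,
// Intl.DateTimeFormat().resolvedOptions().timeZone, a canvas hash and so on.
const id = fingerprintId({
  userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) Chrome/81.0.4044.92",
  screen: "1680x1050",
  timezone: "Asia/Shanghai",
  language: "zh-CN",
  canvasHash: "hypothetical-canvas-hash",
});
console.log(id); // 8 hex characters; identical inputs always give the same id
```

Because the ID is recomputed from the browser itself on every visit, nothing needs to be stored in a cookie for it to survive.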

Of course, you don’t need to collect each of these indicators yourself. ClientJS (https://github.com/jackspirou/clientjs) can easily gather them and compute the final UUID for you:

    // Create a new ClientJS object
    const client = new ClientJS();

    // Get the client's fingerprint id
    const fingerprint = client.getFingerprint();

    // Print the 32bit hash id to the console
    console.log(fingerprint);

You can also get this information separately:

    const client = new ClientJS();

    client.getBrowserData();
    client.getFingerprint();
    client.getCustomFingerprint(...);
    client.isCanvas();
    client.getCanvasPrint();
    client.getFlashVersion();
    client.isSilverlight();
    client.getSilverlightVersion();
    // ...


Summary

As an ordinary user, I can only sigh: protecting my personal privacy is so hard. Platforms that collect my information are everywhere, and the ways they collect it are endlessly varied…

In the real world, nothing stays the same.

As a developer, you need to stay alert to this kind of crisis and be the first to update your technology as the external environment changes, or you will become obsolete.
