Overview

In advertising, “personalization” and “privacy” seem to sit at opposite ends of a scale. Well-personalized ads usually depend on collecting a lot of user data to build a clear user profile (portrait), and if user data is withheld, personalized advertising struggles to deliver results.

With Apple’s release of iOS 14, privacy has drawn more and more attention, and privacy protection is being taken increasingly seriously. In this context we need to “ride the momentum”: this article discusses personalized advertising in the “post-privacy” era. From a technical point of view, it covers:

  1. The relationship between personalized advertising and privacy (the principle);
  2. The nature of the privacy problem;
  3. How to protect privacy:
    1. On the Web
    2. Between apps
  4. With privacy protection in place, what is the way forward for personalized advertising?
    1. Current approaches of the major vendors
    2. Apple: SKAN/PCM
    3. Google: Privacy Sandbox (FLoC)
    4. Facebook: AEM
    5. First-party landing pages

The relationship between advertising and privacy

Advertising was not originally tied to privacy. But as times changed, ad platforms kept demanding higher monetization efficiency, cloud computing, big data, machine learning and other cutting-edge technologies were applied more and more widely, and the scale of data collection, processing and use kept growing. We divide the depth of ad targeting precision into several stages, from shallow to deep:

(Credit: Everyone is a product manager)

Targeting by media

This is the era of traditional media, where ads are targeted through differences in media attributes. For example, the audience of a sports channel is mostly sports fans; the audience of ads at Beijing’s Xi’erqi subway station is mostly Internet company employees; and at Huangzhuang station in Haidian the audience is mostly parents of children attending after-school classes.

Targeting by content

In the first wave of the Internet, around 2000, the earliest portal websites such as Sina and Sohu launched, and billing was mostly CPT or GD (guaranteed delivery) advertising. Targeting by content is more precise than offline media; for example, readers of a fiction channel and of an entertainment channel are different people.

Targeting by intent

With the launch of Google and Baidu search, ad targeting based on search intent became much more precise than content targeting, and billing moved mostly to CPC or CPM. Of course, this relies more heavily on collecting user information (search queries, history, relevance and so on).

Targeting by user profile

With the arrival of feed products driven by recommendation engines, such as Toutiao and Douyin, content is largely determined by user preferences. These products build user profiles (portraits) from data collected through many channels and push content and ads based on those profiles. Billing is by oCPX or CPA.

As ad targeting grows more precise, user data is collected, computed and applied ever more intensively, and privacy has increasingly come to users’ attention.

What is the fundamental problem?

From a technical point of view, the fundamental problem is the User ID.

Why? Because behind all the data and preferences there is a user, and we need a User ID to identify that user “uniquely”.

Identifying a user “uniquely” sounds simple, but how do you do it when the user is not logged in?

The PC era

In the PC era, “uniquely identifying a user” on the front end was a very interesting topic, namely: how do you generate a unique User ID?

If you know a front-end library such as FingerprintJS, you know how much user-dimension information (User-Agent, IP address, installed fonts, Canvas, installed plugins and so on) can be combined on the front end to generate a User ID with a very low collision rate.

juejin.cn/post/684490…
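The core idea behind such libraries can be sketched in a few lines: gather attributes that rarely change, serialize them in a fixed order, and hash the result. This is a minimal illustration, not FingerprintJS’s actual algorithm; the attribute names and values are hypothetical, and SHA-256 is simply a convenient stand-in hash.

```python
import hashlib
import json

def fingerprint(attributes: dict) -> str:
    """Derive a stable pseudo User ID from collected browser/device attributes."""
    # sort_keys makes the serialization independent of collection order,
    # so the same device always yields the same string and the same hash.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical values a front-end script might collect.
visitor = {
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "ip": "203.0.113.7",
    "fonts": ["Arial", "Helvetica", "PingFang SC"],
    "canvas_hash": "9f2c1e",
    "plugins": ["PDF Viewer"],
}

same_device = dict(reversed(list(visitor.items())))  # same data, different order
changed_ip = {**visitor, "ip": "203.0.113.8"}

print(fingerprint(visitor) == fingerprint(same_device))  # True
print(fingerprint(visitor) == fingerprint(changed_ip))   # False
```

Every extra attribute adds entropy, which is why real libraries combine many signals to push the collision rate down.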

Mobile era

In the mobile era, apps joined the Web. On the Web, the front end can obtain only limited information, whereas an app can interact directly with the system and even the hardware, so it can obtain much more. Android and iOS even provide unique device identifiers, so developers don’t have to generate their own.

Android offers the Android ID; iOS moved from the MAC address to the UDID and then to the IDFA. Of course, as iOS 14 adoption grows, obtaining the IDFA has become much harder (explained later).

(Photo by Zhihu)

What happens when you have a User ID?

With a unique User ID, you can do a lot of things:

Single party

On the Web: within the same site, users can easily be tracked continuously, even when they are not logged in. Technically, once a User ID is generated and stored in a cookie, the browser automatically sends that cookie with every subsequent request to the backend. After the backend extracts the User ID from the cookie, it can record which pages the user visited and which elements they clicked. Even if the user clears cookies and returns to the site, the same User ID is produced again, because the generation algorithm and the device information it uses stay the same.
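A minimal server-side sketch of this flow, using Python’s standard `http.cookies` module, is shown below. The cookie name `uid`, the one-year lifetime, and passing in an already computed fingerprint are assumptions for illustration.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "uid"  # hypothetical cookie name

def identify(cookie_header: str, fingerprint_id: str):
    """Return (user_id, Set-Cookie header or None) for one request."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if COOKIE_NAME in jar:
        # Returning visitor: the browser sent the cookie back automatically.
        return jar[COOKIE_NAME].value, None
    # No cookie (first visit, or cookies cleared): fall back to the fingerprint,
    # which regenerates the same ID as long as the device attributes are unchanged.
    out = SimpleCookie()
    out[COOKIE_NAME] = fingerprint_id
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365
    return fingerprint_id, out.output(header="Set-Cookie:")

visit_log = []  # (user_id, path): which pages each user visited

def handle(path: str, cookie_header: str, fingerprint_id: str):
    user_id, set_cookie = identify(cookie_header, fingerprint_id)
    visit_log.append((user_id, path))
    return set_cookie

print(handle("/home", "", "abc123"))          # first visit: sets uid=abc123
print(handle("/cart", "uid=abc123", "abc123"))  # cookie sent back: None
print(handle("/", "", "abc123"))              # cookies cleared: same ID again
```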

In an app: this is even simpler. From the moment the user opens the app, every action the app wants to record can be recorded.

Multiple parties

If things stopped there, privacy would be less of a problem: everything belongs to the same party, and when you use someone’s service, that party naturally knows best what you did there and what you like.

On the Web, however, if a third party places its script on other parties’ domains and uses the same User ID stored in cookies, that third party can collect all of the user’s activity across those other parties, which poses a serious challenge to user privacy.

In apps, because a unified User ID is easy to obtain, users can be tracked across apps even more conveniently than on the Web.

Clearly, the highest risk is sharing User IDs across parties, especially when the user has not authorized it or is not even aware of it.
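The cross-party mechanism can be simulated with a toy model: a browser keeps one cookie jar per domain and sends it back on every request to that domain, and a hypothetical tracker at `tracker.example` is embedded on three unrelated sites. This is a deliberate simplification (real browsers also send a Referer, apply SameSite rules and so on), meant only to show why a shared third-party cookie is enough to link visits.

```python
from collections import defaultdict

class Browser:
    """Toy browser: one cookie jar per domain, sent back on every request to it."""
    def __init__(self):
        self.jars = defaultdict(dict)

    def request(self, domain, server, page_url):
        cookies = self.jars[domain]             # attached automatically
        set_cookies = server(cookies, page_url)
        self.jars[domain].update(set_cookies)   # store whatever the server sets

class Tracker:
    """Hypothetical third party whose script runs on many sites."""
    def __init__(self):
        self.next_id = 0
        self.history = defaultdict(list)

    def __call__(self, cookies, page_url):
        uid = cookies.get("tid")
        new_cookies = {}
        if uid is None:                         # first sighting: mint an ID
            self.next_id += 1
            uid = f"t{self.next_id}"
            new_cookies["tid"] = uid
        self.history[uid].append(page_url)      # page the script was embedded on
        return new_cookies

tracker = Tracker()
browser = Browser()
for page in ["news.example/a", "shop.example/shoes", "health.example/q"]:
    browser.request("tracker.example", tracker, page)

print(dict(tracker.history))
# {'t1': ['news.example/a', 'shop.example/shoes', 'health.example/q']}
```

If the browser refused to store or send `tracker.example`’s cookie on other sites’ pages, each visit would get a fresh `tid` and the three visits could no longer be linked.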

How to protect user privacy?

Since the highest risk lies in sharing User IDs across parties, there are two main ways to prevent it:

  1. User ID: make it harder to obtain;
  2. Sharing: prohibit data sharing or matching across parties.

Following these two guidelines, let’s look at how to implement them in Web and App respectively.

On the Web

  1. Make the user ID hard to get

As described above, the User ID (i.e., the fingerprint) generated in the browser depends on browser and device information that can be collected, and relies on that information not changing. As the idea of privacy protection has spread, mainstream browser vendors have brought “fingerprinting protection” within the scope of their privacy features.

For example, drawing an image with Canvas and converting it to Base64 easily becomes an important source of fingerprinting data, because hardware and screen rendering differ from machine to machine. A common countermeasure is to add a little random noise when drawing Canvas graphics, so that the output is no longer unique.
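The effect of canvas noise on a fingerprint can be shown with stand-in data. In the sketch below, `rendered` is a hypothetical placeholder for the pixel bytes a canvas drawing produces, and the noise strategy (flipping the lowest bit of 8 distinct random bytes per read) is an assumption; real browsers’ randomization schemes differ and are not fully documented.

```python
import hashlib
import random

def canvas_hash(pixels: bytes) -> str:
    return hashlib.sha256(pixels).hexdigest()[:12]

def read_with_noise(pixels: bytes, rng: random.Random) -> bytes:
    """Flip the lowest bit of 8 distinct random bytes (invisible to the eye)."""
    noisy = bytearray(pixels)
    for i in rng.sample(range(len(noisy)), 8):
        noisy[i] ^= 1
    return bytes(noisy)

# Stand-in for the RGBA bytes one machine produces for a given drawing.
rendered = bytes(range(256)) * 16

# Without protection: every read is identical, so the hash identifies the device.
print(canvas_hash(rendered) == canvas_hash(rendered))  # True

# With per-read noise: the hash changes between reads.
rng = random.Random()
h1 = canvas_hash(read_with_noise(rendered, rng))
h2 = canvas_hash(read_with_noise(rendered, rng))
print(h1 == h2)  # False, unless both reads happen to pick the same 8 bytes
```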

Safari and Firefox pioneered this technique. For example, open the FingerprintJS demo page and compare the fingerprints in Safari’s normal and private modes.

Safari

(Left: Normal mode, right: Private mode, Safari version 14.0.3)

As you can see, the left and right are different. In addition, during testing the generated fingerprint kept changing (possibly due to Safari’s internal mechanisms); see Apple’s 2019 Safari privacy white paper.

Firefox

(Left: Normal mode, right: Private mode, Firefox 87.0 (64-bit))

As you can see, the left and right are the same. This is due to how Firefox’s fingerprinting protection works: it blocks requests based on whether a page communicates with known domains that provide fingerprinting services, using a list provided by Disconnect. See the Firefox 72 release notes for details.

Chrome

(Left: Normal mode, right: Private mode, Chrome version 89.0.4389.90 (official))

As you can see, the left and right are the same. There is little public information about fingerprinting protection in Chrome yet; most of what exists concerns Chrome extensions.

  2. Prohibit data sharing or matching across parties

On the Web, cross-party data sharing or matching is carried by cookies, or more precisely by third-party cookies. The basic principle is as follows:

The cookies with a green background in the figure are third-party cookies, i.e., cookies set by third-party scripts on the main website. They are controlled by the third party and are automatically sent with simple requests to that third party. In practice, third-party scripts track users for many purposes, and some sites load as many as a hundred of them.

Clearly, the protection is to block third-party cookies from being sent.

The privacy risk posed by third-party cookies is widely recognized, and mainstream browsers have addressed it to varying degrees. Niche browsers such as Tor Browser and Brave already block them. The situation in mainstream browsers is as follows:

Safari

  • Released March 24, 2020: starting with iOS/iPadOS 13.4 and Safari 13.1, third-party cookies are blocked by default.

Firefox

  • Released September 3, 2019: starting with Firefox 69.0, third-party tracking cookies are blocked by default.

Chrome

Chrome has been more cautious about blocking third-party cookies and is approaching that goal gradually:

  • May 25, 2016: starting with Chrome 51, cookies gained a new attribute, SameSite, which takes three values from least to most restrictive: None, Lax and Strict. The default was None (see MDN).

    Value    Meaning
    None     Cookies are sent in all contexts, including cross-site requests.
    Lax      Cookies are sent with top-level navigations that use a safe method such as GET (e.g., following a link from a third-party site), but not with other cross-site requests.
    Strict   Cookies are sent only in a first-party context, never with requests initiated by third-party websites.

    Effect on requests to third-party URLs (“Before” means no SameSite attribute, i.e., the behavior before Chrome 80):

    Request type   Example                               Before   None   Lax        Strict
    Link           <a href="..."></a>                    Sent     Sent   Sent       Not sent
    Prerender      <link rel="prerender" href="..." />   Sent     Sent   Sent       Not sent
    GET form       <form method="GET" action="...">      Sent     Sent   Sent       Not sent
    POST form      <form method="POST" action="...">     Sent     Sent   Not sent   Not sent
    iframe         <iframe src="..."></iframe>           Sent     Sent   Not sent   Not sent
    Ajax           $.get('...')                          Sent     Sent   Not sent   Not sent
    Image          <img src="..." />                     Sent     Sent   Not sent   Not sent
  • January 14, 2020: Google announced that Chrome would phase out third-party cookies completely by 2022.
  • February 2020: starting with Chrome 80, the default value of SameSite changed from None to Lax, and a cookie that explicitly sets SameSite=None must also set the Secure attribute (i.e., it can only be transmitted over HTTPS);
  • April 3, 2020: the change above was rolled back because of COVID-19;
  • July 14, 2020: starting with Chrome 84, the rolled-back change was re-enabled;
  • To be continued: for a more detailed timeline of the SameSite rollout, see the description on the official website.

Based on this timeline, the SameSite default is expected to tighten further (possibly toward Strict) until third-party cookies are blocked entirely, the end goal that Safari and Firefox have already reached. The target date is 2022, roughly six years after SameSite was introduced.
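The None/Lax/Strict columns of the table follow a simple rule, encoded below. This is a sketch under a simplifying assumption: the request types are reduced to “top-level navigation or not” plus the HTTP method, with the `<link>` row treated as a navigation the way the table does; real browsers apply more rules. It also shows how a server marks a cookie as deliberately cross-site after Chrome 80; the cookie name `tid` is hypothetical.

```python
from http.cookies import SimpleCookie

# Cross-site request types from the table: (is a top-level navigation?, HTTP method).
REQUEST_TYPES = {
    "link": (True, "GET"),
    "link_prerender": (True, "GET"),
    "get_form": (True, "GET"),
    "post_form": (True, "POST"),
    "iframe": (False, "GET"),
    "ajax": (False, "GET"),
    "image": (False, "GET"),
}

def sent_cross_site(samesite: str, request_type: str) -> bool:
    """Is a cookie with this SameSite value attached to a cross-site request?"""
    top_level, method = REQUEST_TYPES[request_type]
    if samesite == "None":
        return True
    if samesite == "Lax":
        # Lax: only top-level navigations that use a safe method such as GET.
        return top_level and method == "GET"
    return False  # Strict: never attached to cross-site requests

def cross_site_cookie_header(name: str, value: str) -> str:
    """Since Chrome 80, a cookie that opts into SameSite=None must also be Secure."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["samesite"] = "None"  # the samesite attribute needs Python 3.8+
    cookie[name]["secure"] = True
    return cookie.output(header="Set-Cookie:")

for request_type in REQUEST_TYPES:
    print(request_type, [sent_cross_site(v, request_type) for v in ("None", "Lax", "Strict")])
print(cross_site_cookie_header("tid", "t1"))
```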

Between App

  1. Make the User ID hard to get

iOS

Currently, the best-known policy that makes User IDs hard to obtain is Apple’s new iOS 14 policy. Let’s look at the overall timeline:

  1. Before iOS 5: apps could obtain the UDID;
  2. September 2012: iOS 6 was released with the Identifier for Advertisers (IDFA);
  3. May 2013: apps were prohibited from using the UDID (a compliance requirement);
  4. September 2013: iOS 7 was released, blocking access to the MAC address and UDID. The IDFA can be reset (Settings -> Privacy -> Advertising -> Reset Advertising Identifier), which generates a new IDFA; the Limit Ad Tracking switch is off by default;

(Photo by Zhihu)

  5. September 2016: in iOS 10, when Limit Ad Tracking is turned on, apps receive an IDFA made of meaningless zeros; the Limit Ad Tracking switch still defaults to off;
  6. June 2020: Apple announced that iOS 14 would introduce a new privacy framework, App Tracking Transparency (ATT). Under ATT, tracking is restricted by default: an app that wants to track users across apps must show a permission prompt, and it can obtain the IDFA only after the user taps “Allow” (presumably, most users who see the prompt will not allow it).

(Image: Apple developer website)

Because of this move’s huge impact on the entire mobile advertising industry (discussed below) and the global spread of COVID-19, ATT was not released with iOS 14 in September 2020. It was postponed and finally shipped with iOS 14.5 on April 26, 2021.

Across this timeline, Apple’s privacy policy has grown steadily stricter since 2012, making a unique User ID harder and harder, and eventually nearly impossible, to obtain.
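In practice, a backend must treat an all-zero IDFA as “no identifier”. The sketch below assumes a hypothetical server that receives the IDFA string reported by an app; on the device, the value comes from `ASIdentifierManager.shared().advertisingIdentifier`, which returns all zeros when tracking is not permitted. The upper-case normalization is a convention, not a requirement.

```python
import uuid
from typing import Optional

# The all-zero identifier apps receive when tracking is not permitted.
ZERO_IDFA = uuid.UUID(int=0)

def usable_idfa(raw: str) -> Optional[str]:
    """Return a normalized IDFA if it can serve as a cross-app User ID, else None."""
    try:
        idfa = uuid.UUID(raw)
    except ValueError:
        return None           # malformed value
    if idfa == ZERO_IDFA:
        return None           # Limit Ad Tracking on, or ATT not authorized
    return str(idfa).upper()  # IDFAs are conventionally written in upper case

print(usable_idfa("00000000-0000-0000-0000-000000000000"))  # None
print(usable_idfa("6D92078A-8246-4BA4-AE5B-76104861E7DC"))  # 6D92078A-8246-4BA4-AE5B-76104861E7DC
```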

Android

Compared with iOS, Android’s restrictions on obtaining User IDs are much milder. According to Bloomberg:

Google is studying ATT and exploring alternatives. Its solution is likely to be less strict and will not require pop-ups to obtain user permission. These explorations are at an early stage, and there is no decision yet on whether or when to follow suit.

Discussions of a similar solution for Android suggest that any restrictions are likely to be introduced gradually, the way Chrome is phasing out third-party cookies.

In short, there is no clear short-term timetable for User ID and related privacy policies on Android.

  2. Prohibit data sharing or matching across parties

iOS

As mentioned in point 1, ATT not only restricts access to the IDFA but also prohibits data sharing and matching across apps.

Android

Again, there is no clear timetable for the policy.

Governments and regulators

Besides the measures introduced by the major vendors, governments and institutions have also enacted laws and regulations on privacy protection.

  1. General Data Protection Regulation (GDPR)

The main goals of the GDPR are to give individuals control over their personal data and to simplify the regulatory environment for international business by unifying rules within the European Union. Put in terms users can directly perceive: the User ID described above may be generated, stored or even shared across parties without the user’s knowledge, or without any way to refuse even when the user does know. The GDPR aims to give that control back to users.

One visible change is that since the GDPR took effect, many websites disclose their privacy policies and use of cookies on their pages (as shown above), telling users:

  1. The purpose of collecting personal data, how long it is retained, and whether it is shared with third parties;
  2. How to stop tracking;
  3. How to access, manage, update and delete their personal data.

GDPR went into effect on May 25, 2018. See the Wikipedia entry for more details.

  2. California Consumer Privacy Act (CCPA)

The basic content and purpose of the CCPA are much the same as the GDPR’s. They differ in how personal data is defined, in the scope of information covered, in geographic scope, and in the rights concerning the sale of personal information.

The CCPA took effect on January 1, 2020. See the Wikipedia entry for details.

Appendix: during our research, we found a tool that makes compliance work more efficient: CookieBot.

What difficulties will personalized advertising face?

As privacy policies tighten across the board, they will affect conversion attribution, user profiling, programmatic trading and more in the advertising industry. Here we discuss the most important one: conversion attribution.

What is conversion attribution?

In plain terms, it answers “which media’s impression or click caused this conversion?” Conversion attribution is crucial to ad systems because it is closely tied to ad billing and ad targeting.

The figure above abstracts how user 123 converts from party A to party B (the real process is of course much more complicated). The attribution service in the figure can be provided by party A or by a third party. The data reported by party B can come from an SDK provided by party A and embedded in B, or be reported by B itself.

As the figure shows, the key point is that the attribution service must match records from party A and party B by User ID. If privacy policies make User IDs hard to obtain and forbid cross-party data sharing or matching, the reporting link from party B is cut off, which is a disaster for conversion attribution.
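To see why the User ID is the linchpin, here is a minimal attribution join. Last-click attribution with a 7-day window is an assumed model for illustration (the figure doesn’t specify one), and all records are hypothetical.

```python
from datetime import datetime, timedelta

ATTRIBUTION_WINDOW = timedelta(days=7)  # assumed lookback window

clicks = [  # reported by party A (the media)
    {"user_id": "123", "campaign_id": "c1", "ts": datetime(2021, 4, 1, 10, 0)},
    {"user_id": "123", "campaign_id": "c2", "ts": datetime(2021, 4, 2, 9, 0)},
    {"user_id": "456", "campaign_id": "c1", "ts": datetime(2021, 4, 1, 11, 0)},
]
conversions = [  # reported by party B (the advertiser)
    {"user_id": "123", "event": "purchase", "ts": datetime(2021, 4, 3, 20, 0)},
    {"user_id": "789", "event": "purchase", "ts": datetime(2021, 4, 3, 21, 0)},
]

def last_click_attribution(clicks, conversions, window=ATTRIBUTION_WINDOW):
    results = []
    for conv in conversions:
        candidates = [
            c for c in clicks
            if c["user_id"] == conv["user_id"]  # the join needs a shared User ID
            and timedelta(0) <= conv["ts"] - c["ts"] <= window
        ]
        if candidates:
            winner = max(candidates, key=lambda c: c["ts"])  # most recent click wins
            results.append((conv["event"], winner["campaign_id"]))
        else:
            results.append((conv["event"], None))  # unattributable
    return results

print(last_click_attribution(clicks, conversions))
# [('purchase', 'c2'), ('purchase', None)]
```

If User IDs can no longer be obtained or shared across parties, the `user_id` comparison never matches and every conversion ends up unattributed.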

Moreover, once attribution is done, the User ID’s interest features are normally fed into the ad bidding model as real-time feedback to improve the accuracy of its eCPM predictions, and are then used for retargeting and for pushing similar ads to similar users. Under user privacy protection policies, none of this can be done.

What are the current solutions?

Privacy and personalization sit at opposite ends of the scale, and the industry is trying to strike a balance: delivering personalized ads without compromising user privacy. Applied strictly as described above, privacy policies could devastate personalized advertising. Fortunately, when the major vendors introduce privacy policies, most of them also offer compliant solutions, and the vendors keep wrestling with one another. Below are some common solutions for conversion attribution and targeting, grouped by scenario into web-to-web, app-to-web and app-to-app:

  • Web-to-web: jumping from one web page to another (common in PC advertising);

  • App-to-web: jumping from an app to a web page (common for third-party landing page ads, such as opening an advertiser’s landing page from the TikTok feed);

  • App-to-app: jumping from one app to another (common for app install ads, such as clicking through to the App Store from the TikTok feed and opening the target app after downloading it).

Private Click Measurement (PCM)

PCM was announced by Apple on February 1, 2021 and is integrated into iOS/iPadOS 14.5. It mainly solves ad attribution in web-to-web and app-to-web scenarios on macOS, iOS and iPadOS. It works as follows:

Web-to-Web

Before:

(Image redrawn from Webkit blog)

As analyzed earlier, with privacy protection in place the User ID is hard to obtain and cross-site data sharing is impossible. Let’s see how PCM solves the problem:

(Image redrawn from Webkit blog)

  1. Two attributes are added to the <a> tag: attributionsourceid and attributeon.
    1. attributionsourceid: an 8-bit source ID, i.e., 256 values from 0 to 255. In advertising it usually corresponds to campaign_id, indicating which ad campaign the click belongs to.

Notes:

  1. Since it is for attribution, why isn’t it called campaign_id?

A: Because PCM is not technically tied to advertising, it uses a more neutral name, source_id.

  2. Why limit it to 8 bits, i.e., 256 values?

A: To cap the number of ad campaigns that a single party A can track across sites at 256, and to keep the value far too small to encode a User ID.

    2. attributeon: the destination domain for attribution; only a registrable domain, i.e., an eTLD+1, is supported.

Notes:

  1. What is attributeon?

A: It is the domain of the destination site. A related term is eTLD+1, where eTLD stands for effective top-level domain. Unlike a TLD such as .com or .cn, an eTLD is any suffix under which names can be registered; Mozilla maintains these in the Public Suffix List (PSL), which includes entries such as .com and .edu.cn.

eTLD+1 is the more precise term. For example, my color-picker project’s homepage is at zhangbobell.github.io/color-picke…, and github.io is an eTLD, so if the destination page were that address, attributeon = “https://zhangbobell.github.io”.

The PSL also contains some interesting eTLDs, such as s3.eu-west-2.amazonaws.com and pvt.k12.ma.us, so a name directly under one of them is an eTLD+1. The PSL website lists other uses of the list; for example, browsers rely on it to decide whether what you typed in the address bar is a search or a website address.

  2. Why restrict it to eTLD+1 instead of the host part of an arbitrary URL?

A: In general, an eTLD+1 corresponds to a specific domain registrant, whereas the host part can be arbitrary. If *.shop.example all points to shop.example, then attributeon = “https://janeDoeTracking.shop.example” would be possible, and janeDoeTracking could obviously be mapped to a user ID. That loophole would defeat the purpose of protecting privacy.
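The eTLD+1 rule can be sketched as a longest-suffix match against the PSL. The suffix set below is a tiny hypothetical excerpt (the real list has thousands of rules, including wildcard and exception rules this sketch ignores), and `example` is added only so the article’s example domains resolve; production code should use a maintained PSL library.

```python
from typing import Optional
from urllib.parse import urlparse

# Tiny stand-in for the Public Suffix List.
PUBLIC_SUFFIXES = {
    "com", "cn", "edu.cn", "io", "github.io", "example",
    "s3.eu-west-2.amazonaws.com", "pvt.k12.ma.us",
}

def etld_plus_one(host: str) -> Optional[str]:
    """Return the registrable domain (eTLD+1), or None if there isn't one."""
    labels = host.lower().rstrip(".").split(".")
    # Scan from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            # eTLD+1 = the matched suffix plus one more label (if the host has one).
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None

def valid_attributeon(url: str) -> bool:
    """attributeon must name exactly an eTLD+1, not an arbitrary subdomain."""
    host = urlparse(url).hostname or ""
    return etld_plus_one(host) == host

print(etld_plus_one("zhangbobell.github.io"))                    # zhangbobell.github.io
print(valid_attributeon("https://zhangbobell.github.io"))         # True
print(valid_attributeon("https://janeDoeTracking.shop.example"))  # False
```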

  2. When the user clicks the <a> link, they land on shop.example. The attributionsourceid is then stored as a click from social.example to shop.example, kept locally in the browser for 7 days. No website can read this data; it lives only in the browser.
  3. On shop.example, clicking the “Add to Cart” button counts as a conversion event, which triggers a report to social.example. Up to this point, it is no different from a traditional pixel. Unlike a pixel, however:

The report is a GET request to https://social.example/.well-known/private-click-measurement/trigger-attribution/[4-bit trigger data]/[optional 6-bit priority], as shown below:

(Image redrawn from Webkit blog)

The URL has two parameters, trigger data and an optional priority:

  1. trigger data: 4 bits (00 to 15, always written as two digits), indicating the event type, such as adding to favorites, adding to cart, placing an order or completing payment.
  2. optional priority: 6 bits (00 to 63, always written as two digits), indicating the priority of the trigger data. In business terms, events may rank as payment completed > order placed > added to cart. Giving these conversion events different priorities controls which one is finally reported; the priority itself is not included in the final attribution report (described below).

Once the browser detects a conversion event that matches a stored click, it sends the attribution report at a random time within the next 24-48 hours; if a higher-priority conversion event arrives in the meantime, that one is reported instead.
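The trigger URL format above can be captured in a small helper that enforces the 4-bit and 6-bit limits and the two-digit encoding. It follows the format described in this article; the PCM draft has continued to evolve, so treat the exact path as an assumption. The event-to-(trigger data, priority) mapping is hypothetical.

```python
from typing import Optional

WELL_KNOWN = "/.well-known/private-click-measurement/trigger-attribution/"

def trigger_attribution_url(click_source: str, trigger_data: int,
                            priority: Optional[int] = None) -> str:
    """Build the PCM trigger URL: 4-bit trigger data, optional 6-bit priority."""
    if not 0 <= trigger_data <= 15:
        raise ValueError("trigger data must fit in 4 bits (0-15)")
    url = f"https://{click_source}{WELL_KNOWN}{trigger_data:02d}"
    if priority is not None:
        if not 0 <= priority <= 63:
            raise ValueError("priority must fit in 6 bits (0-63)")
        url += f"/{priority:02d}"
    return url

# Hypothetical mapping of shop.example's events to (trigger data, priority).
EVENTS = {"add_to_cart": (2, 10), "order_placed": (3, 20), "payment_completed": (4, 30)}

print(trigger_attribution_url("social.example", *EVENTS["add_to_cart"]))
# https://social.example/.well-known/private-click-measurement/trigger-attribution/02/10
```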

Notes:

  1. How is this step different from a traditional pixel?

A: First, the information sent back is only 4 bits, i.e., at most 16 conversion event types, which rules out sending back information such as a User ID or Request ID. Pixel reports have no such limit and can carry the User ID and all attribution-related information about the conversion.

Second, because of the priority, an attribution report contains only one conversion event, whereas a pixel can report every conversion event that occurred; in other words, PCM attribution is aggregated.

Finally, the browser holds the conversion event and sends it at a random moment within 24-48 hours, whereas a pixel fires in real time; in other words, PCM attribution is delayed.
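The aggregation and delay described above can be modeled as browser-side state for one stored click. This is a sketch, not WebKit’s implementation: in particular, it assumes the random 24-48 hour send time is fixed by the first matching conversion and that a later, higher-priority conversion only replaces the payload. Field values are hypothetical.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

CLICK_TTL = timedelta(days=7)  # stored clicks expire after 7 days

@dataclass
class StoredClick:
    source_site: str          # e.g. "social.example"
    source_id: int            # 8-bit attributionsourceid
    attributed_on_site: str   # e.g. "shop.example" (from attributeon)
    clicked_at: datetime
    trigger_data: Optional[int] = None
    priority: int = -1
    send_at: Optional[datetime] = None

def on_conversion(click: StoredClick, trigger_data: int, priority: int,
                  now: datetime, rng: random.Random) -> None:
    """Record a matching conversion, keeping only the highest-priority one."""
    if now - click.clicked_at > CLICK_TTL:
        return  # the click has expired
    if priority <= click.priority:
        return  # an equal or higher priority event is already stored
    click.trigger_data, click.priority = trigger_data, priority
    if click.send_at is None:
        # Assumption: the random send time is fixed by the first conversion.
        click.send_at = now + timedelta(hours=rng.uniform(24, 48))

def due_reports(clicks, now: datetime):
    """Stored clicks whose delayed report should go out now."""
    return [c for c in clicks if c.send_at is not None and c.send_at <= now]

rng = random.Random()
click = StoredClick("social.example", 23, "shop.example", datetime(2021, 5, 1, 12))
on_conversion(click, 2, 10, datetime(2021, 5, 2, 9), rng)   # add to cart
on_conversion(click, 4, 30, datetime(2021, 5, 2, 10), rng)  # payment completed
on_conversion(click, 3, 20, datetime(2021, 5, 2, 11), rng)  # lower priority: ignored
print(click.trigger_data)  # 4
```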

  4. Finally, the attribution report is sent:

(Image redrawn from Webkit blog)

PCM attribution reports are sent as HTTP POST requests to /.well-known/private-click-measurement/report-attribution/, in this case https://social.example/.well-known/private-click-measurement/report-attribution/, with the following request body:

{
  "source_engagement_type": "click",
  "source_site": "social.example",
  "source_id": [8-bit source ID],
  "attributed_on_site": "shop.example",
  "attribution_trigger_data": [4-bit trigger data],
  "version": 1
}

Note that, as mentioned above, the priority is not included in the final report. Two fields need explanation:

  • source_engagement_type: for PCM, the value is always “click”.
  • version: currently 1, leaving room for future upgrades.
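The browser builds and sends this report itself; the Python below only reproduces the shape of the request, for example to test an endpoint that receives it. The field set follows the body above, while the `Content-Type` header is an assumption. The request is constructed but never sent.

```python
import json
import urllib.request

def attribution_report_body(source_site: str, source_id: int,
                            attributed_on_site: str, trigger_data: int) -> bytes:
    """JSON body of a PCM attribution report; note there is no priority field."""
    if not 0 <= source_id <= 255:
        raise ValueError("source_id must fit in 8 bits")
    if not 0 <= trigger_data <= 15:
        raise ValueError("trigger data must fit in 4 bits")
    report = {
        "source_engagement_type": "click",  # always "click" for PCM
        "source_site": source_site,
        "source_id": source_id,
        "attributed_on_site": attributed_on_site,
        "attribution_trigger_data": trigger_data,
        "version": 1,
    }
    return json.dumps(report).encode("utf-8")

def report_request(source_site: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST made to the click source's well-known URL."""
    url = f"https://{source_site}/.well-known/private-click-measurement/report-attribution/"
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})

req = report_request("social.example",
                     attribution_report_body("social.example", 23, "shop.example", 4))
print(req.get_method(), req.full_url)
```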

App-to-Web

Once web-to-web is clear, app-to-web is easy to understand: the only difference is that the click originates in an app rather than a site. The process is the same, except that in the app, the step previously handled by the <a> tag is done by opening Safari. (PCM does not currently support an in-app WKWebView or SFSafariViewController (SVC), so Safari must be launched. The experience is relatively poor, but Apple has expressed interest in supporting SVC.)

(Image redrawn from Webkit blog)

The UI differences between WKWebview and SVC are as follows:

(Left: WKWebView in the Facebook app, right: SVC in the YouTube app)

WKWebView exposes many APIs and is highly customizable, enabling features such as grid layouts, feed cards and preloading. SVC, by contrast, offers a very limited API and is essentially not customizable; it is basically Safari embedded in the app.

It’s worth noting that Apple’s “Privacy Preserving Ad Click Attribution” (also known as Ad Click Attribution), available on Mac and iOS since May 2019, is the precursor of PCM. It works on the same principle, with some parameter names slightly different.

Left: macOS: Safari -> Develop -> Experimental Features -> Ad Click Attribution

Right: iOS/iPadOS: Settings -> Safari -> Advanced -> Experimental Features -> Ad Click Attribution

To try it out, download Safari Technology Preview or the iOS/iPadOS 14.5 beta, or try the earlier Ad Click Attribution version, which had a debug mode that sent attribution reports after 10 seconds instead of 24 to 48 hours.

PCM is currently a draft in the W3C Privacy Community Group (see the W3C spec). Before it can become a web standard, it must be implemented independently by two browsers (currently only Safari has done so). For more details on PCM, see the WebKit blog post.

SKAdNetwork (SKAN)

The SKAdNetwork API is Apple’s attribution scheme for app advertising. Its main purpose is to attribute app download/install ads (app-to-app) while protecting privacy. It involves three participants: the ad network, the source app and the advertised app.

  • Ad network (such as TT4B): registers its ad network ID with Apple, delivers ads to source apps, and receives and verifies install postbacks;
  • Source app (such as Douyin, app A in the figure): displays ads delivered by the ad network;
  • Advertised app (such as a new game, app B in the figure): calls the install-registration method when it is opened (the postback is actually sent with a delay of 0-24 hours).

Details are as follows:

(Image from Apple developer website)

If you understand PCM, SKAN is easy to follow, since its features are similar (restricted scope, event aggregation, delayed delivery). Here is the data that is eventually sent in the delayed postback:

{
  "version": "2.2",
  "ad-network-id": "com.example",
  "campaign-id": [integer between 1-100],
  "transaction-id": "6aafb7a5-0170-41b5-bbe4-fe71dedf1e28",
  "app-id": 525463029,
  "attribution-signature": "MEYCIQDTuQ1Z4Tpy9D3aEKbxLl5J5iKiTumcqZikuY/AOD2U7QIhAJAaiAv89AoquHXJffcieEQXdWHpcV8ZgbKN0EwV9/sY",
  "redownload": true,
  "source-app-id": 1234567891,
  "fidelity-type": 1,
  "conversion-value": [6-bit conversion value]
}

Let’s discuss the details:

  1. campaign-id: because SKAN is designed specifically for app ads, the name is less generic than PCM’s source_id; it is limited to integers from 1 to 100;
  2. conversion-value: a 6-bit integer (0-63) encoding conversion events such as registration or login, similar to PCM’s trigger data. After the first launch, the app can update this value by calling a function before the 24-hour timer expires; each update restarts the timer.

In addition, app-id refers to the advertised app B, source-app-id to the source app A, and ad-network-id to the ad network’s ID. attribution-signature is used to verify the attribution; I won’t discuss it further here, but see the SKAdNetwork documentation on the Apple developer website.
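The conversion-value mechanics can be modeled as a small state machine. The 6-bit layout (one bit per milestone) is a hypothetical encoding an advertiser might choose, and the rule that only increasing values are accepted reflects our understanding of SKAN 2.x behavior; on a device the app would call `SKAdNetwork.updateConversionValue(_:)`, which this Python sketch only imitates.

```python
from datetime import datetime, timedelta

TIMER = timedelta(hours=24)

# Hypothetical 6-bit layout: one bit per milestone.
EVENT_BITS = {"registered": 0, "logged_in_day2": 1, "tutorial_done": 2,
              "first_purchase": 3, "level_10": 4, "subscribed": 5}

def encode(events) -> int:
    """Pack milestone flags into a conversion value (always within 0-63)."""
    value = 0
    for event in events:
        value |= 1 << EVENT_BITS[event]
    return value

class ConversionState:
    def __init__(self, first_open: datetime):
        self.value = 0
        self.deadline = first_open + TIMER

    def update(self, new_value: int, now: datetime) -> bool:
        """Accept only larger values before the timer expires; each accepted update restarts it."""
        if not 0 <= new_value <= 63 or now > self.deadline or new_value <= self.value:
            return False
        self.value = new_value
        self.deadline = now + TIMER
        return True

t0 = datetime(2021, 5, 1, 8, 0)  # first launch of the advertised app
state = ConversionState(t0)
print(state.update(encode(["registered"]), t0 + timedelta(hours=1)))                     # True (value 1)
print(state.update(encode(["registered", "first_purchase"]), t0 + timedelta(hours=10)))  # True (value 9)
print(state.update(encode(["registered"]), t0 + timedelta(hours=12)))                    # False
print(state.value, state.deadline)  # 9 2021-05-02 18:00:00
```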

SKAN is also at the heart of the much-discussed “iOS 14 issue”. Its first version was released in 2018, but since the IDFA was still available then, it attracted little attention. In 2020, with the IDFA placed under ATT, SKAN became the only officially compliant app-ad attribution scheme from Apple, and a new version was released alongside it; version 2.2, shown above, is integrated in iOS 14.5.

Across Apple’s offerings, both PCM and SKAN reflect the same general principles for “doing ad attribution while protecting privacy”:

  1. Restricted scope: whether campaign_id or trigger data, every value used for attribution is limited to a small number of bits, which largely prevents precise information from leaking.
  2. Aggregated events: all conversion events that follow a click are aggregated by specific rules (PCM by priority, SKAN by timing), and only one conversion event is reported.
  3. Delayed delivery: a report is not sent immediately after the conversion event, but after a random delay (24 to 48 hours in PCM).
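The “restricted scope” principle can be made concrete with a little arithmetic: count the bits a single report can carry and compare them with the bits needed to tell users apart. The one-million-user figure is a hypothetical example, and the calculation ignores side channels such as timing or IP address.

```python
import math

def report_capacity_bits(*value_counts: int) -> float:
    """Bits carried by one report whose fields take the given numbers of distinct values."""
    return math.log2(math.prod(value_counts))

pcm_bits = report_capacity_bits(256, 16)   # 8-bit source ID x 4-bit trigger data
skan_bits = report_capacity_bits(100, 64)  # campaign-id 1-100 x 6-bit conversion value
needed_bits = math.log2(1_000_000)         # hypothetical: distinguish 1 million users

print(f"PCM:  {pcm_bits:.2f} bits")    # PCM:  12.00 bits
print(f"SKAN: {skan_bits:.2f} bits")   # SKAN: 12.64 bits
print(f"need: {needed_bits:.2f} bits")  # need: 19.93 bits
```

With only about 12 bits per report, neither scheme can smuggle a User ID for a large user base through a single report.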

Aggregated Event Measurement (AEM)

Because PCM’s app-to-web flow currently only supports jumping out to the Safari browser, which hurts the user experience, Facebook proposed AEM based on the same general principles as PCM. Since the principle is the same, I won’t walk through the process again, only the differences:

  1. trigger data: the conversion event type; PCM uses 4 bits (16 types), AEM 3 bits (8 types);
  2. Delayed reporting: PCM uses 24-48 hours, AEM 72 hours (3 days).

Local storage, priority matching and delayed reporting all depend on the Facebook app’s own implementation. Through this kind of “self-restraint”, AEM keeps the entire app-to-web flow inside the app, providing a better user experience. See Facebook’s introduction to AEM for details.
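The two differences can be captured as scheme parameters so that validation and scheduling code can be shared. This is a sketch: the article gives a single 72-hour figure for AEM, which is modeled here as a (72, 72) range, and the `SchemeLimits` type is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SchemeLimits:
    name: str
    trigger_bits: int   # size of the conversion-type field
    delay_hours: tuple  # (min, max) reporting delay as stated in this article

    @property
    def trigger_types(self) -> int:
        return 2 ** self.trigger_bits

    def check_trigger(self, value: int) -> int:
        if not 0 <= value < self.trigger_types:
            raise ValueError(f"{self.name}: trigger data must be 0-{self.trigger_types - 1}")
        return value

PCM = SchemeLimits("PCM", trigger_bits=4, delay_hours=(24, 48))
AEM = SchemeLimits("AEM", trigger_bits=3, delay_hours=(72, 72))

for scheme in (PCM, AEM):
    print(scheme.name, scheme.trigger_types, scheme.delay_hours)
# PCM 16 (24, 48)
# AEM 8 (72, 72)
```

An advertiser that maps 16 event types under PCM would have to merge or drop half of them under AEM.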

Facebook’s ad platform has launched AEM and made it available to advertisers. However, whether Apple has approved AEM is still unresolved; nothing is clear yet, and we will share updates as they come.

Federated Learning of Cohorts (FLoC)

Federal Learning provides an interest-based advertising selection mechanism under the protection of privacy. It was first proposed by Google in 2016 and originally used to solve the problem of android terminal users updating their models locally. It has been widely applied in China in the past two or three years. InfoQ also published an article called “ByteDance Breaks Federated Learning: Open Source Fedlearner Framework increases AD Delivery by 209%” that describes some of ByteDance’s results in federated learning.

Strictly speaking, FLoC is for ad targeting. FLoC groups users with similar browsing histories into a cohort, and the ad platform serves suitable ads to these groups through federated computation. Although information is exchanged in federated learning, the framework ensures that the exchange is anonymous and desensitized and cannot be reversed or brute-forced, which protects personal privacy and ensures legal compliance. According to Google’s own assessment, FLoC is 95 percent as effective as third-party cookies.

Cohorts are calculated and formed locally in the browser. The data source is the user’s browsing history, with pages classified locally, and to protect privacy some noise may also be added to the output. Here is an example:

(Image from FLoC website article)

The example works as follows (from the official website):

  1. The FLoC service provides a model of thousands of cohorts; each cohort can be thought of as a group of browsers with similar recent browsing histories;
  2. A’s browser obtains the FLoC model data from the FLoC service and, computing locally from A’s browsing history, determines that A belongs to cohort 1354. In the same way, although B’s browsing history is not quite the same as A’s, B may still end up in cohort 1354;
  3. B then visits an e-commerce website that sells shoes:
    1. The website asks B’s browser for its cohort (1354); B then looks at hiking boots, and the site notes that “a browser from cohort 1354 might be interested in hiking boots”;
    2. The site periodically aggregates this data and shares cohort-level product interests with advertising platforms such as TT4B.
  4. Then, A visits a news website:
    1. The website asks A’s browser for its cohort (1354);
    2. The site requests an ad from the advertising platform TT4B, passing along A’s cohort (1354);
  5. TT4B selects a relevant ad for A based mainly on two pieces of data:
    1. A’s cohort, provided by the news website;
    2. The e-commerce site’s note that “a browser from cohort 1354 might be interested in hiking boots”;
  6. The news site displays the ad: 🥾
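The flow above can be sketched as a toy model in JavaScript. `ShoeSite`, `AdPlatform` and their methods are hypothetical stand-ins for the e-commerce site and an ad platform such as TT4B; no real ad-serving API is implied.

```javascript
// E-commerce site: records which product categories browsers from each cohort viewed.
class ShoeSite {
  constructor() {
    this.interests = new Map(); // cohortId -> Set of product categories
  }
  recordView(cohortId, category) {
    if (!this.interests.has(cohortId)) this.interests.set(cohortId, new Set());
    this.interests.get(cohortId).add(category);
  }
  // Periodically share cohort-level (not user-level) interests with the ad platform.
  shareWith(platform) {
    for (const [cohortId, categories] of this.interests) {
      for (const category of categories) platform.addInterest(cohortId, category);
    }
  }
}

// Ad platform: picks an ad using only the cohort ID sent with the ad request.
class AdPlatform {
  constructor(adsByCategory) {
    this.adsByCategory = adsByCategory; // category -> creative
    this.cohortInterests = new Map();   // cohortId -> Set of categories
  }
  addInterest(cohortId, category) {
    if (!this.cohortInterests.has(cohortId)) this.cohortInterests.set(cohortId, new Set());
    this.cohortInterests.get(cohortId).add(category);
  }
  selectAd(cohortId, fallback = "generic ad") {
    for (const category of this.cohortInterests.get(cohortId) ?? []) {
      if (this.adsByCategory[category]) return this.adsByCategory[category];
    }
    return fallback;
  }
}
```

Only cohort IDs and product categories cross between parties: the shoe site records a view for cohort 1354, shares it, and a later ad request carrying cohort 1354 gets the hiking-boot creative, without either site passing along anything that identifies A or B.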

A and B above are used only for illustration; in practice, no personal information is disclosed to any party. A cohort can be thought of not as a group of people, but as a collection of browsing histories.
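How can similar browsing histories land in the same cohort without leaving the browser? The text above mentions local page classification; as an illustration, the sketch below swaps in SimHash, the locality-sensitive hash that Chrome’s FLoC origin trial was reported to apply to visited domains. The 16-bit width and SHA-256 as the per-domain hash are assumptions, and any server-side step that enforces a minimum cohort size is left out.

```javascript
const crypto = require("crypto");

const COHORT_BITS = 16; // assumption: a small bit width for illustration only

// Hash a domain to COHORT_BITS pseudo-random bits taken from its SHA-256 digest.
function domainBits(domain) {
  const digest = crypto.createHash("sha256").update(domain).digest();
  const bits = [];
  for (let i = 0; i < COHORT_BITS; i++) {
    bits.push((digest[i >> 3] >> (7 - (i & 7))) & 1);
  }
  return bits;
}

// SimHash: each distinct visited domain votes +1/-1 on every bit position, and the
// sign of the total decides the output bit. Similar histories share most votes, so
// they tend to produce the same or nearby cohort IDs.
function simHashCohort(domains) {
  const votes = new Array(COHORT_BITS).fill(0);
  for (const domain of new Set(domains)) {
    domainBits(domain).forEach((bit, i) => {
      votes[i] += bit ? 1 : -1;
    });
  }
  // Pack the sign of each vote into an integer cohort ID (ties become 0).
  return votes.reduce((id, v) => (id << 1) | (v > 0 ? 1 : 0), 0);
}
```

The computation depends only on the set of domains, so visit order and repeat visits do not change the result, and the raw history never has to leave the device.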

The FLoC proposal introduces a new JavaScript API:

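// Run inside an async function or ES module; throws if FLoC is unavailable or disabled.
const cohort = await document.interestCohort(); // Chrome's trial returned { id, version }
const url = new URL("https://ads.example/getCreative");
url.searchParams.append("cohort", cohort.id);
const creative = await fetch(url);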

By default, all of a website’s pages are included in users’ browsing histories for cohort calculation. To opt out, a site can send the following HTTP response header:

Permissions-Policy: interest-cohort=()
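On the server side, opting out just means adding that header to every response. A minimal Node.js sketch, assuming a plain `http` server (the helper name `withFlocOptOut` is hypothetical); it also replaces any existing `interest-cohort` directive so an earlier allowlist cannot undo the opt-out:

```javascript
const http = require("http");

// Merge the FLoC opt-out into a Permissions-Policy value (comma-separated directives),
// dropping any existing interest-cohort directive first.
function withFlocOptOut(existingPolicy = "") {
  const directives = String(existingPolicy)
    .split(",")
    .map((d) => d.trim())
    .filter((d) => d !== "" && !d.startsWith("interest-cohort="));
  directives.push("interest-cohort=()");
  return directives.join(", ");
}

// Example handler: every response opts the page out of cohort calculation.
function handler(req, res) {
  res.setHeader("Permissions-Policy", withFlocOptOut(res.getHeader("Permissions-Policy")));
  res.setHeader("Content-Type", "text/plain");
  res.end("This page is excluded from FLoC cohort calculation.\n");
}

// Create (but do not start) the server; call server.listen(port) to serve.
const server = http.createServer(handler);
```

For example, a page that already sends `geolocation=()` ends up with `Permissions-Policy: geolocation=(), interest-cohort=()`.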

The FLoC API is available in Chrome 89 and later, but to try it out you need to launch Chrome from the command line. See floc.glitch.me for a demo, and more details here. Google planned to start testing FLoC in Google Ads in the second quarter of 2021; these tests have already begun, but because FLoC raises some security concerns, they have met with resistance.

Security risks include:

1. Fingerprinting risk remains: combining a fingerprint with the FLoC cohort makes users even easier to identify.

2. Vulnerable groups can easily be discriminated against: for example, discriminatory job, housing and credit ads targeted by gender, age, race, religion and other dimensions all exploit the labels of vulnerable groups for gain.

3. Cross-context privacy exposure: because cohorts are recomputed periodically, a site can observe how a user’s interests “migrate” over time.
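A back-of-the-envelope calculation shows why combining a fingerprint with FLoC aggravates the first risk. Assuming (unrealistically) that each signal splits users into equally sized, independent buckets, every added signal divides the set of users who look identical; the function names below are hypothetical.

```javascript
// Expected number of indistinguishable users when each signal splits the population
// into equally sized, independent buckets (a strong simplifying assumption).
function anonymitySetSize(population, ...bucketCounts) {
  const buckets = bucketCounts.reduce((product, n) => product * n, 1);
  return population / buckets;
}

// Bits of identifying information carried by a signal with n equally likely values.
function identifyingBits(n) {
  return Math.log2(n);
}
```

For example, with a hypothetical 1,000,000 users, 1,000 cohorts leave about 1,000 users per cohort; adding a fingerprint signal with just 100 equally likely values leaves about 10.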

In addition, FLoC “processes” user behavior data inside the browser without users’ knowledge or consent, which does not comply with GDPR principles. Google therefore ran the test on 0.5 percent of users in Southeast Asia, Japan, North America and other regions.

Note that FLoC is only one part of Google’s Privacy Sandbox; I have covered it separately because it is particularly interesting. Google launched the Privacy Sandbox project in August 2019 with the mission of building a thriving web ecosystem that respects users and is private by default. The project explores how the web can be commercialized without sacrificing privacy, which is closely related to the topic of this article, and consists of a series of proposals:

  1. Replacing features that currently rely on cross-site tracking, such as anti-spam, ad conversion measurement (similar to PCM, but different in detail) and ad targeting (FLoC);
  2. Gradually phasing out third-party cookies: cookie attribute changes (covered above), First-Party Sets, etc.;
  3. Countering common workarounds: browser fingerprinting, cache inspection, navigation tracking, etc.

These proposals have important implications for balancing privacy and advertising, and they reflect Google’s more systematic, long-term approach to the relationship between the two. I highly recommend reading the Privacy Sandbox website.

First party landing page

Because the direction of privacy protection is to prohibit matching user information across different entities, besides the schemes above we can also keep all user information within a single entity. The solution is a first-party landing page: the user’s behavior is completed within one party as much as possible (a closed loop), so the conversion never involves a third party.

To that end, we launched Lead Generation, a product for lead collection: users submit their information on a first-party page to complete the conversion.

(Lead collection demo)
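The first-party idea can be sketched as a single store in which the ad click, the lead form and the conversion all stay with one party. The class and field names below are hypothetical and do not describe how Lead Generation is actually built.

```javascript
// Hypothetical first-party store: clicks, leads and conversions all stay with one party.
class FirstPartyLeadStore {
  constructor() {
    this.clicks = new Set(); // ad clicks served by this party
    this.leads = [];         // leads submitted on this party's own landing page
  }

  recordClick(clickId) {
    this.clicks.add(clickId);
  }

  // Accept a lead from the first-party form and count it as a conversion for the click.
  submitLead(clickId, { name, phone } = {}) {
    if (!this.clicks.has(clickId)) throw new Error(`unknown click: ${clickId}`);
    if (!name || !phone) throw new Error("name and phone are required");
    this.leads.push({ clickId, name, phone, submittedAt: Date.now() });
    return { clickId, converted: true };
  }

  // Attribution is a local lookup; no cross-party user matching is needed.
  conversionCount() {
    return this.leads.length;
  }
}
```

Because attribution is just a lookup inside the same store, this closed loop needs neither third-party cookies nor device identifiers.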

Conclusion

Privacy and personalized advertising sit at two ends of a scale, which poses new challenges for those of us working in the advertising industry. How to provide personalized advertising while protecting privacy also reflects technology’s respect for, and awe of, privacy. This article has covered the history of personalized advertising, the nature of the problem, how privacy is protected and some existing solutions. New solutions will keep appearing over time, and I believe that one day we will find a better balance between the two.

Finally, given the limits of my knowledge, this article inevitably contains mistakes; corrections and criticism are very welcome!

Related knowledge and documentation

  1. Get to know the various device identifiers for iOS

1. UDID (Unique Device Identifier) is a 40-character string of letters and digits. Apple has banned apps from reading it in order to protect user privacy.

2. UUID (Universally Unique Identifier) is generated per application on an iOS device and stays the same for as long as the user keeps the application installed. If the user deletes and reinstalls the application, the UUID changes. The downside of UUID is that once the user deletes your application, it is basically impossible to link back to the previous data.

3. The MAC address is the hardware address of a network interface. It is fixed and determined by the network card. To protect user privacy, Apple has prohibited reading the MAC address.

4. OpenUDID is a third-party solution, not issued by Apple, intended to replace UDID. Its drawback is that if every app containing the OpenUDID SDK is completely deleted (for example, when the system is restored), OpenUDID is regenerated with a different value, which is equivalent to a new device.

5. IDFA (Identifier for Advertisers) is intended for external use, such as ad campaigns, traffic exchange and other cross-app user tracking.

6. IDFV (Identifier for Vendor): apps from the same developer (such as com.zhihu.app1 and com.zhihu.app2) running on the same device get the same value, while apps from different vendors on the same device get different values.
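The scoping difference between an app-generated UUID and IDFV can be modeled with a toy device. This is a sketch, not iOS code: `Device` and its methods are hypothetical, Node’s `crypto.randomUUID()` (Node 14.17+) stands in for the platform generator, and the vendor is assumed to be the bundle ID minus its last component (on a real device iOS decides this itself). The model also ignores that a real IDFV changes once all of a vendor’s apps have been removed.

```javascript
const { randomUUID } = require("crypto");

// Toy model of one iOS device (hypothetical, not Apple code).
class Device {
  constructor() {
    this.appStorage = new Map(); // bundleId -> Map standing in for the app's sandbox
    this.vendorIds = new Map();  // vendor key -> IDFV-like identifier
  }

  // Assumption for illustration: "com.zhihu.app1" -> vendor "com.zhihu".
  static vendorOf(bundleId) {
    return bundleId.split(".").slice(0, -1).join(".");
  }

  install(bundleId) {
    if (!this.appStorage.has(bundleId)) this.appStorage.set(bundleId, new Map());
  }

  // Deleting an app wipes its sandbox, including any UUID it stored.
  uninstall(bundleId) {
    this.appStorage.delete(bundleId);
  }

  // App-generated UUID: created on first use, kept until the app is deleted.
  installId(bundleId) {
    const storage = this.appStorage.get(bundleId);
    if (!storage) throw new Error(`${bundleId} is not installed`);
    if (!storage.has("uuid")) storage.set("uuid", randomUUID());
    return storage.get("uuid");
  }

  // IDFV-like value: shared by apps from the same vendor on this device.
  identifierForVendor(bundleId) {
    const vendor = Device.vendorOf(bundleId);
    if (!this.vendorIds.has(vendor)) this.vendorIds.set(vendor, randomUUID());
    return this.vendorIds.get(vendor);
  }
}
```

Uninstalling and reinstalling an app yields a fresh, unrelated UUID, which is why the previous data can no longer be linked, while two apps from the same vendor keep reporting the same IDFV-like value.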

  2. WebKit Intelligent Tracking Prevention (ITP) introduction

webkit.org/tracking-pr…

  3. Juejin article on generating device IDs

juejin.cn/post/684490…

  4. Apple’s Safari white paper from 2019

www.apple.com/safari/docs…

  5. Research paper on browser fingerprint generation

arxiv.org/pdf/1905.01…

  6. Alibaba’s summary of the change of the cookie SameSite default to Lax:

github.com/mqyqingfeng…

  7. Privacy Sandbox introduction by the Chromium team

www.chromium.org/Home/chromi…

  8. Introduction to federated learning on CSDN

blog.csdn.net/cao81275515…

  9. Resistance to FLoC testing

mp.weixin.qq.com/s/XNgyjrBnF…

“Job Information”

We are ByteDance’s commercial technology team for markets outside China, focused on continuously improving the experience of global users on ad landing pages, optimizing ad conversion, and building a first-party landing page ecosystem. We are a global team based in Beijing, Shanghai, Hangzhou, Mountain View, Seattle and other locations, and we are always hiring senior front-end and back-end engineers as well as front-end and back-end interns. Welcome to join us!

Resumes are welcome at the “ByteFE” (ByteDance front-end) email address: [email protected]