Background

To identify user preferences more accurately after a feature is released, the product team needs A/B testing support: set up experiments, collect experiment data, and analyze it afterwards.

Goals

The application cache layer for consumer-facing (C-end) traffic distinguishes A and B access. This reduces back-to-origin requests, improves response time for users, and relieves pressure on the origin.

Technology research

Multi-variant distribution over a distributed node network

AWS CloudFront improves the user experience by delivering content faster through accelerated distribution and regional edge caching.

Dynamic page content from the origin is cached with the path as the first dimension and the client (C-end) type as the second dimension, forming a multi-page, multi-client cache.

One goal of A/B testing is not to affect the user experience, so the quantitative analysis has to happen within the same page dimension: add "experiment type" to the second dimension of the content cache, so that each page is cached in multiple variants.

Example:

Custom experiment-type header: “Cloudfront-ab-experiment”
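The cache dimensions described above can be modelled as a small sketch. This is only an illustration of how the variants multiply, not CloudFront's internal cache-key logic. `cache_key` is a hypothetical helper, and using the `CloudFront-Is-Mobile-Viewer` header as the client-type dimension is an assumption.

```python
from typing import Mapping, Tuple

# Header name taken from the article, lowercased for lookup.
EXPERIMENT_HEADER = "cloudfront-ab-experiment"
# Assumption: the client type comes from CloudFront's device-detection header.
DEVICE_HEADER = "cloudfront-is-mobile-viewer"


def cache_key(path: str, headers: Mapping[str, str]) -> Tuple[str, str, str]:
    """Model the cache key: path first, then client type and experiment type."""
    device = "mobile" if headers.get(DEVICE_HEADER) == "true" else "desktop"
    # Assumption: requests without an experiment header fall back to group A.
    experiment = headers.get(EXPERIMENT_HEADER, "A")
    return (path, device, experiment)
```

With two client types and two experiment groups, each page is cached in up to four variants, which is why the number of groups per page directly affects the cache hit ratio.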

Applying edge computing

The primary question in A/B testing is how to distribute traffic, and at which layer the distribution happens.

(1) Set the traffic allocation ratio. Within an A/B test group, each page is the first dimension; the client type, user category tag, and functional block are the second dimension. (After the allocation ratio is changed, only data collected after the change takes effect should be sampled.)

(2) If traffic is allocated at the origin, the statistics reflect CDN back-to-origin traffic rather than real C-end access traffic.

Sending all page traffic back to the origin is a poor solution when the user experience depends heavily on the edge cache.

Therefore Lambda@Edge is introduced to compute and distribute traffic at the CDN nodes, which preserves the user experience while still performing the allocation.

Docs.aws.amazon.com/zh\_cn/Amaz…

PV-level A/B split

Viewer request

Page view (PV): every page request is routed to A or B with a probability given by the allocation ratio.
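A minimal sketch of the PV-level split as a Lambda@Edge viewer-request function in Python. The 25% share for B and the helper `pick_group` are assumptions for illustration; the header name is the one used in this article.

```python
import random

EXPERIMENT_HEADER = "cloudfront-ab-experiment"
B_RATIO = 0.25  # assumed share of requests routed to variant B


def pick_group(b_ratio, rand=random.random):
    """Return "B" with probability b_ratio, otherwise "A"."""
    return "B" if rand() < b_ratio else "A"


def handler(event, context):
    """Viewer-request trigger: tag every request independently (PV level)."""
    request = event["Records"][0]["cf"]["request"]
    group = pick_group(B_RATIO)
    # Lambda@Edge headers are keyed by lowercase name and hold a list of
    # {"key", "value"} entries.
    request["headers"][EXPERIMENT_HEADER] = [
        {"key": "Cloudfront-ab-experiment", "value": group}
    ]
    return request
```

Because no state is stored, the same user can see A on one request and B on the next, which is what distinguishes the PV level from the UV level.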

UV-level A/B split

Viewer request + viewer response

Unique visitor (UV): when a new user visits the page, the user is tagged as a type A or type B user with a probability given by the allocation ratio. Type A and type B users see different content. (User tags support an expiration time.)
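A minimal sketch of the UV-level flow, assuming the tag is stored in a cookie. The cookie name `ab_group`, the 25% share, and the one-day lifetime are hypothetical values. The sketch relies on the viewer-response event carrying the request as modified by the viewer-request function.

```python
import random

COOKIE_NAME = "ab_group"      # hypothetical cookie holding the user tag
EXPERIMENT_HEADER = "cloudfront-ab-experiment"
B_RATIO = 0.25                # assumed share of new users tagged as B
TAG_TTL_SECONDS = 86400       # assumed tag lifetime (one day)


def read_group(request):
    """Return the group stored in the request cookies, or None if untagged."""
    for header in request["headers"].get("cookie", []):
        for part in header["value"].split(";"):
            name, _, value = part.strip().partition("=")
            if name == COOKIE_NAME and value in ("A", "B"):
                return value
    return None


def viewer_request(event, context, rand=random.random):
    """Reuse the existing tag; roll the dice only for untagged users."""
    request = event["Records"][0]["cf"]["request"]
    group = read_group(request) or ("B" if rand() < B_RATIO else "A")
    request["headers"][EXPERIMENT_HEADER] = [
        {"key": "Cloudfront-ab-experiment", "value": group}
    ]
    return request


def viewer_response(event, context):
    """Persist a newly assigned tag with an expiration time."""
    cf = event["Records"][0]["cf"]
    request, response = cf["request"], cf["response"]
    if read_group(request) is None:
        entries = request["headers"].get(EXPERIMENT_HEADER, [])
        assigned = entries[0]["value"] if entries else None
        if assigned in ("A", "B"):
            response["headers"].setdefault("set-cookie", []).append({
                "key": "Set-Cookie",
                "value": f"{COOKIE_NAME}={assigned}; "
                         f"Max-Age={TAG_TTL_SECONDS}; Path=/",
            })
    return response
```

Once the cookie expires, the next visit goes through the assignment again, which matches the expiration behaviour described above.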

UV-level multi-page A/B split

Viewer request + viewer response + path identifier

A user is tagged as a type A or type B user with a probability given by the allocation ratio. Tags are stored per page path, so different pages do not affect each other. (User tags support an expiration time.)
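One way to keep tags independent per page is to give each path its own cookie and scope it with the cookie `Path` attribute. The sketch below assumes that approach. The `EXPERIMENTS` table, the cookie names, and the `/list/` entry are hypothetical.

```python
import random

# Hypothetical experiment table: path prefix -> cookie name and share of B.
EXPERIMENTS = {
    "/detail/": {"cookie": "ab_detail", "b_ratio": 0.25},
    "/list/": {"cookie": "ab_list", "b_ratio": 0.50},
}
TAG_TTL_SECONDS = 86400  # assumed tag lifetime (one day)


def find_experiment(uri):
    """Return (prefix, experiment) for the first matching path, else None."""
    for prefix, exp in EXPERIMENTS.items():
        if uri.startswith(prefix):
            return prefix, exp
    return None


def parse_cookies(cookie_header):
    """Parse a Cookie header value into a name -> value dict."""
    cookies = {}
    for part in cookie_header.split(";"):
        name, _, value = part.strip().partition("=")
        if name:
            cookies[name] = value
    return cookies


def resolve_group(uri, cookie_header, rand=random.random):
    """Return (group, set_cookie); set_cookie is None when nothing changes."""
    match = find_experiment(uri)
    if match is None:
        return None, None          # page is not under experiment
    prefix, exp = match
    existing = parse_cookies(cookie_header).get(exp["cookie"])
    if existing in ("A", "B"):
        return existing, None      # keep the tag already stored for this path
    group = "B" if rand() < exp["b_ratio"] else "A"
    set_cookie = (f"{exp['cookie']}={group}; "
                  f"Max-Age={TAG_TTL_SECONDS}; Path={prefix}")
    return group, set_cookie
```

A user tagged B on `/detail/` is still assigned from scratch on `/list/`, so the two experiments can run with different ratios and lifetimes.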

Steps to implement

Functional verification

Lambda@Edge application process

CodeBase:console.aws.amazon.com/codesuite/c…

Creating an application

The function is created or updated automatically after the code is committed

Add a trigger

Binding a trigger automatically creates a version after publishing

Putting it into practice

A/B test release for the product details page

Determine the first dimension and the second dimension, then set the traffic ratio.

First dimension, the content details page: /detail/*

Second dimension, type A/B users: type A sees the old version of the page, type B sees the new version

Traffic ratio: 75% to the old version of the page and 25% to the new version

Create a behavior

Set the traffic ratio and publish the application

Add the trigger and deploy

The Lambda@Edge computation is decoupled from the application's A/B content assembly, so the two can be deployed without depending on each other.

Validation testing

  1. Functional verification (access data analysis)

Verify that A and B content are served correctly.

Clear the tag and confirm that A/B content is assigned again according to the configured probability.

  2. Cache validation (stress test)

The requirement is to improve page content caching through Lambda@Edge, so the stress test needs to measure both the cache hit ratio and the edge computing performance.

  • Application stress test: client-side metrics + hardware resource monitoring

  • CDN stress test: client-side metrics + hardware resource monitoring

Lambda high-availability metrics

  • Cache hit ratio

  • Response time, high dynamic range metrics at P90, P99, P99.99 (response time distributed from the minimum)

  • Requests per second, high dynamic range metrics at P90, P99, P99.99 (requests per second distributed from the maximum)

  • Downloads per second, high dynamic range metrics at P90, P99, P99.99 (downloads per second distributed from the maximum)
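For reference, the P90/P99/P99.99 figures can be computed from raw stress-test samples roughly as follows. This is a sketch using the nearest-rank definition of a percentile. The stress-test tool behind this article may use a different definition, and `percentile` is a hypothetical helper.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative response times in milliseconds: 1, 2, ..., 100.
response_times_ms = list(range(1, 101))
p90 = percentile(response_times_ms, 90)
p99 = percentile(response_times_ms, 99)
p9999 = percentile(response_times_ms, 99.99)
```

With only 100 samples, P99.99 simply returns the maximum, so a stress run needs on the order of ten thousand requests or more before that figure says anything beyond the worst case.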

Lambda Performance Analysis Report:

  • RequestId: b37880ae-8356-452e-968c-a6a59c911e67

  • Duration: 19.49 ms

  • Billed Duration: 20 ms

  • Memory Size: 128 MB

  • Max Memory Used: 64 MB

  • Init Duration: 144.18 ms

Going forward, consider optimizing with the newer CloudFront Functions product:

  1. The maximum execution time is under 1 ms, and the same code uses 45% of the maximum allowed running time

  2. Closer to the user: code is deployed at edge locations, whereas Lambda@Edge currently runs at second-tier POPs (regional edge caches)

  3. No cold start

Data validation

  1. Tracking data analysis (whether exposure of, or requests for, A/B content are reported)

  • Validate the data-fetching logic

  • Check the tracking data against the configured traffic allocation ratio

  2. Cache hit data analysis

  • Data provided by the operations (O&M) team
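The comparison between the tracking data and the configured ratio in item 1 can be sketched as a simple tolerance check. The event format (one "A" or "B" label per reported exposure) and the 2-point tolerance are assumptions. A real analysis would likely use a proper statistical test.

```python
from collections import Counter


def b_share(events):
    """Share of B among the reported A/B exposure events."""
    counts = Counter(events)
    total = counts["A"] + counts["B"]
    if total == 0:
        raise ValueError("no A/B events reported")
    return counts["B"] / total


def matches_ratio(events, expected_b_ratio, tolerance=0.02):
    """True if the observed B share is within the tolerance of the setting."""
    return abs(b_share(events) - expected_b_ratio) <= tolerance


# Illustrative data only: 760 exposures of A and 240 of B.
sample_events = ["A"] * 760 + ["B"] * 240
ok = matches_ratio(sample_events, expected_b_ratio=0.25)
```

Only events collected after the current ratio took effect should go into the check; otherwise the observed share mixes two settings.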

Usage recommendations

  1. Set the validity period of page-level user tags to one day. Once the tag expires, the user is assigned again. (Adjust according to product requirements.)

  2. Avoid changing the A/B scheme of the same page frequently, because the tags of the previous group have to be cleared first. (The ratio can be adjusted incrementally instead.)

  3. Each A/B scheme needs a defined cycle: when it ends, clear the user tags of the current page before starting the next group of experiments.

  4. Running many groups on a single page is not recommended: creating ten groups of types on one page, for example, weakens the caching effect.

  5. When A/B groups are distributed dynamically across different modules of the same page, work backward from the data you want to collect. For example, loading modules dynamically on the client side collects exposure data more accurately.