Author: Yi chu, far leisurely
Background:
Xianyu is the largest second-hand trading platform in China. Sellers carry the important task of supplying goods and have a large impact on the platform's DAU, so we explore how different strategies affect and help seller growth. How to evaluate these strategies scientifically with AB experiments is a further challenge. This article proposes a “full-traffic AB experiment” design that handles an evaluation task the traditional AB experiment is not suited to: the constrained scenario in which the experimental subjects are on the supply side and inventory is shallow.
1. Business background and traditional AB solutions
Second-hand trading of idle items is a special scenario in the e-commerce market. Sellers are not recruited through merchant onboarding (the contrast with the merchant types on the Tmall platform makes this clear); they are mainly ordinary users selling their own idle belongings. The transactions and retention of these sellers deserve more attention than those of buyers, for the following reasons:
1. Sellers have more pressing needs
Buyers have many options: they can buy second-hand or buy new. But for a seller, an item that fails to sell remains a burden and a waste of resources.
2. Sellers determine the platform supply
If a seller fails to sell an item, this directly dampens their enthusiasm for listing the next one, and the platform loses potential supply. In addition, many categories on Xianyu are in a state where supply falls short of demand.
3. The retention gain of sellers is higher than that of buyers
Once a seller completes a transaction and sees that Xianyu really can help ordinary sellers list and sell idle items, they more easily come to depend on and stick with the platform.
Next, how to run AB experiments on seller strategies is a new challenge. Why? Let’s review the traditional AB design.
The AB experiment is a mainstream strategy evaluation method at Internet companies. For example, when a recommendation system wants to iterate its CTR prediction model, it can randomly split the traffic into groups, serve each group with a different CTR prediction model, and then collect data from user logs. This constitutes a randomized controlled trial (RCT). The observed data give an objective, quantitative measure of the strategy's effect and provide accurate guidance for product iteration. So how is the AB experiment done? Traffic is split by request ID, so multiple requests from the same user may fall into different experimental groups, and each experimental bucket can estimate CTR with a model using different parameters. Since the traffic is divided randomly, the experiments are statistically independent, and the best-performing model can be selected. In e-commerce, the focus is not only on pure traffic metrics such as CTR but also on user-level UV metrics, such as orders per buyer and UV retention, so the splitting dimension moves from “request ID” to “buyer ID”. The same holds for the Xianyu search scenario, as shown in Figure 1.
Figure 1. Block diagram of a conventional AB experiment
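As a rough illustration of the two splitting dimensions described above, the sketch below hashes a key into one of N buckets. The function names, the salt and the choice of MD5 are assumptions made for this example, not Xianyu's actual implementation.

```python
import hashlib


def bucket_of(key: str, n_buckets: int, salt: str = "demo-exp") -> int:
    """Map a key to a stable bucket index in [0, n_buckets).

    The salt is a hypothetical per-experiment string so that different
    experiments get independent splits.
    """
    digest = hashlib.md5(f"{salt}:{key}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def bucket_by_request(request_id: str, n_buckets: int) -> int:
    # Request-level split: two requests from one user may land in
    # different buckets, which is fine for pure traffic metrics like CTR.
    return bucket_of(request_id, n_buckets)


def bucket_by_buyer(buyer_id: str, n_buckets: int) -> int:
    # Buyer-level split: every request from the same buyer lands in the
    # same bucket, which user-level metrics such as retention require.
    return bucket_of(buyer_id, n_buckets)


if __name__ == "__main__":
    for req in ("req-001", "req-002", "req-003"):
        print(req, bucket_by_request(req, 4), bucket_by_buyer("buyer-42", 4))
```

In the demo loop the buyer bucket is the same on every line because it depends only on the buyer ID, whereas the request bucket is recomputed per request.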
Under the design in Figure 1, experimental control on the buyer side works well, for example when iterating a version of the interaction UI or of the search relevance model. But now that the experiment targets sellers, can we simply switch roles? The answer is no. If sellers were partitioned the way buyers are, each seller could appear in only one traffic bucket, which would shrink the search candidate pool to 1/N (N being the number of buckets), and transactions would inevitably fall sharply. Some compromise designs exist, but none is satisfactory:
One is to split traffic by category: the category a query belongs to decides whether it goes to the control group or the experimental group. The weakness is that supply-demand balance and transaction efficiency differ naturally between categories, so the requirement of random grouping is hard to meet.
Another is to alternate by day: use the control strategy for traffic on day T, the experimental strategy on day T+1, and the control strategy again on day T+2, which amounts to splitting by time. This design has two drawbacks: 1) it is too idealized and ignores that time itself may be an influencing factor, since weekends, holidays, app updates and so on all affect user behavior; 2) when the metric itself involves time, as long-term retention does, the same user must receive a consistent strategy across time.
Before presenting the design of the full-traffic AB experiment, we first list the conditions that a convincing AB experiment needs to satisfy.
2. Criteria that a convincing AB experiment should satisfy
1. The strategy applied to the same item/seller must not differ across traffic buckets
In the traditional AB experiment, a given buyer falls into exactly one bucket, which keeps the strategy fixed; otherwise the effects of several experimental strategies would overlap and accurate attribution would suffer. Likewise here, the same seller must not be affected by several distribution strategies at once, which would break the single-variable principle of the AB experiment.
2. The supply quota should be consistent with the traffic allocation; otherwise the gain after full rollout tends to be lower than the gain measured in the AB experiment, and the reference value of the AB result is discounted
Suppose experimental strategy A improves on the control group because it predicts much more accurately for a small set of undervalued items (on the order of 10,000), and in a single-bucket AB experiment it yields +5% in transactions. But the set is small and items on Xianyu typically have shallow inventory, so the traffic of bucket A alone is enough to consume them. When the experiment is expanded to two buckets, the buckets compete for the same items and the gain drops to +2.5%; after full rollout the gain is even less significant. The AB experiment then loses its reference value, which rests on the measured gain being objective and consistent with the gain after full rollout.
3. The intervention strategy should not harm the control group
This makes intuitive sense. For example, when the target items of the experimental group are boosted, the items of the control group should not be pushed down the ranking as a side effect.
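The arithmetic behind criterion 2 can be sketched with a toy model. It assumes, purely for illustration, that the extra orders strategy A can win are capped by the shallow inventory of the undervalued item set, so the cap is shared by however many buckets run A. The function name and all numbers are hypothetical, chosen so the first two cases reproduce the +5% and +2.5% in the text.

```python
def measured_lift(extra_orders_cap: float,
                  baseline_orders_per_bucket: float,
                  n_buckets_running_a: int) -> float:
    """Relative lift when a fixed, inventory-capped number of extra orders
    is spread over the baseline orders of all buckets running strategy A."""
    total_baseline = baseline_orders_per_bucket * n_buckets_running_a
    return extra_orders_cap / total_baseline


if __name__ == "__main__":
    cap = 50.0         # hypothetical extra orders the scarce items allow
    baseline = 1000.0  # hypothetical baseline orders per bucket
    for n in (1, 2, 10):
        print(n, f"{measured_lift(cap, baseline, n):+.1%}")
```

With these made-up numbers the lift is +5.0% with one bucket, +2.5% with two and +0.5% with ten, which is why a gain measured on a small traffic share can overstate the gain after full rollout.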
3. Design of the full-traffic AB experiment
Following the three criteria above, we split traffic by seller ID on the search results page, as shown in Figure 2.
Figure 2. Design of the full-traffic AB experiment
1. AB across all traffic buckets. All buyer buckets run the same code, and that code contains if-else logic keyed on the seller group.
Conventional AB: the experimental and control groups are a split, at request granularity, of traffic from different buyers. Full-traffic AB: the experimental and control groups are a split, at slot granularity on the search results page, of traffic going to different sellers. Because the seller experimental-group strategy appears in every traffic bucket, the design is called the full-traffic AB experiment.
2. Supply and traffic shares are both set to 50%
Seller IDs are hash-partitioned offline, for example into two halves. Because the partition is random, sellers in the experimental group should also receive about 50% of the traffic on the search results page, which satisfies the requirement that supply and traffic quotas be consistent.
3. The ranking intervention is applied only to the experimental group's traffic, and the control group is left undisturbed
Because the experimental and control groups share the same search results page, extra care is needed when boosting target items.
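One possible way to realize the three points above is sketched here. Sellers are split 50/50 by a salted hash, and the boost is applied by re-ordering experimental-group items only among the slots they already occupy, so control-group items never move. The source does not spell out the ranking mechanism, so the slot-preserving re-rank, the SHA-256 hash, the salt and all field names (`seller_id`, `score`, `is_target`) are assumptions for illustration.

```python
import hashlib
from typing import Dict, List


def seller_group(seller_id: str, salt: str = "demo-seller-exp") -> str:
    """Offline 50/50 partition of sellers by a salted hash of the seller ID."""
    digest = hashlib.sha256(f"{salt}:{seller_id}".encode("utf-8")).hexdigest()
    return "experiment" if int(digest, 16) % 2 == 0 else "control"


def rerank_page(page: List[Dict], groups: Dict[str, str],
                boost: float = 0.5) -> List[Dict]:
    """Boost target items of experimental sellers without moving control items.

    `page` is the ranked result list; `groups` maps seller_id to
    "experiment" or "control" (e.g. precomputed with seller_group).
    """
    # Slots currently held by experimental-group sellers.
    exp_slots = [i for i, item in enumerate(page)
                 if groups[item["seller_id"]] == "experiment"]

    def boosted_score(item: Dict) -> float:
        return item["score"] + (boost if item.get("is_target") else 0.0)

    # Re-order experimental items among themselves (stable for ties).
    exp_items = sorted((page[i] for i in exp_slots),
                       key=boosted_score, reverse=True)

    result = list(page)
    for slot, item in zip(exp_slots, exp_items):
        result[slot] = item
    return result


if __name__ == "__main__":
    # Hand-written groups keep the demo readable; in practice they would
    # come from seller_group().
    groups = {"s1": "experiment", "s2": "control", "s3": "experiment",
              "s4": "control", "s5": "experiment"}
    page = [
        {"item": "a", "seller_id": "s1", "score": 0.9},
        {"item": "b", "seller_id": "s2", "score": 0.8},
        {"item": "c", "seller_id": "s3", "score": 0.7, "is_target": True},
        {"item": "d", "seller_id": "s4", "score": 0.6},
        {"item": "e", "seller_id": "s5", "score": 0.5},
    ]
    print([it["item"] for it in rerank_page(page, groups)])
```

In the demo, target item c moves from the third slot to the first by swapping with another experimental-group item, while control items b and d keep their slots. The trade-off of this design is that the boost can only redistribute exposure inside the experimental group's own share of the page.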
4. Application case: traffic tilt toward churned sellers
4.1 Task Background
Table 1. Retention gains are significant when churned sellers and new sellers are stimulated by a chat (a buyer clicks “I want” to start an inquiry)
So, considering retention value, we run a traffic-tilt experiment for new/churned sellers who visit on a given day and have not yet received an “I want” inquiry. Why not target transactions directly? Because the “item exposure -> inquiry” conversion rate is 10 times the “item exposure -> transaction” conversion rate, so the same traffic can support more target sellers.
4.2 Full-traffic AB operation
In addition to the reasons in Section 2, the total number of new sellers visiting on a given day is only 10,000. Without full-traffic AB, the UV would be too small and random fluctuation would be a large source of interference. The setup is shown in Figure 3. What we need to compare is the improvement of A1 sellers over B1 sellers in chat UV, transaction UV and retention.
Figure 3. Full-traffic seller AB in the churned-seller traffic-tilt experiment
4.3 Experimental Results
4.3.1 Effectiveness of target seller support
Table 2. Full-traffic AB experiment data over 5 days. The share of seller UV receiving a chat increased by 22%, in line with expectations.
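A comparison like this can be computed as a relative lift between the two seller groups, with a two-proportion z-test as one common significance check. The source does not state which test was used, so the test choice, the function names and the counts below are all assumptions; the counts are invented for illustration and merely chosen to give a 22% lift.

```python
import math


def relative_lift(exp_hits: int, exp_total: int,
                  ctl_hits: int, ctl_total: int) -> float:
    """Relative change of the experimental rate over the control rate."""
    p_exp = exp_hits / exp_total
    p_ctl = ctl_hits / ctl_total
    return (p_exp - p_ctl) / p_ctl


def two_proportion_z(exp_hits: int, exp_total: int,
                     ctl_hits: int, ctl_total: int) -> float:
    """z statistic of a pooled two-proportion test."""
    p_exp = exp_hits / exp_total
    p_ctl = ctl_hits / ctl_total
    pooled = (exp_hits + ctl_hits) / (exp_total + ctl_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / exp_total + 1 / ctl_total))
    return (p_exp - p_ctl) / se


if __name__ == "__main__":
    # Invented counts: sellers who received a chat / sellers in the group.
    exp_group = (610, 5000)  # hypothetical experimental sellers
    ctl_group = (500, 5000)  # hypothetical control sellers
    print(f"lift = {relative_lift(*exp_group, *ctl_group):+.1%}")
    print(f"z    = {two_proportion_z(*exp_group, *ctl_group):.2f}")
```

Because seller groups are small, a significance check of this kind helps separate a real lift from day-to-day fluctuation.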
4.3.2 Verifying whether support harms overall transaction efficiency
Support means that boosted items that used to sit near the bottom of the list now appear near the top. Does this hurt the overall per-capita buyer and seller metric? The chart below shows that the per-capita figure for the experimental group increased: the 5-day average of per-capita buyers and sellers rose by 1% (see Table 3).
To confirm this further, we checked the CTCVR of the supported items: after support, the CTCVR of new and churned sellers’ items is higher than that of overall traffic.
5. Extended discussion
Q1: Full-traffic AB is sound in design, but it requires building the seller experiment logic into every online code path. When the scenarios are scattered and the workload is large, what else can be done?
A1: You can relax some of the theoretical constraints, for example criterion 1 in Section 2, “the strategy applied to the same seller must not differ across traffic buckets”. The reference value of the AB result is then slightly discounted. Specifically, how is a seller AB run in a single traffic bucket? We need to determine the bucket's traffic share and make sure the grouping of sellers matches the experimental group's supply-side quota. When collecting data, the traffic of both the control group and the experimental group comes from that single bucket.
Q2: In this case, can the per-capita buyer and seller metric be aligned with the buyer-side metric?
A2: No. The metric is defined as (transaction seller UV + transaction buyer UV) / visiting buyer UV within the traffic group. Under full-traffic AB, the visiting buyer UV in the denominator stays the same, but the transaction UV drops to about 1/2 because the search results page is split in half. The comparison of per-capita sellers between the seller experimental group and the seller control group is still credible.
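The point of A2 can be checked with a small calculation. The parenthesization of the metric follows the reading (transaction seller UV + transaction buyer UV) / visiting buyer UV, which is an assumption about the original definition, and every number below is hypothetical.

```python
def per_capita_buyers_and_sellers(txn_seller_uv: int, txn_buyer_uv: int,
                                  visiting_buyer_uv: int) -> float:
    """Per-capita buyers and sellers under the assumed metric definition."""
    return (txn_seller_uv + txn_buyer_uv) / visiting_buyer_uv


if __name__ == "__main__":
    visiting = 100_000  # every visiting buyer sees both seller groups
    whole_page = per_capita_buyers_and_sellers(3_000, 4_000, visiting)
    # Each seller group owns about half of the page, so its transaction UV
    # is about half, while the denominator is unchanged.
    exp_group = per_capita_buyers_and_sellers(1_530, 2_040, visiting)
    ctl_group = per_capita_buyers_and_sellers(1_500, 2_000, visiting)
    print(whole_page, exp_group, ctl_group, exp_group / ctl_group - 1)
```

Each seller group's value is about half of the whole-page value, so it cannot be lined up with a buyer-bucket metric, but the ratio between the two seller groups remains meaningful.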
Q3: Each buyer traffic bucket runs its own experimental logic. Will that interfere with the seller grouping experiment?
A3: As long as the existing experiments in each bucket contain no seller-related logic, the two sets of logic are orthogonal and will not interfere with each other.