Preface
With the rapid growth of Banyu's business and user base, the company's operational activities are becoming more and more diverse. The categories include (but are not limited to) group buying, leaderboards and vote solicitation, red envelope (hongbao) rain, pre-sale flash sales, and lucky draws. These activities play an important role in improving conversion, user retention, and user activity, so they take up a large share of the growth department's R&D resources.
Requirements of the form “user A does W and receives reward X (or triggers some logic)” are common. At first, each activity was developed independently, with a lot of repeated work, and even the repeated parts still had to be verified by QA engineers. Every party invested heavily, and the overall process was long: developing, integrating, and testing a single activity usually took about two weeks. Supporting operational activities in an efficient, reusable way therefore has a very high ROI.
A preliminary study
As mentioned in the preface, activities often contain requirements of the form “user A completes action W and receives reward X”, as shown in the figure below. We define any requirement that fits this pattern as a task:
The figure shows that these requirements always share similar parts, such as users receiving rewards for recording picture books every day, users receiving rewards for inviting purchases, and standalone user check-ins. They also differ in many ways: the rewards vary from activity to activity, the cycles in which an action can be completed vary, and there may be many upstream services. To make these requirements configurable quickly, we derived the goals of the initial task system from them:
- Do not care about specific actions; simply distinguish actions by number. For example, 1 denotes reading picture books in activity A, and 2 denotes inviting purchases in activity B;
- Configure basic time rules;
- Store task-related status and data;
- Record whether a reward has been issued, while the reward content itself is maintained by the business side.
As shown in the figure, in the initial system the business side had to:
- Bind numbers to actions;
- Obtain action metrics by consuming Kafka queues or by active queries, and pass them to the system;
- Implement the reward-related logic.
However, after two rounds of “Picture Book 618”, we found that although this approach was configuration-based, it failed to achieve the original goal, for the following reasons:
- The binding between actions and numbers was costly to understand and risky to configure;
- Maintaining consumer queues and active query jobs was costly and risky;
- Task logic was still duplicated and QA time was not reduced. Compared with the starting point, neither development cost nor testing cost dropped noticeably.
Task system
After studying the design of the QQ membership activity operation system, which serves hundreds of millions of users, and the various task forms found in games, we compared them with Banyu's operational activities and identified the core features that the Banyu task system needs to support:
- The system, not the business side, understands the action required to complete a task, which lowers the business side's cost of understanding;
- Time rules, such as: can be completed twice a day, three times a week, or only once during the entire activity;
- Two forms of reward issuance: claimed by the user, or issued automatically by the system;
- Maintenance of basic task state;
- Task resetting;
- Configurable display content;
- Recording of relevant data.
For the above requirements, we made the following abstractions:
Task state flow
The following states are defined for a task: Not started -> Not completed -> Waiting for collection -> Received -> Finished. The specific state flow is as follows:
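The linear chain of states named above can be sketched as a small state machine. This is only an illustration in Python: the names `TaskState`, `NEXT_STATE`, and `advance` are hypothetical, and the sketch assumes that each state moves only to the one listed after it. The original flow may include further transitions (for example for task resetting) that are not modeled here.

```python
from enum import Enum


class TaskState(Enum):
    NOT_STARTED = "not_started"
    NOT_COMPLETED = "not_completed"
    WAITING_FOR_COLLECTION = "waiting_for_collection"
    RECEIVED = "received"
    FINISHED = "finished"


# Assumption: each state may only move to the state that follows it in the chain.
NEXT_STATE = {
    TaskState.NOT_STARTED: TaskState.NOT_COMPLETED,
    TaskState.NOT_COMPLETED: TaskState.WAITING_FOR_COLLECTION,
    TaskState.WAITING_FOR_COLLECTION: TaskState.RECEIVED,
    TaskState.RECEIVED: TaskState.FINISHED,
}


def advance(state: TaskState) -> TaskState:
    """Return the next state in the chain, or raise if the task is finished."""
    if state not in NEXT_STATE:
        raise ValueError(f"{state.name} is a terminal state")
    return NEXT_STATE[state]
```

Keeping the allowed transitions in one table means that an illegal jump, such as moving from Not started straight to Received, cannot happen by accident in calling code.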
Action check
The action check determines whether a user in the Not completed -> Waiting for collection phase has completed the required action. To keep this simple and easy to understand, we evaluated the options and chose the third-party rule evaluation package expr. We flatten a user's complex actions into fixed KV pairs and use expr to judge whether they meet the standard. For example, we define the action of purchasing a picture book VIP as {buy_PB_VIP: true}, and the type of VIP purchased as {vip_type: forever}, {vip_type: year}, and so on. If a task's compliance action is “the user buys a picture book VIP, and it is a permanent VIP”, the compliance rule can be configured as: buy_PB_VIP == true && vip_type == "forever".
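To make the rule check concrete, the sketch below evaluates the example rule against a set of KV pairs. It does not use the expr package or its API; it is a stand-in written with Python's standard `ast` module, and the function name `action_meets_rule` is hypothetical. It supports only `==`, `!=`, `&&`, and `||`, which is enough for the example above.

```python
import ast
import operator

_COMPARATORS = {ast.Eq: operator.eq, ast.NotEq: operator.ne}
_LITERALS = {"true": True, "false": False}


def _eval(node, metrics):
    """Evaluate a small subset of expression nodes against the KV pairs."""
    if isinstance(node, ast.BoolOp):
        values = [_eval(value, metrics) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        compare = _COMPARATORS.get(type(node.ops[0]))
        if compare is None:
            raise ValueError("only == and != are supported in this sketch")
        left = _eval(node.left, metrics)
        right = _eval(node.comparators[0], metrics)
        return compare(left, right)
    if isinstance(node, ast.Name):
        if node.id in _LITERALS:
            return _LITERALS[node.id]
        return metrics.get(node.id)  # a missing metric evaluates to None
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unsupported rule syntax: {type(node).__name__}")


def action_meets_rule(rule: str, metrics: dict) -> bool:
    """Check cleaned KV pairs against a rule such as the one in the text."""
    # Simplification: this also rewrites && and || inside string literals.
    source = rule.replace("&&", " and ").replace("||", " or ").strip()
    tree = ast.parse(source, mode="eval")
    return bool(_eval(tree.body, metrics))
```

For example, `action_meets_rule('buy_PB_VIP == true && vip_type == "forever"', {"buy_PB_VIP": True, "vip_type": "year"})` evaluates the rule against a yearly VIP purchase, which does not satisfy it.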
Consume and clean the upstream queue
Each action has its own set of metrics. For the task system to understand specific metrics, some role must feed this information into the system. We considered two options: (1) the producer of the metrics builds the content and then notifies the system; (2) the system actively connects to each message queue and cleans and interprets the content itself.
Option (1) places simpler demands on the system, since the business data is built upstream. However, it is highly intrusive to business code, and the flow from data production to data use is not smooth. We therefore chose option (2): message queues are the system's input, and every action notification is assumed to reach the system through a message queue.
The overall flow is: receive queue message -> clean content -> system use. Take the picture book VIP purchase queue as an example: a message contains the user's purchase time, the VIP type purchased, and other information. We clean it into KV pairs that the system understands and hand them to the action check. As shown in the figure below, we have connected to the upstream queues of each business party. When a new compliance action is required, we only need to connect the new upstream queue and add the corresponding KV pairs for action checking. The main flow of the system is unaware of this change, and QA involvement can be limited to the newly connected message queue.
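The cleaning step can be pictured as one small function per upstream queue, registered under the queue's name. The sketch below is an assumption-laden illustration: the message fields (`uid`, `vip_type`, `purchase_time`), the topic name `picture_book_vip_purchase`, and the JSON encoding are all hypothetical, since the article does not give the real message format.

```python
import json


def clean_vip_purchase(raw: bytes) -> dict:
    """Turn a raw VIP purchase message into flat KV pairs for the action check."""
    message = json.loads(raw)
    return {
        "user_id": message["uid"],
        "buy_PB_VIP": True,  # receiving this message means a VIP was bought
        "vip_type": message["vip_type"],
        "purchase_time": message["purchase_time"],
    }


# One cleaner per upstream queue (hypothetical topic name). Supporting a new
# action means adding an entry here, without touching the main flow.
CLEANERS = {"picture_book_vip_purchase": clean_vip_purchase}


def clean(topic: str, raw: bytes):
    """Dispatch a queue message to its cleaner; unknown queues are ignored."""
    cleaner = CLEANERS.get(topic)
    if cleaner is None:
        return None
    return cleaner(raw)
```

Because the main flow only ever sees the cleaned KV pairs, connecting a new queue is limited to writing and testing one cleaner.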
Dividing configuration items into modules
We sorted out all the configuration items and, according to their function, divided them into the six configuration groups shown in the following figure: basic configuration, rule configuration, reward configuration, display configuration, inventory configuration, and routing configuration.
After the above abstractions, our task system provides the following capabilities:
- Business parties can integrate cleanly, without extra concepts to learn or redundant development;
- New actions can be supported quickly without affecting the main system;
- The system is integrated with the modular services of the activity marketing platform, such as the basic activity configuration center, the award center, and the flexible component service;
- One-click import and export of configuration between the production and test environments.
Further upgrade
In the process of supporting activities, we gradually found new problems:
- In the initial design, the system was expected to aggregate concepts and distill common metrics into check items. For example:
  - Rule configuration: task period (daily) + repeat count (1) + manually completed by the front end (yes) = completed manually once a day. Task period (weekly) + repeat count (10) + manually completed by the front end (no) = completed automatically up to 10 times a week;
  - Inventory configuration: per-user completion limit (unlimited) + per-task completion limit (unlimited) = unlimited inventory;
  - Route configuration: route jump (picture book reading page) = complete the picture book reading task; route jump (picture book VIP sales page) = complete the picture book VIP purchase task.
  The hope was that users could work without knowing the specific metrics, lowering the cost of understanding, and the database design followed the same idea. In practice, however, aggregating and distilling concepts did not reach the design goal. On the one hand, the concepts were not that hard to understand in the first place, so aggregation turned out to be dispensable. On the other hand, each task can be treated as independent, leaving little that can be distilled into shared check items. The final implementation made configuration extremely complex in actual use, which not only failed to achieve the goal but also added maintenance cost.
- The granularity supported by the system was still insufficient. We encountered a new class of requirements, as shown below, with two sides. On the task side, a user becomes eligible for the activity whenever they register, so the task must stay valid for a long time. On the user side, each task has its own completion window counted from registration. How can these two different time requirements be told apart? This posed a new challenge for our system.
- Display configuration was insufficient. Beyond the basic display content, front-end engineers still had to write code separately for special jumps or task interactions, so no front-end development effort was actually saved. The configuration of a single button was not enough to define the variety of front-end actions.
- Response time and stability needed optimization at every level of the system. For example, messages piled up in the queue after cleaning, and delays in fetching the task list affected the user experience.
- The boundary of reward configuration was blurred. Besides basic reward issuance, it also supported callback logic for the business side, so the system could not be decoupled from business-specific development. The boundary of this part of the system was not clear enough.
To address these problems, we developed a new version with the following solutions:
Module repartitioning
As shown in the figure below, all configuration modules are linked through the task's unique identifier, taskID.
- Basic configuration answers the basic definition of a task;
- Next-hop (reward) configuration answers what happens after the user meets the standard, and gives the system a clear boundary;
- Button configuration is delivered to the front end and, combined with Poseidon or the marketing campaign SDK, enables rapid development;
- Display configuration answers the overall logic and content of the task bar display.
Defining the user task box and user time-limited tasks
User task box: each user can be thought of as having a logical structure similar to an inbox under each activity. When we want a user to complete task A, we deliver the task to that user's inbox for the activity; only that user can then see and complete task A. In particular, each delivered record carries its own time limit, which fully solves the granularity problem described above.
To draw a clear boundary from the ordinary tasks described earlier, we define a special task type, the “deliverable task”, which can only be delivered to users at certain points for them to complete. The simple flow is shown in the following figure: the business side delivers a time-limited task to the user and defines the time range in which the user can complete it, while the system maintains the internal task logic. For a requirement of this kind, the business side only needs to answer three questions: when is the time-limited task delivered to the user, how long is the user's time window, and where is the list of time-limited tasks fetched from? With these answered, it can plug into the task system quickly and painlessly.
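The task box idea can be sketched as a per-user, per-activity list of delivered records, each with its own validity window. The class and field names below (`DeliveredTask`, `TaskBox`, `deliver`, `open_tasks`) are hypothetical, and an in-memory dictionary stands in for whatever storage the real system uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DeliveredTask:
    task_id: str
    delivered_at: datetime
    expires_at: datetime

    def is_open(self, now: datetime) -> bool:
        """A delivered task can be completed only inside its own window."""
        return self.delivered_at <= now < self.expires_at


class TaskBox:
    """Inbox-like structure: one list of delivered tasks per (user, activity)."""

    def __init__(self):
        self._boxes = {}  # (user_id, activity_id) -> list of DeliveredTask

    def deliver(self, user_id, activity_id, task_id, now, valid_for: timedelta):
        """Business side: deliver a time-limited task to one user."""
        record = DeliveredTask(task_id, now, now + valid_for)
        self._boxes.setdefault((user_id, activity_id), []).append(record)
        return record

    def open_tasks(self, user_id, activity_id, now):
        """Business side: fetch the user's currently valid time-limited tasks."""
        box = self._boxes.get((user_id, activity_id), [])
        return [task for task in box if task.is_open(now)]
```

The three questions from the text map onto this sketch directly: the call site of `deliver` decides when the task is sent, `valid_for` is the user's time window, and `open_tasks` is where the list is fetched. The task definition itself can stay valid for the whole activity, while each user's window starts at their own delivery time.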
Response optimization
- Add multi-level caching: a Redis cache and an in-memory cache;
- Optimize the cleaning queue;
- Concurrent output;
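As a rough illustration of the multi-level cache item, the sketch below reads from a short-lived in-process cache first, then from a shared cache, and only then from the underlying loader. It is not the system's actual implementation: `TwoLevelCache` is a hypothetical name, and a plain dictionary stands in for Redis so that the example runs without external services.

```python
import time


class TwoLevelCache:
    def __init__(self, remote, load, local_ttl=5.0, clock=time.monotonic):
        self._local = {}       # key -> (expires_at, value), in-process cache
        self._remote = remote  # stand-in for Redis: any dict-like object
        self._load = load      # slowest path, e.g. a database query
        self._ttl = local_ttl
        self._clock = clock

    def get(self, key):
        now = self._clock()
        hit = self._local.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # level 1: fresh in-memory entry
        if key in self._remote:
            value = self._remote[key]  # level 2: shared cache
        else:
            value = self._load(key)  # miss on both levels
            self._remote[key] = value
        self._local[key] = (now + self._ttl, value)
        return value
```

A short local TTL keeps hot reads such as the task list off the network, at the price of serving data that may be a few seconds stale. The sketch leaves out invalidation on configuration changes, which a real deployment would need.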
System Architecture Design
The overall system is divided into the following three levels:
Business layer
The business layer provides the system's output capabilities, both to internal services and to clients.
System layer
- Business processing: packaging business data for output, issuing rewards when a standard is met, and publishing other state changes to the message queue;
- Action compliance processing: after the action rule check passes, deciding whether the action is valid and acceptable according to boundary conditions such as current inventory and cycle;
- Internal data maintenance: maintaining configuration data, user metrics, task status change history, and the caches at every level.
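The boundary conditions in action compliance processing can be sketched as a per-cycle counter plus an optional inventory limit. The sketch assumes three cycle types (`daily`, `weekly`, `activity`) taken from the time rules listed earlier; the names `period_key` and `ComplianceChecker` are hypothetical, and counts are kept in memory for illustration only.

```python
from datetime import date


def period_key(period: str, day: date) -> str:
    """Bucket a date by task cycle so completions can be counted per cycle."""
    if period == "daily":
        return day.isoformat()
    if period == "weekly":
        year, week, _ = day.isocalendar()
        return f"{year}-W{week:02d}"
    if period == "activity":
        return "all"  # one bucket for the entire activity
    raise ValueError(f"unknown period: {period}")


class ComplianceChecker:
    def __init__(self, period, max_per_period, inventory=None):
        self.period = period
        self.max_per_period = max_per_period
        self.inventory = inventory  # None means unlimited inventory
        self._counts = {}  # (user_id, cycle bucket) -> completions

    def accept(self, user_id, day: date) -> bool:
        """Record one completion if the cycle and inventory limits allow it."""
        bucket = (user_id, period_key(self.period, day))
        if self._counts.get(bucket, 0) >= self.max_per_period:
            return False  # the user has used up this cycle
        if self.inventory is not None:
            if self.inventory <= 0:
                return False  # the task is out of stock
            self.inventory -= 1
        self._counts[bucket] = self._counts.get(bucket, 0) + 1
        return True
```

With this shape, “twice a day” is `ComplianceChecker("daily", 2)` and “only once during the entire activity” is `ComplianceChecker("activity", 1)`. A production version would need atomic counters in shared storage rather than in-process dictionaries.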
Base layer
The base layer consists of common third-party services and capabilities that the system relies on, including message queues, the award center, basic activity services, common component services, and Apollo dynamic configuration.
With this upgrade, the concept of the task system has been enriched and extended: a “task” has in essence become a trigger, so this version is called the Trigger version.
Comparison of R&D efficiency
- No task system: single-activity development + joint debugging > 7 days;
- Trial version of the task system: single-activity development + joint debugging = 6 days;
- Task system V1: configuration + integration of new actions = 2~3 days;
- Task system Trigger version: configuration + integration of new actions = 1~2 days.
As the figures above show, the development effort for operational activities dropped substantially with each iteration of the task system.
Follow-up work
- Later, the KV pairs produced by queue cleaning can be connected directly to the definitions and values of common user attributes in the system, namely the business metrics and attribute metrics in the user profile system;
- The clean-then-consume flow for message queues could be implemented more elegantly;
- Support configuration-driven development of Poseidon front-end pages;
- Other requirements and systems based on action triggers can follow this process.
References
- Design of the QQ membership activity operation system serving hundreds of millions of users
- expr rule evaluation package