Author: Wang Wenhua (Lian Mo)
Qianniu is a multi-terminal open workbench for Alibaba merchants. Millions of active merchants use it every day on mobile and desktop to run their business, including store management, customer service reception, and news and information.
Qianniu is also an open architecture on the client side: second-party teams (other teams inside the group) and third-party ISVs can serve merchants through its open system, which we call the plug-in system. It is called a plug-in system because we define a number of open nodes and standards along the business link, and each business side implements them to provide the corresponding function. Because of these standards and specifications, business links can connect across different plug-ins, so a merchant who chooses different plug-ins does not end up with a broken closed loop of functions.
Here are the open nodes defined by Qianniu:
Openness brings in second-party businesses and third-party ISVs, letting Qianniu make full use of external resources and services, for example to speed up development and to meet merchants' customization needs. A third-party plug-in goes through four stages: ISV development, launch on the service market, merchant purchase, and use on Qianniu. Qianniu also has a set of rules that, when a merchant has not chosen a default plug-in, guide the merchant to try the free edition of a plug-in first; during the trial the ISV can guide the merchant to upgrade the subscription and so earn revenue. But openness also brings a problem of its own: merchant experience.
To improve the merchant experience, we launched the Open Experience upgrade program. After sustained governance, Qianniu's monthly volume of open public opinion (user complaints and feedback about the open platform) has fallen by 50%. So what does Qianniu's public opinion look like, and how is the overall prevention and control plan designed?
Problems and causes
Characteristics of Qianniu open public opinion
Because of the open model, public opinion about Qianniu's open platform is scattered and its causes are complex. Qianniu hosts a large number of tools provided by second-party and third-party teams. Some second-party tools have a long history and receive little maintenance investment, while the technical capabilities of ISVs are uneven. There are several open technology stacks: H5 in the early stage, QAP (an open framework built on Weex) in the middle stage, and now mini programs. The plug-in startup link is long and involves more than seven technical stages from the front end to the ISV server, so it is susceptible to network and service jitter. These many unstable factors make governing open public opinion challenging.
What is at the heart of the open experience?
Open experience problems are diverse, mainly including the following three core experience problems:
- The overall link for opening a plug-in is long. Problems in the placement, commercial ordering, and container loading and running stages can all prevent a plug-in from starting.
- Because of the permission control design for main accounts and sub-accounts, the main account can restrict what a sub-account may do in each function, so sub-accounts are blocked when they try to use those functions.
- There are many logic problems in ISV and second-party businesses. As a platform, Qianniu lacks sufficient awareness of online problems and effective levers to push for fixes.
Integrated control plan
- Optimize plug-in startup link: improve the fault tolerance of startup link technology products, and optimize the success rate of opening to more than 99.7%.
- Closed loop construction of permission application: improve the efficiency of permission application and approval, and optimize the experience of using plug-ins for sub-accounts.
- Building a data measurement system: accumulate data and use it to drive business optimization.
Qianniu hosts many businesses, and the causes of public opinion are complex and change quickly, so tackling individual public opinion problems one at a time is not enough. Governance of open business is gradual. On the one hand, known problems must be solved; on the other hand, measurement standards and stability monitoring must be established for the core nodes of plug-in startup and operation to consolidate the gains. The approach is governance, monitoring, prevention, and optimization, in that order, to drive public opinion down.
Startup link optimization
Startup Process
- Protocol routing: Qianniu officially defines a set of open nodes (slots) and the corresponding standard protocols (e.g. tradeDetail, for viewing order details), which ISVs implement to take on the function. This stage parses the configured protocol and routes to the AppKey of the default plug-in, as set by the user or configured by operations.
- Plug-in meta information lookup: find the meta information for the target AppKey in the plug-in list delivered by the server.
- Permission verification: verify whether the sub-account has permission to open this plug-in.
- Commercialization guarantee: complete the free edition subscription for new users or for users whose subscription has expired.
- Pre-authorization: third-party plug-ins require explicit authorization from the user before the third-party ISV may access data.
- Container routing and rendering: based on the plug-in meta information, assemble the business parameters and hand them to the corresponding container for rendering.
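The six stages above can be read as a sequential pipeline in which any stage may abort the startup. The following is a minimal sketch of that idea, not Qianniu's implementation: the stage names, the `app-1001` AppKey, and the in-memory tables that stand in for user settings and server data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


class StartupError(Exception):
    """Startup failure that remembers which stage failed."""

    def __init__(self, stage: str, reason: str):
        super().__init__(f"{stage}: {reason}")
        self.stage = stage
        self.reason = reason


@dataclass
class StartupContext:
    protocol: str
    is_sub_account: bool = False
    app_key: Optional[str] = None
    meta: Optional[dict] = None
    container: Optional[str] = None
    completed: list = field(default_factory=list)


# Hypothetical stand-ins for user settings and server-delivered data.
DEFAULT_PLUGINS = {"tradeDetail": "app-1001"}
PLUGIN_LIST = {"app-1001": {"type": "miniapp", "third_party": True}}
SUB_ACCOUNT_GRANTS = {"app-1001"}
SUBSCRIBED = set()
AUTHORIZED = {"app-1001"}


def route_protocol(ctx):
    ctx.app_key = DEFAULT_PLUGINS.get(ctx.protocol)
    if ctx.app_key is None:
        raise ValueError("no default plug-in for protocol")


def find_meta(ctx):
    ctx.meta = PLUGIN_LIST.get(ctx.app_key)
    if ctx.meta is None:
        raise ValueError("plug-in meta information missing")


def check_permission(ctx):
    if ctx.is_sub_account and ctx.app_key not in SUB_ACCOUNT_GRANTS:
        raise ValueError("sub-account lacks permission")


def ensure_subscription(ctx):
    # Stand-in for subscribing the free edition when no subscription exists.
    SUBSCRIBED.add(ctx.app_key)


def pre_authorize(ctx):
    if ctx.meta["third_party"] and ctx.app_key not in AUTHORIZED:
        raise ValueError("user has not authorized the ISV")


def render(ctx):
    ctx.container = ctx.meta["type"]


STAGES = [
    ("protocol_routing", route_protocol),
    ("meta_lookup", find_meta),
    ("permission_check", check_permission),
    ("commercial_guarantee", ensure_subscription),
    ("pre_authorization", pre_authorize),
    ("container_render", render),
]


def run_startup(protocol, is_sub_account=False):
    """Run every stage in order; wrap failures with the stage name."""
    ctx = StartupContext(protocol=protocol, is_sub_account=is_sub_account)
    for name, stage in STAGES:
        try:
            stage(ctx)
        except ValueError as exc:
            raise StartupError(name, str(exc)) from exc
        ctx.completed.append(name)
    return ctx
```

Tagging each failure with its stage name is what makes per-stage error statistics possible later on.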
Full link monitoring
The causes and distribution of plug-in startup failures must be located first in order to decide on subsequent governance and tuning. Although it is theoretically possible to analyze logs case by case, in practice this is not feasible: the workload is heavy and logs offer no global statistical view. We therefore first built full-link monitoring of plug-in startup. It retains the error context and computes an accurate startup success rate and the distribution of failure causes, which gives optimization a basis for measurement.
The tracked dimensions include the target plug-in AppKey, the technology type (H5, QAP, or mini program), the error stage, the error description, the source or entry from which the plug-in was opened, and the start and end times of each stage.
These dimensions serve several purposes:
- Alarms can be configured on different dimensions, for example when the success rate of H5 plug-ins drops suddenly or the number of errors in a certain stage rises sharply.
- When the overall success rate changes, trends can be compared across dimensions to quickly locate where the problem lies.
- The error stage information makes it easy to view the error distribution by stage and to optimize the success rate stage by stage.
- The opening source and entry information gives more detail about the scenario in which a problem occurs.
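To make these uses concrete, here is a small sketch of how tracked startup events could be aggregated into a success rate and a per-stage failure distribution. The field names, stage names, and the alarm threshold are assumptions for illustration; the article does not describe the actual event schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StartupEvent:
    app_key: str
    tech_type: str                     # "H5", "QAP", or "miniapp"
    source: str                        # entry from which the plug-in was opened
    duration_ms: int
    error_stage: Optional[str] = None  # None means the startup succeeded
    error_desc: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error_stage is None


def success_rate(events, tech_type=None):
    """Success rate over all events, or over one technology type."""
    selected = [e for e in events if tech_type is None or e.tech_type == tech_type]
    if not selected:
        return None
    return sum(e.ok for e in selected) / len(selected)


def failures_by_stage(events):
    """Count failed startups per error stage."""
    return Counter(e.error_stage for e in events if not e.ok)


def should_alarm(rate, threshold):
    """Alarm when a measured rate falls below the (assumed) threshold."""
    return rate is not None and rate < threshold


events = [
    StartupEvent("app-1", "H5", "workbench", 120),
    StartupEvent("app-1", "H5", "order_list", 300, "meta_lookup", "timeout"),
    StartupEvent("app-2", "miniapp", "workbench", 90),
    StartupEvent("app-2", "miniapp", "message", 110),
]
```

With these sample events, `success_rate(events, "H5")` is 0.5 while the mini program rate is 1.0, so comparing dimensions points directly at the H5 metadata lookup.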
Plug-in startup optimization project
Full-link monitoring shows two main kinds of error. One is a failure in the pre-startup link before the plug-in opens; the other is that the subscription (order) relationship has not been established.
Improving fault tolerance of the pre-startup link
The main cause of pre-startup link errors is missing key startup information, such as plug-in meta information or the mini program package, caused by a weak network or server jitter. Qianniu identified the top plug-ins on core business links, built their meta information into the client, and pre-downloaded mini program packages based on subscription relationships and scenario information. After optimization, the mini program package hit rate rose from 85% to 97%, and the overall number of failures fell by 55%.
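The fallback and pre-download ideas can be sketched as follows. This is an illustrative reading of the paragraph above, not the actual client logic: the function names and the rule for choosing pre-download candidates (subscribed plug-ins first, then plug-ins tied to the current scenario) are assumptions.

```python
def resolve_meta(app_key, server_list, builtin_meta):
    """Prefer server-delivered meta; fall back to meta built into the client.

    server_list may be None when the list could not be fetched
    (weak network or server jitter).
    """
    if server_list is not None and app_key in server_list:
        return server_list[app_key], "server"
    if app_key in builtin_meta:
        return builtin_meta[app_key], "builtin"
    return None, "missing"


def predownload_candidates(subscribed, scenario_plugins, cached):
    """List packages worth pre-downloading, skipping ones already cached."""
    ordered = []
    for app_key in list(subscribed) + list(scenario_plugins):
        if app_key not in cached and app_key not in ordered:
            ordered.append(app_key)
    return ordered
```

For example, when the server list is unavailable, `resolve_meta("app-1", None, {"app-1": {"type": "miniapp"}})` still returns usable meta information from the built-in copy, so a top plug-in can start despite the network failure.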
Upgrading the commercialization guarantee scheme
On Qianniu, a merchant must establish a subscription (order) relationship with a plug-in before using it; this is a periodic-subscription commercial model. Before a plug-in is used, Qianniu renews the free edition on the merchant's behalf to ensure that the main features remain available. A review of the old scheme found that the original product flow did not cover scenarios where ordering fails. After the commercialization guarantee scheme was upgraded, the number of related errors fell by 56.7% and the pre-startup link time fell by 170ms. The strategy is as follows:
- Exception compensation: in coordination with the server, order status information was added. If the order is still being established, the client retries the query after a delay and opens the plug-in once the relationship is in place (the ordering process involves Huijin and other external systems, so the time to take effect fluctuates widely). If the order cannot succeed (e.g. the ISV has been penalized and ordering is frozen), the merchant is guided to switch to a plug-in of the same kind, and a quality-weighted ranking of those alternatives is used to push ISVs to optimize.
- Performance optimization: add a frequency control strategy to the pre-startup ordering step to reduce how often it is called. Add post-startup renewal to extend the validity period, which avoids the pre-startup step and also covers scenarios where mini programs are not opened through the plug-in flow. For the most commonly used default plug-ins, add silent renewal at idle time.
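The exception compensation rule (retry while the order is being established, suggest an alternative when it cannot succeed) might look like the sketch below. The status names, the retry count, and the delay are hypothetical; the article states only the behavior, not these values.

```python
import time
from enum import Enum


class OrderStatus(Enum):
    ACTIVE = "active"     # subscription relationship is established
    PENDING = "pending"   # order is still being established
    FAILED = "failed"     # order cannot succeed, e.g. ordering is frozen


def wait_for_order(query_status, max_retries=3, delay_s=0.5, sleep=time.sleep):
    """Poll the order status and decide what the client should do next.

    Returns "open", "suggest_alternative", or "timeout".
    """
    for attempt in range(max_retries + 1):
        status = query_status()
        if status is OrderStatus.ACTIVE:
            return "open"
        if status is OrderStatus.FAILED:
            return "suggest_alternative"
        if attempt < max_retries:
            sleep(delay_s)  # order still pending: wait, then query again
    return "timeout"
```

Passing `sleep` as a parameter keeps the function testable without real delays; a production client would more likely schedule the retry asynchronously than block.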
Results of startup optimization: the success rate rose from 99% to 99.7%, and the pre-startup link time dropped from 350ms to 130ms.
Permission application link construction
Sellers use main accounts and sub-accounts to coordinate their teams, and sub-accounts run into open experience problems when their permissions are insufficient. This year, Qianniu built a permission application link on mobile to improve the sub-account experience and the efficiency of permission approval.
Application link:
- Second-party: a client API is exposed to second-party businesses, which call it proactively to trigger the application guidance. This meets general needs; because second-party permission checks are flexible, there is no single unified interception point.
- Third-party: an interception layer (aspect) detects insufficient permission on the third-party link and automatically triggers an application prompt. For historical reasons, third-party mini programs fall into two types by permission granularity, each with its own detection method:
  a. Mini programs upgraded from QAP: marked with a refined-authorization flag; the authorization granularity is the permission package (detected by matching the error code of the TOP response).
  b. Plug-ins that joined directly as mini programs: no refined-authorization flag; the authorization granularity is the mini program application as a whole (detected by matching the error code of the getAuthUserInfo API).
  The application process is triggered by listening to mini program API calls. In case (a), the TOP API name must also be exchanged with the server for the permission package and permission point information before the work order is created and the main account is notified for approval.
Approval link: when the main account receives the pending approval message, tapping it opens the corresponding permission approval details page.
After optimization, the number of errors caused by insufficient sub-account permissions decreased by 57%, and the related public opinion decreased by 52%.
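The third-party detection described in the application link can be sketched as a function applied to each intercepted API call. Everything concrete here is an assumption: the article does not give the real error codes, so `TOP_NO_PERMISSION` and `AUTH_DENIED` are placeholders, as are the `ApiCall` and `PermissionRequest` shapes.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder error codes; the real values are not given in the article.
TOP_NO_PERMISSION = "TOP_NO_PERMISSION"
AUTH_DENIED = "AUTH_DENIED"


@dataclass(frozen=True)
class ApiCall:
    api: str                            # mini program API that was called
    error_code: Optional[str] = None    # None means the call succeeded
    top_api_name: Optional[str] = None  # set when the call wrapped a TOP API


@dataclass(frozen=True)
class PermissionRequest:
    app_key: str
    granularity: str                    # "permission_package" or "app"
    top_api_name: Optional[str] = None  # later exchanged for package/point info


def detect_permission_request(app_key, refined_auth, call):
    """Return a PermissionRequest if the call failed for lack of permission."""
    if call.error_code is None:
        return None
    if refined_auth:
        # Upgraded from QAP: match the TOP response error code.
        if call.top_api_name and call.error_code == TOP_NO_PERMISSION:
            return PermissionRequest(app_key, "permission_package", call.top_api_name)
        return None
    # Joined directly as a mini program: match the getAuthUserInfo error code.
    if call.api == "getAuthUserInfo" and call.error_code == AUTH_DENIED:
        return PermissionRequest(app_key, "app")
    return None
```

A non-`None` result would then feed the step that creates the work order and notifies the main account.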
Data measurement system construction
The functionality and quality of open tools depend on their business logic and on the availability and stability of their services, so core metrics must be defined, online exceptions monitored, and problems handled promptly. Building the data measurement system mainly involves three things: building perceived-experience (somatosensory) metrics, optimizing the public opinion SOP, and building the open experience dashboard.
Perceived-experience metric construction
We built business success rate metrics for TOP and cloud application calls on top of the mini program event mechanism, built our own perceived white screen rate for H5, Weex, and mini programs, and extended the white screen detection scheme for mini programs. These metrics have caught problems on the business side many times and prompted rollbacks and fixes.
Plug-in core performance quality metrics
The following figure shows the core nodes in the running of a Qianniu plug-in. Establishing a metric for each node allows the stability of online plug-ins to be monitored comprehensively. In addition to the usual purely technical metrics (interface business success rate, Bridge API success rate, JS error rate, and so on), Qianniu also built the perceived white screen rate to reflect the online running quality of plug-ins.
Perceived white screen rate
The technical metrics for core nodes cannot fully cover the scenarios in which a function is unavailable. For example, when a cloud application was once scaled out, a configuration problem on the new machines caused order data to come back empty even though the interface call succeeded. Conversely, an error in a technical metric does not necessarily mean the function is unavailable, as with some JS errors. Metrics are therefore needed that measure availability directly from the user's perception, and the most common problem of this kind is the white screen.
White screen definition: a page counts as a white screen if, within a given period of time, its elements fail to display in time, so that the page layout does not appear, a fallback error page is shown, or part of the page's images fail to appear.
Detection scheme: in the mini program scenario, detection has three stages: noise scenario filtering, white screen detection, and result reporting. Results are reported through the Magic Rabbit tracking platform, which is not described here.
Noise scene filtering:
- After a mini program opens page A, the ISV code may automatically jump or switch to another page B, and page A is actually rendered only when the user visits it. If A is checked while it is not visible, it would be judged a white screen. This false white screen comes from a difference in technical implementation; it does not affect what the user perceives and is not a detection target.
- Many mini programs on Qianniu are provided by third parties and require authorization before they can access user data. If detection runs while the page is still waiting on authorization, the mini program page is blank; checking for the authorization dialog avoids this misjudgment.
- Some mini program pages use same-layer rendering, for example pages with video or maps. These elements are not ordinary HTML elements and cannot be detected by JS, but the page is not a white screen, so such pages are filtered out through a configured page whitelist.
Detection strategy:
The main strategy of white screen detection is to count valid elements, meaning carriers of real information such as text and images. For mini programs and H5 plug-ins, JS is injected into the WebView to collect statistics on the interface elements. Qianniu differs from most consumer-facing applications in that many of its pages are input-heavy; for example, the answer page of the Q&A plug-in is mostly a large input box. In addition, when a merchant has no orders, a page may be largely blank, so pages containing whitelisted empty-state copy such as "not yet" must be filtered out to avoid being misjudged as white screens.
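The noise filtering and element counting described above can be sketched as follows. In practice this logic runs as JS injected into the WebView; the sketch models a page as a plain snapshot so the decision rules are easy to follow. The threshold, the snapshot fields, and the choice to count input boxes as valid elements are assumptions, not details from the article.

```python
from dataclasses import dataclass, field

# Empty-state copy mentioned in the article; the full list is an assumption.
EMPTY_STATE_WHITELIST = ("not yet",)
MIN_VALID_ELEMENTS = 3  # hypothetical threshold


@dataclass
class PageSnapshot:
    path: str
    visible: bool = True
    authorization_dialog_shown: bool = False
    texts: list = field(default_factory=list)
    image_count: int = 0
    input_count: int = 0


def is_noise(page, same_layer_whitelist):
    """Noise scenarios: hidden page, pending authorization, same-layer pages."""
    return (
        not page.visible
        or page.authorization_dialog_shown
        or page.path in same_layer_whitelist
    )


def count_valid_elements(page):
    """Count information carriers; inputs count so input-heavy pages pass."""
    texts = sum(1 for t in page.texts if t.strip())
    return texts + page.image_count + page.input_count


def detect(page, same_layer_whitelist=frozenset()):
    """Return "skipped", "ok", or "white_screen" for one page snapshot."""
    if is_noise(page, same_layer_whitelist):
        return "skipped"
    if any(w in t.lower() for t in page.texts for w in EMPTY_STATE_WHITELIST):
        return "ok"  # legitimate empty state, e.g. a merchant with no orders
    if count_valid_elements(page) >= MIN_VALID_ELEMENTS:
        return "ok"
    return "white_screen"


def white_screen_rate(results):
    """Share of checked (non-skipped) pages judged to be white screens."""
    checked = [r for r in results if r != "skipped"]
    if not checked:
        return None
    return checked.count("white_screen") / len(checked)
```

Skipped pages are excluded from the denominator so that noise scenarios neither inflate nor dilute the rate.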
Typical cases
White screen in a second-party service
In April, a second-party business line showed a white screen when accessed over a 4G network. The white screen rate metric raised an alarm promptly, with a clear change visible at minute-level granularity. After the fix the rate fell back, which avoided a surge in public opinion.
Rate limiting triggered by a third-party service revision
In March, we received an alarm on calls to the logistics interface from one of the top third-party plug-ins. The main error was rate limiting, and the success rate was only 80%, clearly below the average. The ISV's new version queried logistics information from within the order list, so the logistics interface was called too often and rate limiting was triggered. After we notified the ISV and it adjusted the design at the product level, the success rate rose significantly.
Public opinion SOP
Most user feedback on Qianniu comes through the global feedback entry, which carries no plug-in context. Users also have little awareness of plug-ins as a concept, so their feedback rarely points at a specific plug-in, which makes it hard to analyze public opinion and to drive fixes. Qianniu's solution is to record the user's plug-in usage trail, show the recently used tools when the user submits feedback, and guide the user to select the plug-in the feedback is about. The target plug-in and a problem category are then attached to the public opinion record, which makes statistics and alarms possible. After launch, the proportion of open public opinion carrying a plug-in AppKey rose from 11.95% to 95%.
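A usage trail of this kind can be as simple as a small most-recently-used list. The sketch below is illustrative only: the class name, the capacity of five, and the shape of the feedback record are assumptions.

```python
from collections import deque


class UsageTrail:
    """Remember the most recently used plug-ins, newest first."""

    def __init__(self, capacity=5):
        self._recent = deque(maxlen=capacity)

    def record(self, app_key):
        # Move a plug-in to the newest position if it was already recorded.
        if app_key in self._recent:
            self._recent.remove(app_key)
        self._recent.append(app_key)

    def candidates(self):
        """Plug-ins to offer on the feedback page, newest first."""
        return list(reversed(self._recent))


def build_feedback(text, trail, chosen_app_key=None, category=None):
    """Attach plug-in context to a feedback record when the user picked one."""
    return {
        "text": text,
        "app_key": chosen_app_key if chosen_app_key in trail.candidates() else None,
        "category": category,
    }
```

Restricting `app_key` to the offered candidates keeps the statistics tied to plug-ins the user actually opened recently.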
Open experience dashboard
By integrating public opinion, technical metrics, and perceived-experience data, Qianniu built an open experience dashboard so that plug-in experience can be tracked and measured. A plug-in quality dashboard built from each plug-in's public opinion count, technical metrics, and perceived-experience metrics shows the quality of Qianniu plug-ins at a glance, and it has become an important lever for pushing second-party teams to optimize.
Conclusion
Qianniu improved the fault tolerance of plug-in startup, built the product flow for sub-account permission applications, and raised the availability of its own link. It also built perceived-experience metrics, improved its ability to collect and analyze public opinion feedback, and set up the open experience dashboard. Together these give a global view of the running quality of online plug-ins: online problems can be discovered promptly, and the data can be used to drive targeted optimization by second-party and third-party businesses, systematically improving merchants' experience of open tools on Qianniu.