The iQIYI overseas App is an operations-heavy App. Elements such as the top navigation, the My page, and pop-ups must be managed along several dimensions, including mode, version, platform, language, and channel. With the business growing quickly and versions iterating rapidly, we need to solve two problems: how to keep the configuration of operational resources efficient, stable, and flexible, and how to support new operational requirements quickly and reliably.

Against this background, the iQIYI overseas phone back-end R&D team built a stable, flexible, and efficient operation configuration platform to solve these problems. This article shares some of our experience, challenges, and thinking in building it.

1. Breaking down configuration resources

Operation configuration can be divided into two parts: operation resources and basic data.

1.1 Operation Resources

Put simply, operational resources are things such as advertisements and operational campaigns, which change frequently in the App. For example, the pop-up advertisement in the picture above is a typical operational resource. Such resources typically have the following characteristics:

**Strong timeliness:** A resource is displayed at a fixed position on the C end (the consumer-facing client) only within a certain time range.

**Strongly tied to modes:** Each campaign and advertisement appears only in certain modes.

**Frequent data changes:** Campaign data in particular changes often, including the pictures and copy displayed.

**Multi-language display:** Because the iQIYI overseas site serves users worldwide, different modes need to display copy in different languages.

1.2 Basic Data Configuration

Basic data configurations change less frequently than operational resources and depend less on time and version. An example is the bottom navigation bar of the iQIYI overseas App (shown above). This type of configuration has the following characteristics:

**Multidimensional:** Different modes and languages require different configurations.

**Long-term validity:** This type of configuration usually exists for a long time and rarely expires.

2. Pain points in practice: low operational efficiency and much repetitive work

Faced with a steady stream of operational configuration requirements, we initially built a separate configuration interface for each operational product requirement. This approach soon ran into serious problems, mainly in the following areas:

Low operational efficiency

For each new operational configuration requirement, developers first build the corresponding configuration page and then hand it over to the operations staff, who manage the configuration and finally put the resources online. The flow chart is as follows:

Every operational configuration requirement goes through requirements review, page development, configuration management, and rollout. Developing the configuration page alone takes at least 1 to 2 days, so the R&D cost is high. The problems can be summarized as follows:

1. High R&D cost: a new configuration management page must be developed for each requirement.

2. Long R&D cycle and low operational efficiency: it takes a long time to go from a requirement being proposed to the operation going online.

3. Poor flexibility: the operational dimensions (mode, version, time, etc.) must be fixed in advance and cannot be adjusted dynamically.

The second pain point is repetitive work. For general-purpose operational configuration back ends, some features are especially repetitive for both front-end and back-end development. Functions such as operation records, audit mechanisms, and filtering data by mode, version, and language have to be redeveloped every time a new configuration requirement arises.

3. Thinking in practice: Clarify the three principles

To address these problems, we wanted a general solution for the operational resource management problems described above. We call it the IQ operation bit. Through research, we settled on three design principles: all data can be configured, operational data is highly available, and interface performance is efficient. Through continuous practice and review, we aim to realize these principles in the following three ways:

3.1 Data Jsonization

As the business keeps iterating, no fixed set of data fields can keep up with its changing field requirements (for example title, subtitle, picture, and jump link). Once the underlying data is stored as JSON, fields can be extended dynamically to meet the needs of continuous business iteration. JSON-ization also creates a new problem, managing the fields of each operation bit, which we solve by providing a field management function in the operation backend.
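The interplay between dynamically extensible fields and field management can be sketched roughly as follows. This is a minimal illustration, not the platform's actual data model: the class `OperationBitSchema` and its methods are hypothetical, and a plain `Map` stands in for a parsed JSON object so the example needs no JSON library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical field registry for one operation bit (name -> required flag).
// In the article's terms, this is what the field management function maintains.
class OperationBitSchema {
    private final Map<String, Boolean> fields = new LinkedHashMap<>();

    // Adding a field is a data change, not a table or code change.
    OperationBitSchema addField(String name, boolean required) {
        fields.put(name, required);
        return this;
    }

    // Checks a JSON-like record against the registered fields.
    // Returns a list of problems; an empty list means the record fits.
    List<String> validate(Map<String, Object> record) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Boolean> field : fields.entrySet()) {
            if (field.getValue() && !record.containsKey(field.getKey())) {
                problems.add("missing required field: " + field.getKey());
            }
        }
        for (String key : record.keySet()) {
            if (!fields.containsKey(key)) {
                problems.add("unknown field: " + key);
            }
        }
        return problems;
    }
}
```

Under these assumptions, supporting a new business field such as a subtitle means registering one more field definition, after which existing and new records validate without any schema migration.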

3.2 Multipoint Storage of Operation Data

Operational data is stored in several places: persistent storage, a distributed cache, and a local cache on the accessing service side. With these copies, degraded data can still be served in extreme cases, which reduces the loss caused by system exceptions.
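One way to picture multipoint storage is a reader that walks the storage tiers in order and skips any tier that is unavailable. The sketch below is only an illustration: the class `TieredReader` is hypothetical, and the tier order (local cache, then distributed cache, then persistent store) is an assumption rather than something the article specifies.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical reader over several storage tiers.
// Each tier maps a key to an optional value and may throw when it is down.
class TieredReader {
    private final List<Function<String, Optional<String>>> tiers;

    TieredReader(List<Function<String, Optional<String>>> tiers) {
        this.tiers = tiers;
    }

    // Returns the first value found; a failing tier is skipped, not fatal.
    Optional<String> read(String key) {
        for (Function<String, Optional<String>> tier : tiers) {
            try {
                Optional<String> value = tier.apply(key);
                if (value.isPresent()) {
                    return value;
                }
            } catch (RuntimeException e) {
                // Tier unavailable: fall through to the next copy of the data.
            }
        }
        return Optional.empty();
    }
}
```

The point of the design is that a failure in any single tier degrades the answer (possibly older data) instead of turning it into an error.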

3.3 SDK-based interfaces

For operational data, neither a database-backed scheme nor a distributed cache scheme can fully solve the problems of service centralization and service jitter. With SDK-based access, a local cache update mechanism removes the dependence on the centralized service and greatly improves service stability and performance. In addition, the whole IQ operation bit service can be scaled horizontally without affecting the stability of the central service during scaling. The caller request flow chart is as follows:

4. IQ operation bit architecture

This section describes the overall framework of the IQ operation bit configuration system. Functionally, it has four layers: the data layer, the service layer, the access layer, and the monitoring layer. The IQ operation bit architecture diagram is as follows:

4.1 Data layer

The data layer stores all the operational data connected to the IQ operation bit. It faces two main difficulties: **a large amount of data** and **high QPS**. To handle them, we use a Redis cluster as an intermediate cache, give each business side a local cache through the SDK, and relieve traffic pressure on the central service through message listening and asynchronous updates.

4.2 Service layer

The service layer operates on the underlying data and provides the access layer above it with the ability to obtain data. It offers four capabilities: the operation backend, the open platform, the data service, and IQkit-SDK. The operation backend is for operations staff and product managers and provides the console for configuring data. The open platform is for developers and provides the console for creating and configuring operation bits. The data service provides developers with a unified, highly available, high-performance API for fetching data. The SDK also acts as a data service; its current focus is to lower the access cost for developers and to improve the performance and availability of the data service.

4.3 Access layer

**How do we make C-end access more convenient?** To lower the access cost for developers, the invocation logic is implemented in the SDK. Users only need to add the Maven package, inject OppkitClient, build an OppkitRequest, and call OppkitClient directly to get filtered and translated data.

**How do we make B-end configuration more convenient?** We designed the backend configuration around one and only one principle: everything can be configured. Each IQ operation bit corresponds to a business form, such as the navigation bar. An operation bit contains multiple fields, such as title and link. A title exists in multiple languages, so multi-language keys need to be configured. The open platform is used to create IQ operation bits and configure their fields. The operation backend is used to configure the data of the IQ operation bits created on the open platform.
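The SDK access steps described above (inject OppkitClient, build an OppkitRequest, receive filtered and translated data) might look roughly like the sketch below. Only the class names `OppkitClient` and `OppkitRequest` come from the article. The constructor arguments, the `query` method, the row layout, and the in-memory implementation are all assumptions made for illustration, not the real SDK API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical request shape: which operation bit, for which mode and language.
class OppkitRequest {
    final String operationBit;
    final String mode;
    final String language;

    OppkitRequest(String operationBit, String mode, String language) {
        this.operationBit = operationBit;
        this.mode = mode;
        this.language = language;
    }
}

// Hypothetical client contract; the real method names are not given in the article.
interface OppkitClient {
    List<Map<String, String>> query(OppkitRequest request);
}

// In-memory stand-in that filters rows by operation bit and mode,
// then resolves the multi-language key of each title.
class InMemoryOppkitClient implements OppkitClient {
    private final List<Map<String, String>> rows;                  // operationBit, mode, titleKey, link
    private final Map<String, Map<String, String>> translations;   // language -> key -> text

    InMemoryOppkitClient(List<Map<String, String>> rows,
                         Map<String, Map<String, String>> translations) {
        this.rows = rows;
        this.translations = translations;
    }

    @Override
    public List<Map<String, String>> query(OppkitRequest request) {
        Map<String, String> dictionary = translations.getOrDefault(request.language, Map.of());
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> row : rows) {
            boolean matches = request.operationBit.equals(row.get("operationBit"))
                    && request.mode.equals(row.get("mode"));
            if (matches) {
                String titleKey = row.get("titleKey");
                // Fall back to the key itself when no translation exists.
                String title = dictionary.getOrDefault(titleKey, titleKey);
                result.add(Map.of("title", title, "link", row.get("link")));
            }
        }
        return result;
    }
}
```

A caller would then hold an `OppkitClient` (injected by its framework of choice) and issue something like `client.query(new OppkitRequest("bottom_nav", "intl", "en"))`, where the operation bit name and mode value are again made up for the example.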

4.4 Monitoring layer

In addition to monitoring the data storage layer, and the beacon tower's monitoring of the data layer and service layer, we also monitor the local cache implemented in the SDK. C-end access, that is, data retrieval, is implemented in the SDK, where we do the following:

1. If the request contains specific discrete fields such as device ID, caching the results locally would put great pressure on the memory of the business side's machines because of the large number of distinct values. Such requests bypass the local cache and go directly to the service.

2. We provide a path that skips the local cache, to meet business sides' requirements for real-time data.

3. If the request contains only highly aggregated fields, such as platform, version, mode, and language, the requested data is stored in the local cache. The local cache is updated asynchronously by listening to the operation platform. When an asynchronous update fails to obtain data, the previous data is returned. This avoids the extreme case in which all operation data comes back empty, and minimizes the loss to the business.

4. Inside the SDK, a scheduled asynchronous thread records local cache usage. The usage of each cache is displayed in the backend interface, so the usage of the various caches can be monitored in real time.
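The caching rules in the list above can be sketched as a small policy class. This is an illustration under stated assumptions: the class `LocalCachePolicy`, the `realTime` flag, the field name `deviceId`, and the string-valued cache are all hypothetical, and the remote call is passed in as a function rather than being a real data service client.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class LocalCachePolicy {
    // Assumed names of high-cardinality fields that must never become cache keys.
    private static final Set<String> DISCRETE_FIELDS = Set.of("deviceId");

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<Map<String, String>, String> remote;

    LocalCachePolicy(Function<Map<String, String>, String> remote) {
        this.remote = remote;
    }

    // Rules 1 and 2: discrete fields or a real-time request bypass the cache.
    static boolean cacheable(Map<String, String> params, boolean realTime) {
        return !realTime && Collections.disjoint(params.keySet(), DISCRETE_FIELDS);
    }

    // Sorted parameters give the same key regardless of insertion order.
    private static String keyOf(Map<String, String> params) {
        return new TreeMap<>(params).toString();
    }

    // Rule 3: aggregated requests are served from, and stored in, the local cache.
    String get(Map<String, String> params, boolean realTime) {
        if (!cacheable(params, realTime)) {
            return remote.apply(params);
        }
        return cache.computeIfAbsent(keyOf(params), k -> remote.apply(params));
    }

    // Asynchronous update hook: on failure the previous value is kept.
    void refresh(Map<String, String> params) {
        try {
            String fresh = remote.apply(params);
            if (fresh != null) {
                cache.put(keyOf(params), fresh);
            }
        } catch (RuntimeException e) {
            // Keep serving the stale entry instead of returning empty data.
        }
    }
}
```

Keeping the stale entry on a failed refresh is what turns a central-service outage into slightly old data rather than an empty page.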

5. Stability and performance guarantee

As mentioned above, we follow these principles in designing the operation backend:

  • All data can be configured
  • Operational data is highly available
  • Efficient interface performance

Here we introduce the solutions that ensure the stability and performance of the operation backend.

The overall request flow chart is as follows:

5.1 Stability Guarantee

Because the operation backend holds all kinds of operational data configuration, stability is particularly important. In addition to the routine operational mechanisms for data configuration and multi-level data storage using a distributed cache and a distributed database, we also provide an SDK solution for degrading gracefully when the service fails. Below we describe how this scheme was implemented.

SDK local caching scheme

We consider that there are several benefits to implementing local caching:

1. It eases the traffic pressure on the central service, because more requests are served from the memory of the local service.

2. The network environment overseas is unpredictable and often poor for the iQIYI overseas site, so the number of network hops should be reduced as much as possible.

3. If the central service fails, business sides can be notified not to redeploy their services, and the local cache then provides degraded data.

However, the local cache solution also has an obvious disadvantage: once data is updated in the operation backend, the business sides cannot get the latest data in real time. To address this, we iterated the SDK through the following versions. See the following figure for details:

The evolution of SDK internal technologies

The technical architecture has now evolved to its third version, which better solves the traffic problem of the central service. The traffic reaching the operation backend is determined by how often the backend data is updated rather than by the volume of user requests, which solves the traffic overload problem. However, implementing this version raised the following difficulties:

1. How do we monitor the local cache usage of each business side?

2. How do we design the MQ scheme?

For problem 1, we implemented a mechanism in the SDK that uses ScheduledExecutorService to periodically write cache usage to the database, so that local cache usage can be displayed over time in the backend interface. This gives us a systematic view of how each business side uses its cache and provides data to support memory requests and allocation for the business sides.
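A periodic usage report built on `ScheduledExecutorService` could be sketched as follows. The class `CacheUsageReporter` and its `sink` callback are hypothetical: the callback stands in for whatever write to the database the SDK actually performs, and entry count is used as the usage metric only as an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

class CacheUsageReporter {
    private final Map<String, Map<?, ?>> caches = new ConcurrentHashMap<>();
    private final BiConsumer<String, Integer> sink; // stands in for the database write
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "cache-usage-reporter");
                thread.setDaemon(true); // never keep the host service alive
                return thread;
            });

    CacheUsageReporter(BiConsumer<String, Integer> sink) {
        this.sink = sink;
    }

    void register(String cacheName, Map<?, ?> cache) {
        caches.put(cacheName, cache);
    }

    // Reports the current entry count of every registered cache.
    void reportOnce() {
        caches.forEach((name, cache) -> sink.accept(name, cache.size()));
    }

    void start(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                reportOnce();
            } catch (RuntimeException e) {
                // Swallow the error: an uncaught exception would cancel all future runs.
            }
        }, period, period, unit);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

The try/catch inside the scheduled task matters because `scheduleAtFixedRate` suppresses subsequent executions once a run throws.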

For problem 2, there are two main difficulties:

(1) A business service is generally deployed on multiple machines, so each update message must be consumed by every server running the same code, so that each machine obtains the latest data.

(2) There are many operation bits, but when an operation bit that a business side has not connected is updated, that business side has no need to asynchronously request the operation backend central service, because it never accesses that data.

For (1), the producer of the message is the operation backend service, and each message must be received by all business sides, specifically by every machine of every business side. Therefore each machine must belong to a different consumer group, and we need an identifier that differs per machine to use as the consumer group. The natural choice is the machine address, which guarantees that every machine is in a different group.

For (2), we provide a configuration file in which each business side lists the names of the IQ operation bits it uses. When a message arrives, the SDK first checks whether the operation bit name in the message appears in the configuration file. If it does not, the message is ignored (consumed without action). If it does, the SDK requests the corresponding operation bit to update the local data.
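The two decisions above, one consumer group per machine and ignoring messages for operation bits the business side does not use, can be sketched without tying the example to any particular MQ product. The class `OperationBitMessageFilter`, the group naming scheme, and the prefix are assumptions for illustration; the set of subscribed names stands in for the configuration file.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Set;

class OperationBitMessageFilter {
    private final Set<String> subscribedBits; // names listed in the business side's config file

    OperationBitMessageFilter(Set<String> subscribedBits) {
        this.subscribedBits = subscribedBits;
    }

    // One way to obtain the machine address used as the group identifier.
    static String localMachineAddress() throws UnknownHostException {
        return InetAddress.getLocalHost().getHostAddress();
    }

    // Assumed naming scheme: a distinct consumer group per machine, so every
    // machine receives every update message instead of sharing them.
    static String consumerGroupFor(String servicePrefix, String machineAddress) {
        return servicePrefix + "-" + machineAddress.replace('.', '_').replace(':', '_');
    }

    // A message triggers a refresh only if its operation bit is in use here.
    boolean shouldRefresh(String operationBitName) {
        return subscribedBits.contains(operationBitName);
    }
}
```

With this shape, a listener would compute its group once at startup from `localMachineAddress()`, and for each incoming message call `shouldRefresh` before asking the central service for new data.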

5.2 Performance Assurance

In addition to the local caching provided by the SDK, which improves the performance of the back-end services, we have made a few other optimizations.

In practice, we found that changes to operation bit data, that is, edits made by operators, are very infrequent compared with network requests, as with the basic data analyzed earlier. Caching data on the client therefore avoids network traffic between the client and the back-end services and greatly improves performance.

Our solution is to give the data of each operation bit a version. The server saves the latest update time of each operation bit. When the client starts and sends a request, the server returns the latest data update time of every operation bit, and the client caches these timestamps locally. The next time the App is opened, the client again receives the latest update timestamps from the server and compares them with its local copies to decide whether to update the data of each operation bit. If the update time cached by the client matches the one returned by the operation backend, the client displays its cached data directly.
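The timestamp comparison described above can be sketched as follows. The class `ClientVersionCache` and its method names are hypothetical, and the example is written in Java for consistency even though the real client may be implemented differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ClientVersionCache {
    // Operation bit name -> update time of the data currently cached on the client.
    private final Map<String, Long> cachedUpdateTimes = new HashMap<>();

    // Compares server timestamps with local ones and returns the operation
    // bits whose data must be fetched again. Unknown bits count as stale.
    Set<String> staleBits(Map<String, Long> serverUpdateTimes) {
        Set<String> stale = new HashSet<>();
        for (Map.Entry<String, Long> entry : serverUpdateTimes.entrySet()) {
            Long cached = cachedUpdateTimes.get(entry.getKey());
            if (!entry.getValue().equals(cached)) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }

    // Records the timestamp once the data for a bit has been fetched and stored.
    void markFetched(String operationBit, long updateTime) {
        cachedUpdateTimes.put(operationBit, updateTime);
    }
}
```

On each launch the client would fetch only the bits returned by `staleBits` and display cached data for the rest, so an unchanged configuration costs one small timestamp request.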

Another advantage of this approach is that in extreme cases, such as a failure of the exposed APIs, business data can still be displayed normally by blocking data updates in the operation backend, which avoids the serious failure of operational data disappearing.

The specific request flow chart is as follows:

6. Summary and outlook

This article introduced the design and development of the IQ operation bit. Starting from the pain points we encountered, we put forward the design principles of the operation backend: all data can be configured, operational data is highly available, and interface performance is efficient. We then considered and implemented specific technical solutions according to these principles.

JSON-ization of configuration data makes business fields extensible. The data model we designed meets the requirements of multilingual operation configuration data. Local caching, MQ listening, and an asynchronous update mechanism implemented inside the SDK solve the heavy traffic on the central service and the data inconsistency caused by caching. For the specific conditions overseas, we also proposed a client-side caching scheme.

The IQ operation bit has been in progress for nearly half a year since planning began in May this year. Because there were many product demands, we collected issues through two channels, our own use and user feedback, and carried out two iterations. We are currently continuing to optimize the operation backend interface and its ease of use.

The IQ operation bit has been online at iQIYI overseas for more than two months. It is widely used and has improved both engineering efficiency and quality. Take the recent error code configuration as an example. The server needs to return various error codes to the client together with their corresponding copy, and the copy is multilingual. With the IQ operation bit, we only need to analyze the requirement and split out the business fields and data exposure conditions, and the corresponding operation backend can be delivered within 5 minutes.

Of course, as the business iterates and scenarios change, the IQ operation bit still has some imperfections. We will keep iterating in engineering practice, solve the problems that arise, and better serve the users of the IQ operation bit and iQIYI's overseas users.

Team Introduction:

iQIYI overseas phone back-end team: part of the iQIYI Overseas Business Division, responsible for developing and maintaining the iQIYI overseas phone back-end services and providing users with stable, efficient, and smooth video content services. Zhengfei: project leader, designer, and developer of the IQ operation bit.