Introduction: “Customer at the center, technology in service of the product” is the principle the Aipanfan clue housekeeper team always follows. Technical architecture planning should start from business demands and empower the product with appropriate technology, and as the product evolves it places ever higher demands on that technology. The clue (sales lead) list is the page with the highest PV in Aipanfan. This article describes in detail how it moved from an early slash-and-burn stage of rapid delivery to a mature stage of “high availability, high quality, and high experience”.
The full text is 9355 words and the expected reading time is 24 minutes.
Tables are one of the most common ways to present data in back-office systems, as ordinary as water, electricity and coal in daily life, so much so that you might ask: what technical challenges could table development possibly have?
First, what capabilities should a list provide?
The list page has three basic modules:
- Search
  - Search box (targeted search for a specific piece of data)
  - Filter items (preset search criteria for quickly finding matching results)
- Data display
  - Header
  - Data
  - Pager
- Action items
  - Action buttons (acting on an entire row of data), such as allocating a clue or placing a phone call
  - Data modification (acting on a single field), such as adding labels
The list carries business data. Its design directly affects how efficiently users process and manage that data, and the ultimate goal is to help customers use data and make decisions more efficiently.
Second, the evolution of technology
With the rapid iteration of the Aipanfan product, the clue list has likewise moved from slash-and-burn rapid delivery to the mature stage of “high availability, high quality and high experience”. Below are the versions the clue list has gone through, the pitfalls encountered along the way, and the corresponding solutions.
V1 – Basic capabilities, delivered quickly
In the early stage of the Aipanfan project, to get the product shipped quickly and help users query and follow up on clues, clue information was stored in MySQL and exposed through basic CRUD operations.
V2 – Configurable and extensible capabilities
As the business evolved, the gap between the basic form and list functions and users' demand for configurability and extensibility became increasingly prominent. Supporting customization became the main problem facing the clue list. The Aipanfan housekeeper team addressed it from three directions: “metadata driven”, “indicator parser”, and “front-end rendering engine”.
2.1 Metadata driven
As the clue product iterated, the number of preset clue fields kept growing, and the upgraded product also supports custom fields. The fixed set of retrieval items on the page could no longer satisfy users who want to search on any clue field. We therefore render retrieval items dynamically from user-defined retrieval-item metadata, so that each user can choose the items they prefer.
1. UI rendering information drives the dynamic rendering of retrieval items on the page.
2. A retrieval condition builder dynamically constructs the indicator data for each retrieval item.
3. Retrieval item operators support the search ranges of different indicators.
4. Retrieval item relations dynamically assemble the search conditions into a query.
5. To add a new filter indicator, you only need to add its retrieval-item metadata and implement its builder.
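The five points above can be sketched roughly as follows. This is only an illustration under assumptions, not the team's implementation: the metadata shape (`field`, `component`, `operator`), the `builders` registry and `buildQuery` are hypothetical names, and the output is a generic condition tree rather than any specific query language.

```javascript
// Hypothetical retrieval-item metadata: UI rendering info plus query semantics.
const retrievalItems = [
  { field: "name", label: "Name", component: "input", operator: "contains" },
  { field: "createTime", label: "Created at", component: "dateRange", operator: "between" },
  { field: "tags", label: "Tags", component: "select", operator: "in" },
];

// One condition builder per operator. Supporting a new filter only needs new
// metadata and, if the operator is new, one more builder.
const builders = {
  contains: (field, value) => ({ field, op: "contains", value: String(value) }),
  between: (field, [from, to]) => ({ field, op: "between", from, to }),
  in: (field, values) => ({ field, op: "in", values: [...values] }),
};

// Build conditions only for the items the user filled in, joined by a relation.
function buildQuery(items, userInput, relation = "AND") {
  const conditions = items
    .filter((item) => userInput[item.field] !== undefined)
    .map((item) => builders[item.operator](item.field, userInput[item.field]));
  return { relation, conditions };
}

const query = buildQuery(retrievalItems, { name: "acme", tags: ["vip", "new"] });
console.log(JSON.stringify(query, null, 2));
```

The same metadata array could also drive the front-end rendering of the filter bar, since each item names the UI component to use.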
2.2 Indicator parser
As the number of clue fields grows, the table header must be flexibly configurable and list items must load quickly. To achieve this, the metadata of user-defined list indicator items is extracted.
1. Custom lists
The list retrieval service connects to a general user-defined indicator service, so that user-defined list items can be configured flexibly.
2. Indicator retrieval configuration
By introducing the concept of an indicator, a query on list fields is abstracted into a query on indicators. Different query scenarios can then be modeled as queries over different indicator data, and indicators can be combined arbitrarily to meet requirements. The bottom layer only needs to provide one general indicator retrieval service.
3. Post-processor
Different scenarios need list fields presented in different forms. With a post-processor, the rendering logic can be customized through configuration: you only need to configure the metadata of the new indicator.
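As a rough illustration of indicator metadata combined with post-processors, consider the sketch below. The indicator keys, the `postProcessor` property and the two processors are invented for this example. The real service is on the server side and may look quite different.

```javascript
// Hypothetical indicator metadata: which field to read and how to present it.
const indicators = [
  { key: "phone", title: "Phone", postProcessor: "maskPhone" },
  { key: "createTime", title: "Created at", postProcessor: "date" },
  { key: "owner", title: "Owner" }, // no post-processing
];

// Registry of post-processors, looked up by the name stored in the metadata.
const postProcessors = {
  maskPhone: (v) => String(v).replace(/^(\d{3})\d+(\d{4})$/, "$1****$2"),
  date: (v) => new Date(v).toISOString().slice(0, 10),
};

// Turn raw rows into display rows that contain only the configured columns.
function renderRows(rows, columns) {
  return rows.map((row) =>
    Object.fromEntries(
      columns.map(({ key, postProcessor }) => {
        const fn = postProcessors[postProcessor] ?? ((v) => v);
        return [key, fn(row[key])];
      })
    )
  );
}

const display = renderRows(
  [{ phone: "13812345678", createTime: 0, owner: "TJ", internalNote: "x" }],
  indicators
);
console.log(display);
```

Adding a new presentation form then means registering one processor and referencing it from metadata, without touching the retrieval code.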
2.3 Front-end configuration engine design
Initially, the clue list and filter supported only a few fields for display and filtering, so rendering a dedicated UI component for each field was enough. As business complexity grew, more fields and then custom fields were added. Handling each field with its own rendering code brought huge maintenance costs (every new field needed its own UI component and repeated changes to related business code) and regression-testing costs, and it still could not support filtering on custom fields. We therefore introduced a configuration engine that takes the component as its basic unit and renders page content from component parameter configuration.
1. Each component is a standardized, metadata-driven component, split into the component itself and its configuration.
2. The component configuration returned by the back end is merged with the local default configuration to produce the final configuration.
3. The component configuration engine parses the component configuration to bind events dynamically, pass data, and render components.
With the configuration engine in place, the front end only needs development effort when a new component type is added. Other homogeneous requirements take effect simply by changing the configuration on the back end, which greatly reduces development and regression-testing costs.
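The merge-then-render flow of such a configuration engine might look like the sketch below. All names here (`defaultConfigs`, `resolveConfig`, `render`, the config fields) are assumptions made for illustration. A real engine would create Vue components rather than plain objects.

```javascript
// Hypothetical local defaults, one entry per component type.
const defaultConfigs = {
  input: { component: "input", props: { clearable: true, placeholder: "" }, events: {} },
  select: { component: "select", props: { multiple: false, options: [] }, events: {} },
};

// Merge the back-end configuration over the local default (props and events one level deep).
function resolveConfig(remote) {
  const base = defaultConfigs[remote.component];
  if (!base) throw new Error(`Unknown component type: ${remote.component}`);
  return {
    ...base,
    ...remote,
    props: { ...base.props, ...remote.props },
    events: { ...base.events, ...remote.events },
  };
}

// "Render" by describing the node; event names in the config are bound to real handlers.
function render(config, handlers) {
  const on = Object.fromEntries(
    Object.entries(config.events).map(([event, name]) => [event, handlers[name]])
  );
  return { tag: config.component, props: config.props, on };
}

const remote = {
  component: "select",
  field: "tags",
  props: { multiple: true },
  events: { change: "onFilterChange" },
};
const node = render(resolveConfig(remote), { onFilterChange: (value) => value });
console.log(node.tag, node.props);
```

Only an unknown `component` value requires front-end work in this model. Every other change is a configuration change.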
V3 – Experience upgrade
3.1 Background
The clue list is the core business scenario of the clue housekeeper. As the business kept developing, an experience upgrade of the clue list became urgent.
3.2 Design Objectives
Guided by the design values of easy-to-understand information and an easy-to-use experience, reorganize the list information and upgrade the experience to raise customer satisfaction with the clue list.
3.3 Design method
3.3.1 Page Disassembly
The clue list page consists of three modules: title, toolbar and table.
- Title area: the title, rule description and other information that summarizes the whole page;
- Toolbar area: data filtering (filter, search) and functional operations (new, import…), carrying the page's add, delete, edit and search operations;
- Table area: the table header, table body and pager, displaying the page's core information.
3.3.2 Core pain points
- Title area: the rules are unclear and costly to understand;
- Toolbar area: data filtering and functional operations are densely packed, creating a lot of distracting information;
- Table area: the core information occupies too little of the screen, and operations are costly.
3.3.3 Design strategy
1. Upgrade the title area
- [Title area] The rule description changes from an icon to icon + text, which is intuitive and easy to understand;
2. Upgrade the toolbar
- [Functional operations] Expose important high-frequency operations and fold away low-frequency ones to reduce interference for users;
- [Data filtering] Expose high-frequency filters and fold away low-frequency ones to increase the share of the screen given to the core table area;
3. Upgrade the table area
- [Table header] When the page scrolls vertically, the header sticks to the top, making fields easy to locate and expanding the vertical space of the clue list;
- [Table body] When the page scrolls horizontally, the horizontal scroll bar floats at the bottom of the screen, improving the efficiency of horizontal operations;
- [Table body] The operation column uses a primary/secondary layout, exposing high-frequency functions and folding away low-frequency ones, which improves operating efficiency and leaves room for more functions.
▲ Comparison before and after clue list upgrade
3.4 Design Benefits
By digging into and restructuring the page information along four dimensions (composition, interaction, feedback and adaptation), the core table area now takes up 70% of the screen, the data display is more reasonable, and both the ease of use of the clue list page and customer satisfaction with it have improved.
V4 – Upgrade retrieval capabilities
4.1 Status quo and challenges
The system currently has 58 preset fields, 125 dynamically extended fields and clue data on the order of 100 million records. The business keeps extending custom retrieval (multi-condition retrieval, word-segmentation retrieval and so on) to all clue fields, and together with the ever-growing data volume this puts retrieval performance under pressure. We therefore replaced MySQL with ES to provide high-performance advanced retrieval.
4.2 Scheme Design
WATT, Baidu's open data stream platform, is a self-developed platform for operating and maintaining data streams. A data stream captures changed data in a MySQL database and publishes it in real time. Data is published in two modes, incremental and baseline, in text format, with guarantees of timeliness, ordering and no loss, and computation on subscribed data is also supported.
Solution: retrieval is migrated from the DB to the Baidu Cloud ES service. After a user operates on a clue, the data is updated in the DB and then synchronized to ES through WATT, MQ and a write service.
Benefits: query efficiency improves; full-text word-segmentation scenarios, relevance ranking and the like are supported; and query conditions can be combined arbitrarily without affecting query performance.
V5 – Support for Read your writes
5.1 Status quo and challenges
Moving clue list retrieval from MySQL to ES brought a new problem. Synchronization from the DB to ES is easily affected by the environments it depends on, which delays data updates, so users may not see the latest data in the clue list. This hurts the user experience and poses new challenges to the stability and accuracy of the clue list.
The current data flow is as follows:
As described above, synchronizing the DB to ES takes 5 steps in total, and a delay in any one of them makes the clue list update late.
5.2 Optimization Objectives
Even when synchronization from the DB to ES is delayed, the data in the clue list must be consistent with the result of the user's operation.
5.3 Scheme Design
1. After a user operates on a clue (creating, allocating, editing and so on), the ID of the operated clue is written to a Redis ordered queue. Since the DB-to-ES synchronization delay is about 0 to 2s, the automatic expiration policy of the Redis queue is set to 10s.
2. The Redis queue stores only the clue ID, not a complete snapshot of the clue data:
A. A clue has many attributes, a complex structure and changes quickly between iterations. Recording only the clue ID is lightweight, and changes to the clue itself do not affect the data structure cached in Redis;
B. Querying the DB by ID returns more accurate data, because no transaction guarantees consistency between Redis and the DB.
3. When the user queries the clue list:
A. Obtain the clue IDs of the last 5s from the Redis ordered queue;
B. Assemble the ES filter conditions to query ES, and use the clue IDs from Redis to query the DB. To preserve the list's query performance, the two queries run concurrently;
C. Filter the DB clue data through the expression engine according to the ES filter conditions to obtain the valid query results;
D. Merge the ES data and the DB data using the merge strategy.
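The write and read paths above can be sketched with an in-memory stand-in for the Redis ordered queue. This is an assumption-laden illustration: `RecentOps`, `queryBoth`, `searchEs` and `loadFromDb` are hypothetical names, and a real implementation would use a Redis sorted set scored by timestamp (for example via ZADD and ZRANGEBYSCORE) instead of a `Map`.

```javascript
// In-memory stand-in for the Redis ordered queue: member = clue ID, score = operation time (ms).
class RecentOps {
  constructor(ttlMs = 10000) {
    this.ttlMs = ttlMs;
    this.scores = new Map();
  }

  // Called after create / allocate / edit. Only the ID is stored, never a snapshot.
  record(id, now = Date.now()) {
    this.scores.set(id, now);
    this.prune(now);
  }

  // Drop entries older than the expiration time (10s in the article).
  prune(now = Date.now()) {
    for (const [id, ts] of this.scores) {
      if (now - ts > this.ttlMs) this.scores.delete(id);
    }
  }

  // IDs operated on within the last windowMs (5s in the article).
  recentIds(windowMs = 5000, now = Date.now()) {
    return [...this.scores].filter(([, ts]) => now - ts <= windowMs).map(([id]) => id);
  }
}

// Step B: query ES and the DB concurrently. Both fetchers are injected, hypothetical functions.
async function queryBoth(recentOps, filter, searchEs, loadFromDb) {
  const ids = recentOps.recentIds();
  const [resultEs, resultRedisDb] = await Promise.all([searchEs(filter), loadFromDb(ids)]);
  return { ids, resultEs, resultRedisDb };
}
```

Because the 5s read window is larger than the typical 0 to 2s synchronization delay, a recently operated clue is normally still covered by the DB lookup when ES is behind.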
The merge strategy
To illustrate the merge scenarios briefly, let's define a few objects:
- leads_db: the database view corresponding to the clues.
- result_es = {…}: the result set of the ES query.
- result_redis_db = {…}: the result set of the DB query by the IDs from Redis.
- result_redis_db_filter = {…}: the result set obtained by filtering result_redis_db through the expression engine according to the filter criteria.
- candidate = {…}: the final result set, called the candidate result set.
Candidate selection strategy
1. Clues that were created or modified and meet the filter conditions: the version queried from the DB is added to the candidate set.
2. Clues that were not operated on and meet the filter conditions: the version queried from ES is added to the candidate set.
3. Operated clues that do not meet the filter conditions are not added to the candidate set.
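The three rules can be expressed as one small function. This is a sketch under assumptions: rows are plain objects with an `id`, and `matchesFilter` stands in for the expression engine. The function name `mergeCandidates` is hypothetical.

```javascript
// resultEs: rows returned by ES (possibly stale for recently operated clues).
// resultRedisDb: fresh DB rows for the clue IDs recorded in Redis.
// matchesFilter: stand-in for the expression engine applying the list filter to a DB row.
function mergeCandidates(resultEs, resultRedisDb, matchesFilter) {
  const operatedIds = new Set(resultRedisDb.map((row) => row.id));

  // Rule 1: operated clues that still match the filter come from the DB.
  const resultRedisDbFilter = resultRedisDb.filter(matchesFilter);

  // Rule 2: clues that were not operated on come from ES.
  // Rule 3: operated clues are removed from the ES side, so a stale ES hit
  // cannot re-introduce a clue that no longer matches.
  const fromEs = resultEs.filter((row) => !operatedIds.has(row.id));

  return [...resultRedisDbFilter, ...fromEs];
}

const es = [{ id: 1, status: "new" }, { id: 2, status: "new" }]; // id 2 is stale in ES
const db = [{ id: 2, status: "closed" }, { id: 3, status: "new" }]; // fresh rows
const candidate = mergeCandidates(es, db, (row) => row.status === "new");
console.log(candidate.map((row) => row.id)); // [3, 1]
```

The sketch ignores sorting and paging, which a real list would have to re-apply to the merged candidate set.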
Benefits: after launch, users no longer reported stale list data after operating on clues. The rate at which a clue is found on the first search after being labeled or edited is 100%.
V6 – Performance optimization
6.1 How to measure performance
If you can’t measure it, you can’t improve it!
Measuring page performance is a prerequisite for optimization, and front-end performance monitoring is a perennial topic. What to monitor, which metrics to use, how to instrument the code and which reporting tool to use all need consideration. Neither the commercial nor the in-house instrumentation platforms available could deliver ease of use, real-time data and full-link monitoring across front end and back end at the same time. Aipanfan therefore launched a dedicated “big front-end APM” project and, drawing on the RED metrics used on the server side and on mainstream front-end instrumentation schemes, explored the front-end APM architecture that suits Aipanfan best.
6.1.1 Monitoring Targets
From a global perspective: gain insight into real page performance aggregated by page URL, and analyze the time spent between any two phases of an operation flow.
From an individual case: trace the full call chain across all ends based on a user ID.
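The article later quotes its targets as P90 values. As a minimal sketch of how a duration between two phases could be aggregated into a P90, assuming the nearest-rank percentile definition and hypothetical phase names (`navigationStart`, `firstScreen`):

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Duration between two named phases of one page view (timestamps in ms).
function phaseDuration(marks, from, to) {
  return marks[to] - marks[from];
}

// Ten made-up page views, each with two phase timestamps.
const views = [800, 900, 1000, 1100, 1200, 1300, 1500, 1800, 2600, 5200].map((end) => ({
  navigationStart: 0,
  firstScreen: end,
}));

const durations = views.map((v) => phaseDuration(v, "navigationStart", "firstScreen"));
const p90 = percentile(durations, 90);
console.log(p90); // 2600
```

A P90 hides the slowest 10% of views (the 5200ms outlier here), which is why it is a steadier target than the maximum while still reflecting slow experiences better than the average.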
6.1.2 Solution
Collection
Instrumentation SDK: Aipanfan already uses the paid system “Shence” for business tracking. To avoid reinventing the wheel, the front-end APM SDK is a second-level wrapper around the Shence SDK, versioned and distributed as an NPM package. Users initialize the SDK in the common module. It adds non-intrusive performance collection and offers configuration options such as the sampling rate.
Reporting
Three reporting schemes are combined: image, sendBeacon and Ajax. First the complete URL with its parameters is assembled and its length is checked. If the URL is shorter than the maximum length the browser allows, the front-end performance data is sent by dynamically creating an img tag.
If the URL is too long, check whether the browser supports the sendBeacon method. If it does, send the request through sendBeacon; otherwise send a synchronous Ajax request. Example code is as follows:
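// performanceInfo, decoupling, imgReport, sendBeacon and ajaxReport are helpers defined elsewhere in the SDK.
function dealWithUrl(url, appId) {
  let times = performanceInfo(appId);
  let items = decoupling(times);
  // Length of the full URL once the query parameters are appended
  let urlLength = (url + (url.indexOf('?') < 0 ? '?' : '&') + items.join('&')).length;
  if (urlLength < 2083) {
    imgReport(url, times);
  } else if (navigator.sendBeacon) {
    sendBeacon(url, times);
  } else {
    ajaxReport(url, times); // Example method
  }
}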
Connecting front-end and back-end traces
6.2 Performance Issues & Objectives
Problem: front-end APM was fully rolled out in Q2 2021, a big step forward for Aipanfan's front-end performance monitoring. At the same time it exposed problems such as poor performance on many pages and long perceived waits when data volumes are large. The clue list stood out in particular: its perceived page initialization time exceeded 5000ms (P90).
Objective: based on statistical analysis of all Aipanfan pages, the overall front-end performance target is “page initialization time perceived by users below 2000ms (P90)”. The target is broken down as follows:
6.3 Optimization Ideas & Solutions
Overall approach and pacing: first tackle the principal contradiction and pick the low-hanging fruit, then dig into the details and tackle the hard problems. Push each direction as far as it will go and break through them one by one.
- [Link] Map out the page's complete request link and the time spent at each hop to understand the page's performance comprehensively.
- [Back end] Analyze the back-end implementation in depth and optimize interface performance through concurrency, caching, ES tuning, thread pool tuning, dependency timeout analysis and so on.
- [Front end] Analyze front-end code, build configuration and the browser Performance flame chart, and optimize static resource size, long-running JS tasks (Long Tasks), rendering performance and so on.
- [Interaction] Work out interaction upgrades with product managers and designers to achieve the best possible user experience.
6.3.1 Time-consuming Link Analysis
BFE (Baidu Front End) is Baidu's unified layer-7 (HTTP/HTTPS) traffic access platform, providing traffic access services for all of Baidu.
Links whose time the user can perceive:
- Loading static resources from CDN nodes (static resource time)
- Executing JS files and sending the Ajax request (total interface time starts)
- Network time for the browser to send the HTTP request to BFE
- Link time for BFE to forward the request to the Access Gateway
- Link time for the Access Gateway to forward the request to the BFF service
- Time spent in the BFF service itself (including internal service requests, assembling microservice results and so on), measured with SkyWalking
- Link time for the BFF response to reach the Access Gateway
- Link time for the Access Gateway response to reach BFE
- Network time for the BFE response to reach the browser
- Browser rendering time
Problems:
- The link from the BFF Node.js service to the back-end microservices takes a long time.
- When BFF calls services through the domain name → BFE → microservice path, connection timeouts on some BFE VIPs cannot be fully resolved.
BFF call chain before optimization:
Solution:
- Upgrade the BFF module to Mesh: BFF calls to back-end services change from gateway domain name calls to Mesh service calls.
BFF call chain after optimization:
Benefits: the time BFF spends on the microservice call link is reduced by 100ms+, and the frequent connection timeouts on BFE VIPs are completely resolved.
6.3.2 Interface Performance Optimization
By analyzing the interface implementation logic and the SkyWalking call chain, we found three types of problems:
The first type: code implementation problems. This is the low-hanging fruit and is relatively easy to pick.
Problems:
- Unnecessary serial requests exist on the BFF and back-end interface links
- Both the front end and the back end have data that could be cached
Solution:
- In the BFF implementation, three serial requests are reduced to the two that must be serial, and the rest are processed in parallel with Promise.all
- The post-processors of the server interface use thread pools for asynchronous processing
- Add a Redis cache to interfaces such as label metadata and header configuration metadata
- The front end preloads interfaces such as permissions and query conditions and adds a local cache
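The serial-to-parallel change in the BFF can be illustrated as follows. The request names and delays are made up, and `call` merely simulates an interface call. Which requests truly depend on each other is an assumption of this sketch.

```javascript
// Simulated interface call; names and delays are hypothetical.
const call = (name, ms) => new Promise((resolve) => setTimeout(() => resolve(name), ms));

// Before: three requests awaited one after another (about 90ms in total here).
async function loadSerial() {
  const permissions = await call("permissions", 30);
  const headers = await call("headers", 30);
  const clues = await call("clues", 30);
  return { permissions, headers, clues };
}

// After: permissions must still come first, but the other two requests do not
// depend on each other, so they run in parallel (about 60ms in total here).
async function loadParallel() {
  const permissions = await call("permissions", 30);
  const [headers, clues] = await Promise.all([call("headers", 30), call("clues", 30)]);
  return { permissions, headers, clues };
}
```

`Promise.all` rejects as soon as any of its inputs rejects, so requests whose failure should not fail the whole page need their own error handling.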
The second type: performance tuning. This part requires understanding how ES and the RPC framework work underneath.
Problems:
- Fuzzy query retrieval in ES performs poorly
- The clue service does not meet its load-test target, and requests queue up under high QPS
Solution:
- Use full-text search instead of fuzzy query
- Adjust the number of I/O worker threads used for RPC calls (RPCIoWorkThreadNumber)
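To make the first change concrete, here is what the two styles of query could look like in the Elasticsearch query DSL. This assumes that “fuzzy query” refers to substring-style pattern matching with a `wildcard` query; the article does not say which query type was used. The field name `companyName` and its `keyword` sub-field are hypothetical.

```javascript
// Pattern matching: a pattern that starts with "*" makes ES iterate over many
// terms of the field to find matches, which gets slow on large indexes.
const wildcardQuery = (keyword) => ({
  query: { wildcard: { "companyName.keyword": { value: `*${keyword}*` } } },
});

// Full-text search: the keyword is analyzed into tokens and looked up in the inverted index.
const matchQuery = (keyword) => ({
  query: { match: { companyName: { query: keyword, operator: "and" } } },
});

console.log(JSON.stringify(matchQuery("acme")));
```

The two are not equivalent: a `match` query finds documents by analyzed tokens, not by arbitrary substrings, so the switch trades some matching flexibility for speed.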
The third type: timeout problems. These optimization scenarios are complex and difficult. There are several kinds of timeout:
Problems:
- RPC call connection timeout: the old version of the Mesh SDK creates a new connection for every request. With many concurrent requests or slow service responses, the caller may time out waiting for a connection.
- Gateway query timeout: the clue DB cluster sfCrMSALES has high disk utilization, frequent slow SQL and long transactions, and poor stability, so DB queries time out frequently.
- Timeouts on external dependencies: the clue list must check clue pool permissions, and the interfaces of the ACS team that it relies on for this time out frequently. After list retrieval completes, some fields need post-processing that involves many HTTP requests to external teams, among which the open API for commercial advertising information times out most often.
Solution:
1. Rework the Mesh SDK to reuse connections and adjust the number of I/O threads of the service itself, solving the connection waits under high QPS;
2. Improve DB cluster stability
- Set up a dedicated DB stability effort, and add DB alerting and an on-call mechanism;
- Work with the DBAs to analyze, troubleshoot, fix, verify and eliminate problems such as slow SQL, long transactions, abnormal connection counts and high master-slave delay;
- Clean up large tables in the DB such as the retry table, the allocation log table and the outbox table, regularly dump the main table of original clue information to the data warehouse, and clean up large fields in the tables;
- Migrate non-tier-1 business tables out of the cluster to relieve the storage pressure on the clue DB cluster.
3. Address dependency timeouts
- Adjust the size of the ACS thread pool and scale out the ACS service;
- For advertising information, combine an interface cache, the offline data warehouse and real-time calls, and adjust the post-processor timeout, so that queries are fast in most scenarios.
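One possible shape of the “cache + offline data + real-time call” combination is sketched below. The lookup order (cache first, then a real-time call bounded by a timeout, then offline data as a fallback) is an assumption; the article does not specify it. `getAdInfo`, `withTimeout` and the injected dependencies are hypothetical names.

```javascript
// Reject if the promise does not settle within ms.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// cache and offline are Map-like stores; fetchRealtime returns a promise.
async function getAdInfo(id, { cache, fetchRealtime, offline, timeoutMs = 200 }) {
  if (cache.has(id)) return cache.get(id);
  try {
    const fresh = await withTimeout(fetchRealtime(id), timeoutMs);
    cache.set(id, fresh);
    return fresh;
  } catch {
    // The real-time call failed or was too slow: fall back to offline data if any.
    return offline.get(id) ?? null;
  }
}
```

With this shape a slow external dependency costs at most `timeoutMs` per field instead of stalling the whole list response.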
Benefits: clue list server-side time (P90) dropped from 2500ms+ to under 800ms.
6.3.3 Front-end Performance Optimization
Front-end performance problems are concentrated in rendering lag and static resource loading time.
The first aspect: list rendering optimization
Background:
As the most popular UI framework in the Vue ecosystem, Element UI was chosen as Aipanfan's front-end UI framework for its rich functionality, healthy ecosystem and active community. The clue list uses Element UI's el-table component to cover the common requirements of list scenarios.
Problems:
1. With large amounts of data (100 rows and 50 columns), the JS Long Task lasts nearly 3000ms and the clue list renders slowly.
2. Rendering all of the data takes a long time, so the loading state lasts a long time.
3. el-table does not support sticking to the top or bottom of the viewport, and this cannot be achieved simply by changing CSS styles, resulting in a poor experience.
Analysis:
Reading the el-table source shows that it is implemented with four native HTML tables (table header, table body, and the fixed columns on the left and right). With large data volumes the number of DOM nodes becomes extremely large. Dragging a header column in particular triggers a huge amount of Repaint and Reflow work and a long Long Task, which makes the page lag.
We also investigated the open source component pl-table, which is based on dynamic rendering. It does not support a sticky header and handles variable row heights poorly, so it was ruled out.
Solution:
1. A survey of mainstream UI frameworks showed that only the AntD Table in the React ecosystem supports a sticky header, while its Vue version lags behind and does not support this feature. After discussion within the team, we decided to implement our own AFF-UI table following the approach of the React AntD Table.
2. Progressive rendering of list data: render the first screen of data first and end the loading state so the first screen of the list is displayed, then render the remaining list data without the user noticing.
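Progressive rendering can be sketched as batching rows and yielding to the browser between batches. The batch sizes and the `appendRows` / `onFirstScreen` callbacks are assumptions for illustration. The AFF-UI table's actual mechanism is not described in the article.

```javascript
// Split rows into the first screen plus fixed-size follow-up batches.
function planBatches(rows, firstScreen = 20, batchSize = 40) {
  const batches = [rows.slice(0, firstScreen)];
  for (let i = firstScreen; i < rows.length; i += batchSize) {
    batches.push(rows.slice(i, i + batchSize));
  }
  return batches;
}

// Render the first batch immediately, then yield between the remaining batches
// so that no single task blocks the main thread for long.
async function renderProgressively(rows, appendRows, onFirstScreen) {
  const [first, ...rest] = planBatches(rows);
  appendRows(first);
  onFirstScreen(); // for example, hide the loading indicator
  for (const batch of rest) {
    await new Promise((resolve) => setTimeout(resolve, 0));
    appendRows(batch);
  }
}
```

Splitting the work this way does not reduce the total rendering cost. It shortens the time to the first visible rows and breaks one Long Task into several short ones.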
Benefits:
1. Compared with el-table, the AFF-UI table reduces the number of DOM nodes by 60% for the same amount of data and improves rendering performance by 300% with 100 rows and 60 columns of data;
2. Sticky top and bottom are supported for the list, greatly improving the experience;
3. With progressive rendering, the first-screen rendering time (P90) dropped from 2000ms to under 500ms.
The second aspect: static resource optimization
Background:
The Aipanfan front end uses the self-developed Tangram micro front-end architecture, which is divided into the main module “Common” and sub-modules for the other business domains. Each agile team can keep its pages in its own code base and deploy, operate and maintain them independently. Front-end static resources are deployed on Baidu Cloud BOS and served under the site domain name through BFE forwarding.
Problems:
- Static resources are forwarded by BFE, which adds link time
- Static resources do not use a CDN
- Static resources are large
Analysis:
Static resources can be optimized from three angles: link, cache and compilation.
Link and cache optimizations are common front-end techniques. Compilation problems are usually hidden, so you need some understanding of how webpack builds and bundles. The compilation analysis went as follows:
1. First, on the monitoring platform, analyze the JS loading waterfall and identify the time spent on chunk-vendors.js as the performance bottleneck.
2. Analyze the build output with webpack analyzer:
The following problems were found:
- chunk-vendors.js in the business domain code bundles the biz-crm-fe-common module of the micro front-end base. The main application can provide this module, so it should be excluded from the business module bundle.
- Some dependency libraries (bce-sdk, tangram-ui, moment, lodash) are huge. They should either be provided by the main app as shared dependencies or be loaded asynchronously on the pages that need them via Promises.
- The code base contains many routes and reused components, which makes chunk-common.js and the total static resource size large.
Solution:
Link:
- Request static resources directly from the BOS domain name instead of the aifanfan Baidu domain name, removing one layer of BFE forwarding
- Enable Baidu Cloud BOS CDN acceleration
- Enable gzip compression for static resources
Cache:
- Preload the static resources of the clue list on the home page
- Preload the clue list JS in the HTML and use webpack's require feature to cache the list JS globally (an advanced version of the previous approach that preloads earlier and more thoroughly)
Compile:
- Modify the build configuration of the micro front-end framework Tangram-SDK to exclude the biz-crm-fe-common package
- Split the front-end code base: migrate non-core pages other than the list and detail pages to a new code base to minimize the list JS size
- Load large JS dependencies asynchronously to shrink the initially loaded JS as far as possible
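Asynchronous loading of a large dependency can be wrapped in a small helper like the one below. The `lazy` helper is a hypothetical name, and the article does not show how the team implemented this. In a webpack build the loader would typically be a dynamic `import()`, which webpack splits into a separate chunk.

```javascript
// Wrap a loader so the dependency is fetched on first use and reused afterwards.
function lazy(loader) {
  let pending = null;
  return () => {
    if (pending === null) {
      pending = Promise.resolve()
        .then(loader)
        .catch((err) => {
          pending = null; // allow a retry after a failed load
          throw err;
        });
    }
    return pending;
  };
}

// Intended usage in application code (not executed here):
//   const loadMoment = lazy(() => import("moment"));
//   const moment = (await loadMoment()).default;
```

Callers that request the dependency at the same time share one in-flight load, so the chunk is requested once and stays out of the initial bundle.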
Benefits:
- Static resource time (P90) dropped from 2000+ms to under 500ms
- Total gzip size of static resources: 693.52KB → 228.83KB, a 67% reduction
Attached is a comparison of JS sizes before and after the compilation optimization (green is after optimization)
6.3.4 Experience optimization
Problems:
- The full-screen loading state of the list lasts a long time;
- All list filter conditions are loaded in real time, which customers perceive as a long wait, and the page jitters;
- By default all header columns are selected (50+ columns), so a large amount of data is requested and rendered. Users have to scroll 3 to 5 screens horizontally, resulting in a poor experience.
Solution:
- List filtering uses a side drawer panel, and the last filter conditions are remembered locally and displayed;
- Optimize the loading area and style to minimize perceived waiting time from the standpoint of visual perception;
- Add an upper limit on the number of header columns, based on analysis of all users' custom header settings and on common industry settings.
Benefits:
Performance optimization and experience improvement go hand in hand. According to UBS research, Aipanfan leads the industry in overall experience and performance.
6.4 Results
After the optimizations above, the overall performance of the clue list reached the target.
Looking back on nearly a year of performance optimization, the journey went roughly through these stages:
- Users reported list performance problems. We tried to fix them but could not measure anything and groped in the dark. (2020 Q4 to 2021 Q1)
- “Sharpening the axe does not delay the woodcutting”: we investigated front-end performance monitoring solutions and started Aipanfan's APM project. (2021 Q1)
- “Aiming at the target”: APM was built and fully rolled out on the Web side. We gained a comprehensive view of the clue list's performance, and the optimization work gradually became clear. (2021 Q2)
- “Pick the low-hanging fruit first”: conventional techniques were used to fix and optimize common performance problems on the front end and back end. (2021 Q3)
- “Tackle the hard problems and fight for every millisecond”: no optimization point was let go. Dedicated groups were set up for the hard front-end and back-end problems to pool ideas and focus on them, not giving up until results were achieved. (2021 Q4)
Third, summary
“Customer at the center, technology in service of the product” is the principle the Aipanfan clue housekeeper team has always followed. Technical architecture planning should start from business demands and empower the product with appropriate technology, and as the product evolves it places ever higher demands on that technology. The clue list is only one of Aipanfan's many core functions, but it shows that answers to technical problems often have to be found in the business, and that understanding the business is the premise of developing the technology. At the same time, growing business complexity drives the evolution of the technical architecture. Business and technology promote and complement each other, and ultimately create value for users.
Product and technology evolution has only a beginning, not an end. In the future, the general-purpose capabilities of the clue list and its configurable extension capabilities will evolve toward a PaaS platform. LowCode and NoCode still need continued exploration and practice, and we have a long way to go.
Fourth, about the authors
This article was written by several members of the housekeeper team.
- Fei Xie: Architect, specializing in implementing complex systems with microservice architecture and DDD
- TJ: Front-end developer with a design background, working toward the server side
- Hana: A designer in the To B industry
- Dong Shi: Senior R&D engineer, good at business system architecture design
- Sanmu: Web front-end engineer, good at all kinds of cat