The full text is 9,000 words, with an expected reading time of 21 minutes.
I. Dilemma: Project background
Aipanfan Communication was built quickly on top of Baidu Shangqiao (Baidu Business Bridge), which let it stand up its product features and technical architecture from scratch in a short time. However, it also inherited Shangqiao's drawbacks: a tangle of complicated legacy features and an outdated technical architecture. To better support the future evolution of Aipanfan Communication and improve the efficiency of the product and R&D team, we needed to rebuild the product architecture and the business architecture, starting from concrete problems and focusing on the main contradiction.
To make the rest of this article easier to follow, here is some necessary background:
1.1 What is Aipanfan Communication?
Aipanfan Communication is an online consultation tool that connects visitors with businesses. Visitors can ask questions anytime and anywhere, which shortens their path to a service, and businesses can respond quickly and serve them. In advertising scenarios, merchants can also feed signals derived from visitors' consultations back to the upstream ad channels to optimize the delivery model and improve marketing conversion.
1.2 Why Refactor?
Baidu Shangqiao has gone through several product repositionings and many years of version iterations, and its product and R&D team has turned over several times. Customer issues have piled up, and the architecture has gone without systematic governance for a long time. This constrains the product and R&D team on several levels:
-
The team has no shared knowledge of the product's main business logic. Developers often have to dig through the code to piece together an outline of how it works.
-
The number of customer-reported problems remains high. Typical problems include:
-
Visitor arrivals are not detected in time, so customer service agents do not notice that a visitor has arrived. Visitor departures are not detected in time either, which misleads agents into starting conversations with visitors who have already left and makes communication feel broken;
-
Visitors' advertising information (source, search terms, keywords) arrives late or incomplete;
-
Visitor behavior data across the whole life cycle is intermittently delayed or missing;
-
Merchant greetings and automatic replies are sent out of order or not triggered at all;
-
Service instability causes login failures, message sending failures, and missed message notifications on mobile;
-
Some customer issues are really new requirements, such as flexible customization of the consultation widget's style and support for offline communication.
-
Team morale and productivity are low. Worn out by constant firefighting, the team struggles to take on larger feature work.
-
The existing architecture is old, the modules are complex, and governance has long been lacking. The number of modules does not match the size of the team, so even a small requirement may touch several modules, and a lot of stale code has to be patched over and over.
II. Reflection: Defining problems and challenges
Facing this situation, the whole product and R&D team realized it had to change as soon as possible. What are the key issues behind these symptoms, and what challenges do we face?
2.1 Defining the problems
Analyzing the root causes further, the problems fall into the following categories:
[Product level] The product direction and positioning are unclear, and so are the feature hierarchy and classification
-
The product's direction of evolution, its service domains, and the main service path of each module are all unclear. Development usually means piling on features, which leaves user-experience pain points in many business scenarios;
-
For historical reasons, the roles supported by the system are redundant and complex. Some are platform roles, for example, supporting direct communication between Baidu consultants and merchants; others are additional B-side roles, for example, letting salespeople view leads directly;
-
Although usage has moved from the PC era to the mobile era, the product still carries traces of old compatibility work. For example, canned phrases are classified by PC and mobile, and a site style can only be applied to one platform.
Example of an older client interface
[Architecture level] The client architecture has not evolved for years, making feature iteration hard to sustain
-
The client only supports Windows, and its architecture has not evolved. It is built on C++, which is outside the team's main technology stack, so it can only be maintained with difficulty and cannot take on new feature requirements. It urgently needs to move to a cross-platform, mainstream front-end technology stack;
-
The visitor-side front end has not separated the front end from the back end, which greatly hurts both the user experience and development efficiency.
[Architecture level] The basic communication layer of the server architecture needs to evolve
The communication protocol layer, a critical part of any communication product, still has architectural shortcomings:
-
The stability of the various network connection protocols needs to improve;
-
Message sending performance on the different clients needs to improve.
[Architecture level] The business layer of the server architecture needs to evolve
The business layer contains 20+ service modules, and the main business logic is maintained as shared libraries. As a result, module boundaries are unclear, data links are chaotic, and functions overlap and are tightly coupled. It is therefore urgent to evolve to a mainstream microservice architecture.
-
Responsibilities within modules are not cohesive, and the call relationships between modules are highly coupled.
-
The same data is stored in multiple copies, causing data consistency problems.
-
The synchronous and asynchronous transmission links of data flows are tangled.
[Architecture level] The overall server architecture has a high self-maintenance cost and low maintainability
The legacy system requires the team to operate a variety of self-managed middleware, which keeps it from focusing on business development. This not only lowers R&D productivity but also poses serious risks to system stability.
-
The reverse proxy (Nginx) cluster, Zookeeper cluster, Storm cluster, Kafka cluster, Solr cluster, and Prometheus cluster are all operated and maintained by the team itself.
-
The department's main server clusters are still far from a cloud-native service governance architecture.
[Organizational level] The product and R&D team's understanding of the business is insufficient and incomplete
-
Because the business architecture and the R&D architecture have long been disconnected, the team lacks domain knowledge ranging from the communication industry as a whole down to individual modules, and existing knowledge urgently needs to be consolidated across product and R&D.
-
Once the team reaches a consensus, a virtuous cycle can form in which domain understanding evolves quickly and the product iterates quickly.
2.2 Recognizing the challenges
With the causes clear, the direction of the refactoring becomes clearer. Implementation, however, still faces challenges such as pressure from ongoing business evolution, a weak architectural foundation, and a shortage of resources.
The architecture is old and the code hides many pitfalls
In the past, even a small change that seemed safe to roll out could cause problems for customers. On the one hand, much of the code lacked design; on the other, overall regression test coverage was incomplete. The team described this state as "every line of code is just right": no more, no less.
Refactoring and business evolution must proceed together
This is a challenge most teams face: the business cannot stop evolving and wait for the technology to be refactored. How to refactor without affecting the existing business, while still delivering high-priority business requirements with good quality, is a question we had to answer.
It can't just be a refactoring; it must also give customers a better experience
Upgrading the client architecture inevitably brings a new user experience, and existing users' expectations need to be managed. The scope of this refactoring is large, so product quality is both a requirement and a challenge.
The product and R&D team is relatively new and lacks a deep understanding of the original business functions
Business development teams rely heavily on domain experts for guidance on business knowledge, and the division of responsibilities and boundaries between subdomains and modules, as well as data ownership, must rest on an understanding of the business. This is a big challenge for the current team.
Therefore, the keynote of this refactoring is to grasp the main contradiction and proceed step by step.
III. Way forward: Solving the problems
Refactoring purely at the technical level only solves the immediate technical problems, and as the business iterates quickly, the gains of a purely technical refactoring easily erode. We needed changes at both the business and the technical level, and we needed to keep delivering efficiently on top of a complex existing business. Meanwhile, the team next door had already reaped the benefits of a DDD transformation in the Clue Manager (lead management) product. We therefore decided to combine the technical refactoring with DDD and upgrade everything in one pass: product, technology, shared understanding, and architecture.
3.1 Positioning: Defining the product direction and core pain points
Product positioning and differentiated value
Product positioning: choosing "what not to do" matters more
-
Focus on pre-sales reception to help merchants obtain visitors' contact information; do not cover after-sales service;
-
Focus on advertising and marketing to help advertisers receive promotion traffic and optimize its effectiveness;
-
Given the ToB SaaS model, focus for now on the needs of enterprise customers rather than on serving platform-level needs above them.
Product roles: who are our users?
- Focus on the B-side customer service role. Functions for other roles are stripped out: for example, the business card function used to follow up on leads moves to the Clue Manager module (sales role), and the feedback function moves to the oCPC feedback module (SEM role).
Differentiated value: why do customers choose us?
-
Full-link closed loop: a seamless chain from promotion to visitor arrival, conversation, and contact capture, through to labeling sessions and feeding the results back to oCPC conversion targets;
-
Integration with Clue Manager: contact information in conversations and on the message board is recognized intelligently and deposited into Clue Manager automatically, saving the effort of sorting leads by hand;
-
Intelligent marketing: visitors' intentions are analyzed and identified intelligently, and scripts tailored to each visitor guide them to start talking and leave their contact information;
-
Multi-client access: the Web, App, and PC clients can be used at the same time, enabling communication anytime and anywhere.
3.2 Analysis: Identifying core domains and modules and breaking down the business logic
3.2.1 Event storming: a good tool for dissecting processes and aligning understanding
For the major business processes, the product and R&D team mapped out the event flow through event storming, defining the roles, actions, rules and conditions, and outcomes associated with each event. Most importantly, this aligned the team's understanding of the business and let us analyze the business details with collective intelligence.
3.2.2 Boundaries are the basis of collaboration: dividing domains and modules to form a ubiquitous language
Based on the product positioning and value analysis, combined with the business processes we had mapped out, we needed to delimit subdomains and allocate resources accordingly.
[Core domain]
-
The visitor domain and the customer service domain naturally belong to the core domain. The protocol connection domain, the underlying foundational capability covering TCP, WebSocket, HTTP, long-polling protocols, packet formats, connection state maintenance, and so on, should also be part of the core domain. The conversation domain is core as well: only by exchanging messages can the two sides truly communicate, and the main purpose of communication is to express intent and capture contact information in the conversation.
-
The strategy for the core domain is to concentrate resources around product value: move non-core functions out of the core domain as much as possible, and be wary of investments that could make the team lose focus.
[Supporting domain]
-
The data analysis domain is necessary but not the current focus. The lead domain is a necessary part of communication, but it should rely more on the capabilities of Aipanfan's Clue Manager. The advertising domain, which covers parsing visitors' promotion information and feeding back conversation effectiveness, is a core capability; it is classified as a supporting domain only because the search team already provides the key capabilities, while the communication team handles data access and data supply;
-
The strategy for supporting domains is to build the necessary capabilities with as few resources as possible. Of course, as the business develops, a supporting domain may become a core domain.
[Generic domain]
-
Account permissions are a common feature of most systems. The visitor side is a ToC scenario and is exposed to malicious-traffic attacks, including fake visitor arrivals and spam messages sent as visitors, so risk control and anti-cheating capabilities must be introduced. For these, Aipanfan Communication relies mainly on the Aipanfan strategy team and the company's internal security department;
-
The strategy for generic domains is to avoid building systems ourselves where possible and instead build capabilities quickly by leveraging external ones.
3.3 Architecture: Building the overall technical architecture
Architectural goals and design points
-
Services are split into layers by responsibility, following the direction of traffic. The communication team builds and maintains five layers: user interface, access gateway, business front-stage, business back-stage, and communication protocol connection. The bottom layers, basic services and storage, rely mainly on foundational technical capabilities. Layering places each service at a defined level, uses the team's R&D resources efficiently, and lets each layer handle a different type of traffic (real user traffic, background asynchronous calls, scheduled tasks, and so on). It also simplifies request data links and lets non-functional requirements (technology stack choice, rate limiting and circuit breaking, elasticity, and so on) be addressed layer by layer.
-
The technical architecture matches the business architecture, and service module boundaries follow business boundaries. Core services need a domain model, with business logic built around the domain layer and the application layer in a four-layer DDD architecture. This separates the domain model from technical details, so that unstable implementations depend on stable abstractions.
-
The design follows a typical microservice architecture: each service has cohesive responsibilities and owns its data. Data is private to each service, business logic is not shared between services, and services collaborate through APIs or domain events.
-
A sound data architecture. Use eventual consistency wherever possible. Not every kind of data needs to be stored in more than one place; where data is stored in several places, an eventual-consistency scheme must keep the copies in sync. For NoSQL stores such as Redis, HBase, and Elasticsearch (ES), service data can be split across databases and tables as needed to avoid uneven sharding caused by large keys.
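The four-layer DDD structure and the rule that unstable implementations depend on stable abstractions can be illustrated with a minimal Python sketch. Session, SessionRepository, InMemorySessionRepository, and SessionAppService are hypothetical names, not the team's actual code, and an in-memory dict stands in for real storage such as MySQL or Redis.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


# --- Domain layer: entities and repository interfaces, no technical details ---
@dataclass
class Session:
    session_id: str
    visitor_id: str
    messages: list[str] = field(default_factory=list)
    closed: bool = False

    def post(self, text: str) -> None:
        # The business rule lives in the entity, not in a shared library
        if self.closed:
            raise ValueError("cannot post to a closed session")
        self.messages.append(text)

    def close(self) -> None:
        self.closed = True


class SessionRepository(ABC):
    """Stable abstraction owned by the domain layer."""

    @abstractmethod
    def get(self, session_id: str) -> Session: ...

    @abstractmethod
    def save(self, session: Session) -> None: ...


# --- Infrastructure layer: an unstable detail that depends on the domain ---
class InMemorySessionRepository(SessionRepository):
    def __init__(self) -> None:
        self._rows: dict[str, Session] = {}  # stand-in for MySQL/Redis

    def get(self, session_id: str) -> Session:
        return self._rows[session_id]

    def save(self, session: Session) -> None:
        self._rows[session.session_id] = session


# --- Application layer: orchestrates a use case, holds no business rules ---
class SessionAppService:
    def __init__(self, repo: SessionRepository) -> None:
        self._repo = repo

    def send_message(self, session_id: str, text: str) -> None:
        session = self._repo.get(session_id)
        session.post(text)
        self._repo.save(session)


# --- User interface layer: e.g. an HTTP handler would call the app service ---
repo = InMemorySessionRepository()
repo.save(Session("s1", "v1"))
service = SessionAppService(repo)
service.send_message("s1", "hello")
```

The domain layer defines the repository interface and the infrastructure layer implements it, so replacing the in-memory store with a database changes only the infrastructure class, while the business rule (no posting to a closed session) stays in the entity.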
3.4 Breakthrough: Key technologies in the architecture design
3.4.1 Implementing a true microservice architecture
Once the subdomains and modules were defined, module responsibilities and the collaboration between modules had to be adjusted. The key changes were:
Merging old modules
Before the transformation, the server had 45+ service modules with poorly divided responsibilities and inappropriate granularity. Specifically:
-
Some functions were too fine-grained, which raised maintenance costs; these could be merged.
-
Similar functions were scattered across several services. For example, five different modules all provided visitor information queries; these could be combined.
-
With the old client being upgraded, some services were better merged into other services once their functions were reworked, and the original services could be taken offline.
-
Poorly divided responsibilities in the reverse proxy layer led to too many service clusters. Most could be migrated to the company-level BFE clusters; a few Nginx clusters containing a lot of Lua logic could be kept for now but consolidated.
After merging and decommissioning, the number of services dropped by 15+.
Splitting out new modules
Some functions are important enough to be built as separate modules, for example:
-
Visitor advertising information parsing service. Advertising information is essential for customer service to profile and understand visitors. Previously, the parsing logic was scattered across several modules with inconsistent implementations, which led to low parsing accuracy and no compensation strategy to guarantee an acceptable parsing success rate.
-
Intelligent bot reply service. This is also part of the product's differentiated value. To help customer service handle visitors more efficiently and guide more visitors to leave their contact information, the product evolution and complexity of this area have grown.
-
Lead service. This service marks the boundary between Aipanfan Communication and the Clue Manager product. It mainly extracts contact information from conversation or message content and passes it to Clue Manager through APIs or events, which also closes the data loop from consultation to lead.
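As a rough illustration of the lead service's extraction step, the sketch below pulls phone numbers and email addresses out of a message and wraps them in an event for Clue Manager. The regular expressions (11-digit mainland China mobile numbers and simple email addresses) and the LeadExtracted event are assumptions for illustration only; the article does not describe the real extraction rules, which would also need to handle landlines, instant-messaging IDs, and deliberately obfuscated numbers.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed patterns: 11-digit mainland China mobile numbers and simple emails.
# Explicit ASCII classes avoid \w matching surrounding Chinese characters.
PHONE_RE = re.compile(r"(?<![0-9])1[3-9][0-9]{9}(?![0-9])")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


@dataclass(frozen=True)
class LeadExtracted:
    """Hypothetical event handed to Clue Manager via an API call or a queue."""
    session_id: str
    phones: tuple[str, ...]
    emails: tuple[str, ...]


def extract_lead(session_id: str, text: str) -> Optional[LeadExtracted]:
    # dict.fromkeys removes duplicates while keeping first-seen order
    phones = tuple(dict.fromkeys(PHONE_RE.findall(text)))
    emails = tuple(dict.fromkeys(EMAIL_RE.findall(text)))
    if not phones and not emails:
        return None  # nothing worth sending downstream
    return LeadExtracted(session_id, phones, emails)


event = extract_lead("s42", "Call me at 13812345678 or mail bob@example.com")
```

The digit lookarounds stop the phone pattern from matching inside longer numbers such as order IDs, which is one small example of why extraction accuracy needs its own service and tests.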
Modules do not share business logic
Before the transformation, the back-end business services were not true microservices. Although each was deployed independently and exposed its own interfaces, their implementations were tightly coupled:
-
Business logic was shared through common libraries, that is, JAR packages in Java. The same business code was a dependency of several business services, which hurt both code maintainability and service testability.
-
Data was passed through the cache (Redis); a single Redis key often had several services both writing to and reading from it.
-
Services read data tables owned by other services directly through a shared database.
Principles of the transformation: do not share common libraries that contain business logic; split microservices vertically; keep related business data (including cached data) private to each service; expose capabilities through APIs, or drive downstream processes through domain events.
High availability through eventual consistency
Data replication is a key tool for availability. Different synchronization methods and storage types can be used to achieve high availability in different service scenarios. Common replication and synchronization methods include:
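These principles can be sketched in Python: each service keeps its data private, offers a synchronous API where a caller needs an immediate answer, and publishes a domain event so that downstream services can keep their own copies. The in-memory EventBus stands in for a message queue such as Kafka and delivers events synchronously; SessionService, SessionStatsService, and the SessionEnded event are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SessionEnded:
    """Hypothetical domain event published by the session service."""
    session_id: str
    visitor_id: str
    message_count: int


class EventBus:
    """In-memory stand-in for a message queue; delivers events synchronously."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable[[object], None]]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable[[object], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        for handler in self._handlers[type(event)]:
            handler(event)


class SessionService:
    """Owns session data; other services never read these dicts directly."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self._messages: dict[str, list[str]] = {}
        self._visitors: dict[str, str] = {}

    def start(self, session_id: str, visitor_id: str) -> None:
        self._messages[session_id] = []
        self._visitors[session_id] = visitor_id

    def post(self, session_id: str, text: str) -> None:
        self._messages[session_id].append(text)

    def message_count(self, session_id: str) -> int:
        # Synchronous API for callers that need an answer now
        return len(self._messages[session_id])

    def end(self, session_id: str) -> None:
        messages = self._messages.pop(session_id)
        visitor_id = self._visitors.pop(session_id)
        # Downstream work is driven by an event, not by shared code or a shared Redis key
        self._bus.publish(SessionEnded(session_id, visitor_id, len(messages)))


class SessionStatsService:
    """Downstream consumer that keeps its own private copy of what it needs."""

    def __init__(self, bus: EventBus) -> None:
        self.ended: dict[str, int] = {}
        bus.subscribe(SessionEnded, self.on_session_ended)

    def on_session_ended(self, event: SessionEnded) -> None:
        self.ended[event.session_id] = event.message_count
```

The statistics service never touches the session service's storage; if the session service changes its tables or cache layout, only the event contract has to stay stable.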
-
Publish/subscribe: the upstream service sends the relevant data as messages through a message queue, and downstream services subscribe to those messages and persist them as needed. This approach is used extensively across the communication server and is a great tool for decoupling services.
-
CDC (Change Data Capture): in short, it captures an upstream service's data changes (inserts, updates, and deletes) by listening to the MySQL binlog, parses the log and does some processing (such as joining related tables), and then publishes the result to a message queue for downstream services to subscribe to as needed.
CDC and publish/subscribe can be combined in many scenarios to separate reads from writes and to choose heterogeneous storage. For example, visitor arrival records are written to MySQL while visitor history is queried from ES; sessions are written to Table while the session analysis service queries Doris. This meets the data access needs of each scenario and improves availability.
Of course, this kind of availability often sacrifices consistency for a bounded period of time, which must be weighed against the actual business scenario. A rule of thumb: when choosing between an immediate answer and a correct answer, most people actually want the immediate one.
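A simplified CDC pipeline might look like the following sketch. The change records mimic what a binlog listener produces, but their field names are invented for illustration and do not follow any particular CDC tool's format; a deque stands in for a Kafka topic, and a dict stands in for an ES index used for visitor history queries.

```python
from collections import deque
from typing import Optional

# Hypothetical row-change records, roughly the shape a binlog listener produces.
SAMPLE_BINLOG = [
    {"table": "visitor_arrival", "op": "insert",
     "after": {"id": 1, "visitor_id": "v1", "site_id": 7}},
    {"table": "audit_log", "op": "insert", "after": {"id": 99}},
    {"table": "visitor_arrival", "op": "update",
     "before": {"id": 1, "visitor_id": "v1", "site_id": 7},
     "after": {"id": 1, "visitor_id": "v1", "site_id": 8}},
]

SITE_NAMES = {7: "shop-a", 8: "shop-b"}  # stands in for a join against a site table
topic: deque = deque()                   # stands in for a Kafka topic
history_index: dict = {}                 # stands in for an ES index of visitor history


def to_message(change: dict) -> Optional[dict]:
    """Parse one change record and enrich it; drop tables nobody subscribes to."""
    if change["table"] != "visitor_arrival":
        return None
    row = change.get("after") or change["before"]  # deletes only carry "before"
    msg = {"op": change["op"], "id": row["id"], "visitor_id": row["visitor_id"]}
    if change["op"] != "delete":
        msg["site_name"] = SITE_NAMES.get(row["site_id"], "unknown")
    return msg


def capture(changes: list) -> None:
    for change in changes:
        msg = to_message(change)
        if msg is not None:
            topic.append(msg)


def consume() -> None:
    while topic:
        msg = topic.popleft()
        if msg["op"] == "delete":
            history_index.pop(msg["id"], None)
        else:
            # Upsert keyed by primary key: replaying a message converges to the same state
            history_index[msg["id"]] = {"visitor_id": msg["visitor_id"],
                                        "site_name": msg["site_name"]}
```

Because the consumer upserts by primary key, redelivered messages leave the index in the same state, which is what lets the read side converge after some delay.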
3.4.2 Data link governance
Before the transformation, the data flows of the main scenarios, including visitor arrival and departure, automatic replies, session content checks, lead recognition, and session termination, all had to pass through a real-time computing service built on Storm. For various reasons this cluster was very unstable and caused many of the customer problems described above. A closer analysis revealed the following weaknesses:
-
The Storm topology was poorly designed, and the responsibilities of topology nodes were unclear;
-
Topology nodes contained a lot of business logic and commonly passed data through Redis, with chaotic Redis key design and poor maintainability;
-
The Storm cluster was introduced years ago; its version was old and had never been upgraded.
After analyzing the business requirements, we found that merely upgrading the Storm cluster would not solve the actual problems, and that a real-time computing framework was unnecessary at this stage. We therefore adopted the following approach:
-
Remove the centralized computing cluster and reorganize data flows by business scenario so they do not interfere with each other. The corresponding business service modules take over the business logic; where responses need to be faster, a cache cluster can speed them up.
-
Transfer data between service modules asynchronously (through the Kafka message queue) wherever possible. Message queues can now deliver near-real-time performance, and we strengthened the queue's disaster recovery and subscription monitoring.
-
Delayed scenarios, such as an automatic reply sent when a visitor has been silent for a while, can be handled with delayed tasks.
-
Reorganize Redis keys and optimize large keys (keys carrying an especially large payload; for example, one key held part of the information of every visitor in the whole system, which is clearly too large), and avoid operating on Redis directly across service modules.
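The "reply automatically when the visitor has been silent for a while" scenario maps naturally onto a delay queue. The sketch below uses heapq with an explicit clock so it stays deterministic; SILENCE_SECONDS and the function names are hypothetical, and a production version would need persistent storage for pending tasks (for example, a Redis sorted set scored by due time) so that they survive restarts.

```python
import heapq
import itertools

SILENCE_SECONDS = 30  # hypothetical "visitor has gone quiet" threshold


class DelayQueue:
    """In-memory delay queue driven by an explicit clock."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str, int]] = []  # (due, tie, key, version)
        self._tie = itertools.count()  # unique tie-breaker, so equal due times never compare keys
        self._version: dict[str, int] = {}  # latest version per key; older entries are stale

    def schedule(self, key: str, due: float) -> None:
        version = self._version.get(key, 0) + 1
        self._version[key] = version  # supersedes any earlier task for this key
        heapq.heappush(self._heap, (due, next(self._tie), key, version))

    def cancel(self, key: str) -> None:
        self._version[key] = self._version.get(key, 0) + 1

    def pop_due(self, now: float) -> list[str]:
        fired = []
        while self._heap and self._heap[0][0] <= now:
            _, _, key, version = heapq.heappop(self._heap)
            if self._version.get(key) == version:  # skip rescheduled or cancelled tasks
                fired.append(key)
        return fired


def on_visitor_message(queue: DelayQueue, session_id: str, now: float) -> None:
    # Each visitor message restarts the silence timer for that session
    queue.schedule(session_id, now + SILENCE_SECONDS)


def on_agent_took_over(queue: DelayQueue, session_id: str) -> None:
    # Hypothetical rule: no auto-reply once a human agent is replying
    queue.cancel(session_id)
```

Rescheduling never searches the heap; it just bumps a version number so that the old entry is discarded when it surfaces, which keeps both scheduling and cancellation cheap.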
Data is the soul of a business application, and the technical architecture must carefully consider every aspect of storing and reading it: which storage system to use (no storage system is fastest at both reads and writes, so there are trade-offs), when to use caching, and what the data link of each business process should be. Communication systems also involve many trade-offs between write amplification and read amplification. This refactoring also reorganized these aspects, which are not covered here.
3.4.3 Optimizing the communication protocol
Why protocol optimization?
Problems mentioned in Section 1.2, such as visitors frequently going missing and messages not appearing on the client, are hard to fix completely with patches, so the protocol layer needed a thorough overhaul. The specific pain points were:
-
The existing protocol lacked robustness and had latent flaws at the protocol level. A single event (such as arrival, starting a conversation, or departure) took several packets to complete. When a visitor acted frequently, their status also changed frequently, which made errors likely.
-
In the rich-client model, the client maintained too much state and depended heavily on the order of pushed packets, with no fault tolerance or self-recovery mechanism. This easily led to problems such as visitors not being displayed and messages not appearing on screen.
How to optimize?
-
The notification module uses distributed locks to control concurrency and adds a SeqId to each packet to mark which packet came first, giving the client a basis for ordering them.
-
The status protocol was optimized: action notification packets were simplified, and packets now mainly carry the visitor's status, as shown in the following figure. Action packets were trimmed and only status packets kept, cutting the number of packets by about 60%, which lowers the client's processing complexity and the probability of errors.
-
On the client side, the long-lived socket connection was replaced by an HTTP + socket push-pull model. When the network disconnects and reconnects, or packets are lost or arrive out of order, the client proactively pulls the latest status, which fixes problems such as incorrect visitor status and messages failing to appear.
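On the client side, SeqIds and pull-based recovery can be combined roughly as follows. This is a minimal Python sketch, not the client's actual code (which runs in Electron); StatusPacket, VisitorView, and the pull_latest callback, which stands in for an HTTP request for the latest visitor status, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StatusPacket:
    visitor_id: str
    seq: int     # SeqId assigned by the server, increasing per visitor
    status: str  # the visitor's full current status, not an incremental action


class VisitorView:
    """Client-side visitor list that orders pushes by SeqId and repairs gaps by pulling."""

    def __init__(self, pull_latest: Callable[[str], StatusPacket]) -> None:
        self._pull_latest = pull_latest  # stands in for an HTTP "get latest status" call
        self.state: dict[str, tuple[int, str]] = {}  # visitor_id -> (seq, status)

    def on_push(self, pkt: StatusPacket) -> None:
        last_seq = self.state.get(pkt.visitor_id, (0, ""))[0]
        if pkt.seq <= last_seq:
            return  # duplicate, or older than what we already have
        if pkt.seq > last_seq + 1:
            pkt = self._pull_latest(pkt.visitor_id)  # gap: a push was lost, so pull
        self.state[pkt.visitor_id] = (pkt.seq, pkt.status)

    def on_reconnect(self) -> None:
        # After a disconnect, pull the authoritative status for every known visitor
        for visitor_id in list(self.state):
            latest = self._pull_latest(visitor_id)
            self.state[visitor_id] = (latest.seq, latest.status)
```

Because each status packet carries the full status rather than an incremental action, dropping a stale packet is always safe, and a single pull is enough to repair a gap.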
You may ask:
1. The distributed locks mentioned above control concurrency. Does lock contention increase request processing time?
A: The lock is scoped to a single visitor. That granularity is fine enough that contention only occurs when the same visitor acts very quickly (for example, rapidly opening pages and starting conversations over and over). For a single visitor, normal operations are not concurrent.
2. Since protocol optimization brings such benefits, why wasn't it done earlier?
A: Because business boundaries were unclear, visitor status changes were scattered across the front-stage, the back-stage, and many places in the original Storm cluster, so they could not be controlled centrally. Only after the earlier architecture optimization and the data link governance were complete could the protocol be optimized on top of those results.
3. Why not combine push and pull on the client earlier?
A: As noted under the client architecture item in Section 2.1, the client was built on C++ and could only be maintained with difficulty, unable to take on new feature requirements. That made changing the client protocol extremely hard, and it is a major reason for the client architecture upgrade described in Section 3.5.
Summary
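The per-visitor lock granularity from the first answer can be illustrated with an in-process sketch. Here a threading.Lock per visitor ID stands in for a distributed lock (a real one would typically live in a shared store and carry an expiry time); VisitorNotifier and its methods are hypothetical. The point is that status changes and SeqId assignment are atomic per visitor, while different visitors never contend with each other.

```python
import threading
from collections import defaultdict


class VisitorNotifier:
    """Serializes status changes per visitor and stamps each push with a SeqId."""

    def __init__(self) -> None:
        self._table_guard = threading.Lock()  # protects creation of per-visitor locks
        self._locks = defaultdict(threading.Lock)  # one lock per visitor ID
        self._seq: dict[str, int] = defaultdict(int)  # last SeqId issued per visitor
        self.pushed: list[tuple[str, int, str]] = []  # (visitor_id, seq, status)

    def _lock_for(self, visitor_id: str):
        with self._table_guard:
            return self._locks[visitor_id]

    def change_status(self, visitor_id: str, status: str) -> int:
        # Only operations on the same visitor wait for each other
        with self._lock_for(visitor_id):
            self._seq[visitor_id] += 1
            seq = self._seq[visitor_id]
            self.pushed.append((visitor_id, seq, status))  # stand-in for the socket push
            return seq
```

Even under concurrent calls, each visitor ends up with a gap-free SeqId sequence, and its pushes are emitted in SeqId order.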
-
DDD transformation of the visitor, customer service, and session management modules.
-
The anemic domain model was replaced with a rich domain model, and state changes are controlled by a state machine.
-
Client requests now mainly use HTTP with synchronous return values, which lowers the probability of errors; the socket is used to push notifications to the client.
-
Protocol packets were simplified and interactions are organized around visitor status, greatly reducing the number of packets.
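The move from an anemic model to a rich model guarded by a state machine can be sketched as follows. The visitor statuses, event names, and transition table are hypothetical simplifications; the real module has more states and rules.

```python
from enum import Enum


class VisitorStatus(Enum):
    BROWSING = "browsing"
    INVITED = "invited"
    CHATTING = "chatting"
    LEFT = "left"


# Hypothetical transition table: (current status, event) -> next status
TRANSITIONS = {
    (VisitorStatus.BROWSING, "invite"): VisitorStatus.INVITED,
    (VisitorStatus.BROWSING, "start_chat"): VisitorStatus.CHATTING,
    (VisitorStatus.INVITED, "accept"): VisitorStatus.CHATTING,
    (VisitorStatus.INVITED, "decline"): VisitorStatus.BROWSING,
    (VisitorStatus.BROWSING, "leave"): VisitorStatus.LEFT,
    (VisitorStatus.INVITED, "leave"): VisitorStatus.LEFT,
    (VisitorStatus.CHATTING, "leave"): VisitorStatus.LEFT,
}


class IllegalTransition(Exception):
    pass


class Visitor:
    """Rich domain model: the entity guards its own state changes."""

    def __init__(self, visitor_id: str) -> None:
        self.visitor_id = visitor_id
        self._status = VisitorStatus.BROWSING

    @property
    def status(self) -> VisitorStatus:
        return self._status  # read-only: no public setter to bypass the rules

    def apply(self, event: str) -> VisitorStatus:
        next_status = TRANSITIONS.get((self._status, event))
        if next_status is None:
            raise IllegalTransition(f"{event!r} is not allowed while {self._status.value}")
        self._status = next_status
        return next_status
```

In an anemic model any caller can assign a new status directly; here the Visitor entity rejects transitions that the table does not allow, so a rule such as "a visitor already in a conversation cannot be invited again" lives in one place.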
3.4.4 Removing self-operated middleware
As mentioned above, because of the legacy technology stack, the Aipanfan Communication team operated and maintained several kinds of middleware itself. Setting aside whether introducing them was right in the first place, the team lacked the knowledge to run them well, which brought instability to the system and reduced R&D efficiency. The principle for this part of the refactoring was therefore to first remove unnecessary middleware from the architecture, and to hand the necessary middleware over to the department's infrastructure team for operation and maintenance instead of maintaining it separately.
Decommissioned clusters
-
Zookeeper cluster: previously used mainly as a service configuration center; migrated to the friendlier Kubernetes ConfigMap (operated by the infrastructure team).
-
Nginx cluster: before the transformation, several reverse proxy clusters contained both routing/forwarding logic and business logic. The business logic moved down into the corresponding Gateway service maintained by the team, and the routing/forwarding logic moved to the BFE cluster, operated centrally by the infrastructure team;
-
Storm cluster: logic reworked and the cluster decommissioned, as detailed above;
-
Solr cluster: decommissioned; the corresponding query logic was rewritten and migrated to the ES cluster.
Cluster migration
The Kafka and Prometheus clusters cannot be taken offline, but the team no longer maintains them separately; they were migrated to the department's clusters.
3.5 Extension: Client architecture practice
3.5.1 Cross-platform client architecture
As the maintenance cost of the original client kept rising, and given customers' demand for a Mac version, we chose the cross-platform Electron framework.
Why Electron?
-
It is open source, so extending the core is easier.
-
The interface is highly customizable, and in principle it can do anything the Web can do.
-
It is currently the cheapest cross-platform solution: the team already has HTML + JS expertise, and a large number of existing UI libraries are available.
-
It is more stable and less buggy than other cross-platform solutions such as Qt and GTK+; as long as the browser engine runs, few problems arise.
-
It is easy to extend and can embed existing Web pages directly.
Electron system architecture
The Aipanfan front-end team's technology stack is Vue, so we built the project with electron-vue. Electron has two kinds of processes, main and renderer. The main process handles automatic client updates, the plug-in core, system APIs, and so on; the renderer process uses a Vue + Webpack architecture, and the two communicate via IPC.
The Aipanfan client is mainly an IM application, so it uses WebSocket for message notifications. Because messages sent by customer service can include style settings, the transmitted content is rich text, which makes XSS problems likely. We filter XSS attacks with a whitelist, and all content also passes through policy filtering to block harmful text such as pornographic or politically sensitive content.
Aipanfan Communication expects to onboard more business categories flexibly in the future and to let third parties develop personalized features independently. To balance this with the stability and ease of use of the platform code, we implemented the client with a plug-in architecture.
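The whitelist approach to XSS filtering can be sketched with Python's standard html.parser module: only allowed tags and attributes are re-emitted, text is escaped, and script and style content is dropped. The tag and attribute whitelist below is an assumption, since the article does not list the real one, and the sketch is only an illustration. A production client should rely on a mature, audited sanitizer and should also validate style values, which this sketch does not inspect beyond HTML escaping.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for customer-service rich text
ALLOWED_TAGS = {"b", "i", "u", "br", "p", "span", "font"}
ALLOWED_ATTRS = {"span": {"style"}, "font": {"color", "size"}}
VOID_TAGS = {"br"}
DROP_CONTENT = {"script", "style"}  # discard these tags together with their text


class WhitelistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self._skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are removed, their text is kept
        kept = [f' {name}="{escape(value, quote=True)}"'
                for name, value in attrs
                if name in ALLOWED_ATTRS.get(tag, set()) and value is not None]
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
            return
        if self._skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data))  # text can never become markup


def sanitize(html_text: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Event-handler attributes such as onclick and dangerous tags such as img or a never appear in the output because they are not on the whitelist, rather than because a blacklist anticipated them.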
Problems encountered during development
Electron brings great convenience, but it also has well-known weaknesses that people often complain about, such as high memory usage, a performance gap compared with native clients, and API compatibility issues across operating systems. These need to be considered early in development. Below are some problems you are bound to run into.
1. Performance optimization
Performance optimization usually comes after the required functionality is developed. In Electron, the best analysis tool is the Performance panel of Chrome Developer Tools. Its flame chart makes any problem in JS execution visible at a glance.
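One way to make your own code easy to find in that flame chart is the standard User Timing API. Entries created with `performance.mark` and `performance.measure` appear as named spans in the Timings track of the Performance panel in Chromium-based renderers such as Electron's. Below is a minimal sketch. The `timed` helper and the label are hypothetical.

```typescript
// Wraps a synchronous function in User Timing marks so its duration shows up
// as a named measure in the DevTools Performance panel.
function timed<T>(label: string, fn: () => T): T {
  const start = `${label}:start`;
  const end = `${label}:end`;
  performance.mark(start);
  try {
    return fn();
  } finally {
    // Runs even if fn throws, so failed calls are measured too.
    performance.mark(end);
    performance.measure(label, start, end);
    performance.clearMarks(start);
    performance.clearMarks(end);
  }
}

// Usage: label a suspect step, record a profile, and look for "renderSessionList".
const ordered = timed("renderSessionList", () => [3, 1, 2].sort((a, b) => a - b));
console.log(ordered);
```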
2. Blank screen on Windows 7
QA had tested only on Windows 10, so the blank-screen problem went unnoticed. Only after the client officially launched did blank-screen reports arrive en masse. At that point we began to focus on the problem and work actively to fix it.
We use Electron 9.x, which enables GPU acceleration by default. However, enabling GPU acceleration on Windows 7 requires administrator permissions. Without them the process hangs, leaving the home page blank. The problem can therefore be tackled from two directions: grant administrator rights, or disable GPU acceleration. Most users of the client are customer service agents whose company computers have low-end configurations, and they generally have no administrator privileges. We therefore chose to disable GPU acceleration (app.disableHardwareAcceleration()).
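The fix boils down to calling Electron's `app.disableHardwareAcceleration()` in the main process before the app's `ready` event. The sketch below goes one step further than the text. As an assumption, it disables the GPU only on Windows 7 or older and leaves other systems accelerated. Windows 7 is NT 6.1, which Node's `os.release()` reports as `6.1.x`. `shouldDisableGpu` is a hypothetical helper. The Electron wiring is shown as comments because it needs the Electron runtime.

```typescript
// Hypothetical helper: should GPU acceleration be turned off on this machine?
// Assumption: only Windows 7 (NT 6.1) and older are affected.
function shouldDisableGpu(platform: string, release: string): boolean {
  if (platform !== "win32") return false;
  const parts = release.split(".").map(Number);
  const major = parts[0] ?? 0;
  const minor = parts[1] ?? 0;
  return major < 6 || (major === 6 && minor <= 1);
}

// Main-process wiring (requires Electron, so shown as comments):
//   import { app } from "electron";
//   import * as os from "os";
//   if (shouldDisableGpu(process.platform, os.release())) {
//     app.disableHardwareAcceleration(); // must be called before "ready"
//   }

console.log(shouldDisableGpu("win32", "6.1.7601")); // true: Windows 7 SP1
console.log(shouldDisableGpu("win32", "10.0.19045")); // false: Windows 10
```

Gating on the OS version trades a little complexity for keeping hardware acceleration on newer machines. Disabling it unconditionally, as the text describes, is the simpler and equally valid choice.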
3. Other problems
Electron development has other common problems to watch for. These include character-encoding issues when reading and writing files, client security risks such as RCE and arbitrary command execution, and high memory usage.
3.5.2 Microkernel/plug-in architecture
What is a plug-in architecture
In a plug-in architecture, the software itself only provides a PluginCore and a PluginAPI as the plug-in runtime. A plug-in downloaded from the plug-in platform then runs on the software without any change to the software itself.
The most basic example is Webpack. As a mainstream build tool, Webpack itself only abstracts a runtime environment. New plug-ins can be developed independently to extend the whole system without knowing about or changing its existing code.
PluginCore: Plug-in runtime core; PluginAPI: Provides an access interface for plug-in execution; Plugin: A plug-in that implements a specific function.
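The three roles can be sketched in a few lines of TypeScript. This is a hypothetical illustration, not the Aipanfan implementation:

- `PluginCore` owns the runtime.
- `PluginAPI` is the only surface a plugin sees.
- Each `Plugin` receives that API in `activate`, much as a Webpack plugin receives the compiler in its `apply` method.

The sketch also shows the two communication mechanisms the Aipanfan client uses: event broadcast (`on`/`emit`) and instance injection (`getService`). All names are assumptions.

```typescript
type Listener = (payload: unknown) => void;

// PluginAPI: the only surface a plugin can touch.
interface PluginAPI {
  on(event: string, listener: Listener): void; // subscribe to a broadcast event
  emit(event: string, payload: unknown): void; // broadcast to every subscriber
  getService<T>(name: string): T | undefined; // host instance injected by the core
}

// Plugin: a unit of functionality, activated with the API.
interface Plugin {
  id: string;
  activate(api: PluginAPI): void;
}

// PluginCore: the runtime that registers plugins and routes communication.
class PluginCore {
  private listeners = new Map<string, Listener[]>();
  private services = new Map<string, unknown>();
  private plugins = new Map<string, Plugin>();

  // Host capabilities (e.g. the current session) made available to plugins.
  provide(name: string, instance: unknown): void {
    this.services.set(name, instance);
  }

  register(plugin: Plugin): void {
    if (this.plugins.has(plugin.id)) throw new Error(`duplicate plugin id: ${plugin.id}`);
    this.plugins.set(plugin.id, plugin);
    plugin.activate(this.createApi());
  }

  emit(event: string, payload: unknown): void {
    // Copy the list so listeners added during dispatch only see later events.
    for (const listener of [...(this.listeners.get(event) ?? [])]) listener(payload);
  }

  private createApi(): PluginAPI {
    return {
      on: (event, listener) => {
        const list = this.listeners.get(event) ?? [];
        list.push(listener);
        this.listeners.set(event, list);
      },
      emit: (event, payload) => this.emit(event, payload),
      getService: <T>(name: string) => this.services.get(name) as T | undefined,
    };
  }
}

// Usage: a sidebar plugin reacts to session switches using an injected service.
const core = new PluginCore();
core.provide("session", { visitorId: () => "visitor-1" });
core.register({
  id: "order-sidebar",
  activate(api) {
    const session = api.getService<{ visitorId(): string }>("session");
    api.on("session:switched", () => console.log("load orders for", session?.visitorId()));
  },
});
core.emit("session:switched", null);
```

New features arrive as new `Plugin` objects, while `PluginCore` and `PluginAPI` stay unchanged. That is the open-closed property described next.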
Advantages of plug-in architecture
A plug-in architecture applies the open-closed principle at the level of the whole system. As long as the plug-in core and interfaces stay unchanged, the system can keep adding plug-ins to enrich its functionality. In a non-plug-in system, functional modules and code keep growing, so introducing new features and fixing bugs becomes increasingly difficult and inefficient. In a plug-in system, the complexity of developing a new feature stays the same no matter how complex the existing functionality becomes. And as the system becomes a platform, differentiated features contributed by third parties do not affect its stability.
Current state of Aipanfan plug-ins
Third-party platforms have their own customization requirements, such as the product and order modules of e-commerce platforms, the customer module of CRM platforms, and the evaluation module in after-sales scenarios. To meet them, the plug-in architecture of the Aipanfan client is designed around the following points:
-
Plug-in architecture solution
-
Two access modes are provided: JS-SDK access and Webview embedding.
-
Third-party plug-ins and the Aipanfan client communicate through two mechanisms: event broadcast and instance injection.
-
Aipanfan client plug-in categories: left-menu plug-ins, session-toolbar plug-ins, and session-sidebar plug-ins.
-
Plug-in configuration file description:
{
  "version": "0.0.1",          // version number
  "id": "demo-name",           // id used to bind events
  "name": "component name",    // plug-in name
  "viewUrl": "",               // menuList -- menu plug-ins, toolbarList -- communication-area plug-ins, infoList -- right toolbar plug-ins
  "dependent": {
    "method": [],
    "version": "1.0.6"         // dependent client version
  }
}
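A host can use the manifest above to decide whether a downloaded plug-in is safe to load. The sketch below makes two assumptions, since the description above only says "dependent client version". It reads `dependent.version` as the minimum client version the plug-in needs, and `dependent.method` as the host methods it requires. `PluginManifest`, `compareVersions`, and `canLoad` are hypothetical names.

```typescript
// Hypothetical typed view of the manifest fields shown above.
interface PluginManifest {
  version: string;
  id: string;
  name: string;
  viewUrl: string;
  dependent: { method: string[]; version: string };
}

// Compares dotted numeric versions: returns -1, 0, or 1 ("1.0.10" > "1.0.6").
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0); // missing parts count as 0
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// Loads only if the client is new enough and offers every required host method.
function canLoad(manifest: PluginManifest, clientVersion: string, hostMethods: ReadonlySet<string>): boolean {
  return (
    compareVersions(clientVersion, manifest.dependent.version) >= 0 &&
    manifest.dependent.method.every((m) => hostMethods.has(m))
  );
}

const demo: PluginManifest = {
  version: "0.0.1",
  id: "demo-name",
  name: "component name",
  viewUrl: "",
  dependent: { method: [], version: "1.0.6" },
};
console.log(canLoad(demo, "1.0.10", new Set())); // true
console.log(canLoad(demo, "1.0.5", new Set())); // false
```

Comparing versions numerically, part by part, matters here. Plain string comparison would wrongly rank "1.0.10" below "1.0.6".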
Four, joy: the results
4.1 Upgrading the Product Architecture
New client design principles:
-
Following DDD principles, define the menu modules and abstract the functional hierarchy;
-
Compared with the old version, the structure is clearer and the features are more extensible;
-
Change the container and redefine the layout areas to free up space for core session features;
-
Three platforms (Mac + Win + Web) share one product design, so operation is flexible and convenient.
4.2 Improving customer experience
After the migration, we followed up with customers using the new client. Besides feedback on requirements, we also received some positive comments:
4.3 Production and research efficiency has been greatly improved
Technology serves the product, and product and R&D work together to create business value. Production and research efficiency is the primary goal of this technical refactoring. It can be measured in two ways.
Overall delivery speed of requirements
-
Take agile iteration as an example. What matters is not the efficiency of any single delivery point, but the overall efficiency from discovering a requirement to putting it online. This is the biggest value DDD brought to this refactoring. The whole product, design, and R&D team analyzed, designed, and implemented requirements together with the business. This raised the team's alignment and business understanding to a new level. Combined with a sound technical architecture, it improves the overall efficiency of requirement delivery.
Technical R&D efficiency
-
The direct result is that fewer people support a larger product scope: technical R&D went from 12 people to 7;
-
Indirectly, code maintenance costs have dropped sharply. The number of service modules matches the team size, and module responsibilities and collaboration are clear. Interface design quality and code standards are high, and newcomers ramp up quickly.
4.4 System architecture quality has been greatly improved
4.4.1 System Stability
This is directly reflected in the high-frequency technical stability problems mentioned above, such as late detection of visitors arriving at or leaving the site and automatic replies not being triggered. These have now been comprehensively addressed. The stability of each system module has been held at 99.99% for a long period.
4.4.2 Maintainability
Code maintenance costs are greatly reduced, and the architecture is more maintainable at several levels:
-
The number of service modules matches the number of team members;
-
Module responsibilities and collaboration are clear;
-
Clear service data flow links;
-
A standardized, easy-to-understand project code structure, so newcomers get started quickly;
-
Interface documentation is available online.
4.4.3 Evolvability
The Aipanfan communication system has many potential directions of evolution, and some of them have already been designed for and reserved, such as:
-
More communication formats: because formats are decoupled from the business system, it is easy to add content formats such as video and voice.
-
More connection types: the system currently supports push and pull protocols including HTTP, TCP, WebSocket, and long polling, which covers most scenarios.
-
Onboarding more business types: basic communication capabilities are opened up, so new business types can connect at low cost through APIs.
-
Continuous evolution of communication features: for example, more intelligence, tighter integration with lead management, and stronger risk control. Business modules for these requirements can be built on demand and evolve independently.
Five, growth: experience summary
Through this refactoring, the team went through a painful process from dilemma to reflection. In return, it grew organizationally, technically, and personally.
Organization
-
The production and research team focuses on creating business value and carries out daily work from the perspective of solving customer problems;
-
More efficient production-research collaboration, with requirements and designs discussed in a unified language;
-
Domain knowledge is deposited during business iterations.
Technology
-
The answers to technical problems often come from the business, and understanding the business is the prerequisite for developing technology. Different businesses bring different technical demands. The most suitable technology is the best technology, and it is also the advanced one;
-
The refactored architecture fits current business development. R&D can devote most of its energy to business implementation and is shielded from much of the noise of day-to-day development.
People
-
Through this refactoring, every member became thoroughly familiar with the communication business. Members now know the full picture of our own business and the progress of industry peers, and we are aligned on the direction of future evolution;
-
Once everyone understands the background and the big picture of the technical architecture, each business developer can focus on building the module they own. Practicing DDD raised our application architecture skills and opened a new direction for technical growth. It also lets module owners take more initiative.
Six, stars: future prospects
Aipanfan Communication has now been repositioned with a more focused direction, but it also faces many strategic choices. For example, given different upstream scenarios and promotion platforms, does its access capability need to become stronger? In some scenarios, the intelligent robot's strategy model has not been iterated continuously. Does it need to become more intelligent?
Technical architecture planning will put business requirements first and continue to evolve toward cloud native, adding capacity assessment, full-link load testing, and traffic management. For example, the near-term plan is to upgrade the underlying foundation from K8s-style microservice governance to a service mesh. This also aligns it with the capabilities of the main Aipanfan cluster, so we can better reuse the basic technology platform in the future. The move further reduces the cost of unified service governance across multiple languages: Go for the access layer and protocol connection layer, Java for business services.
How to be "good, and also different" remains a long road ahead for the Aipanfan communication production and research team.
Seven, introduction to the authors
This article was written by several members of the production and research team.
-
Fei Xie: Architect, good at implementing complex systems through microservices architecture and DDD;
-
Nuts: Product manager, good at ToB SaaS and advertising products;
-
Ning Ning: Product manager with a long-standing bond with Business Bridge and online communication;
-
Wheat: Veteran front-end engineer, working hard in the fast-evolving front-end field.
-
Flyme: Senior R&D engineer, adept at improving technical solutions to cope with complex and changing business scenarios.
Baidu said Geek
Baidu's official technology WeChat account.