Author: Aotu-man-Manjiz
What is Atom? Atom is a one-stop platform for professional, intelligent page and mini-program design, backed by experienced e-commerce designers. After two years of rapid iteration, the project has grown considerably: requirements keep changing and being optimized, the internal logic has become complicated, and maintenance costs are rising sharply. At the same time, Atom will carry more and more business, serving more internal users and merchants. To accommodate these changes, an architecture upgrade became imperative: break the server into modules to make the service lightweight and modular, and make business scenarios easier to extend.
The Atom server has gone through three iterations, and this article focuses on the third.
The 1.0 architecture
This is the oldest version of Atom. In this version, only channel-page functionality was planned, to free developers from the complexity of channel-page development. Because its purpose was so narrow, system complexity was low and all the code ran in a single process.
On the deployment side, operations were very primitive and manual: developers logged in to a machine, pulled the code, started the service much as they would locally, and then repeated the process on each machine.
In addition, the old version of Quark used named components, which to some extent limited Quark's own scalability, so it is not covered further here.
The 2.0 architecture
Atom took less than a year to go from a channel-page platform to a multi-scenario page platform: richer components, more templates, more scenarios, more designers involved, more users, and increasingly specialized product development. Simple manual operations no longer sufficed, so both the front end and the server side changed dramatically. The server was rebuilt on Salak, a very handy server-side framework that also gave us automatically generated interface documentation, and both the front end and the server came to rely on Talos (an internal container-deployment platform) for deployment. The server side was gradually entering its industrial age.
However, the extensive development style and the lack of macro-level planning remained unsolved at this stage, and increasingly exposed the following problems:
- Highly concentrated: More than 90% of the services live in a single monolithic application. As the business grows more complex, the codebase balloons; readability, maintainability and extensibility decline; the onboarding cost for new developers rises sharply; the cost of expanding the business grows exponentially; and continuous delivery becomes hard to sustain. As the user base grows, application concurrency climbs, yet a monolith's concurrency capacity is limited. And with rising system complexity, testing gets harder and harder.
- High coupling: The modules inside the monolith depend on, affect and constrain one another, so code reuse is low. New features are often rewritten from scratch out of fear of landmines buried in the coupled logic, which is not what we want to see.
- Confused logic: Beyond the confusion caused by coupling, Atom, as a platform that grew from scratch, carries a long history of requirements. Some are no longer used, some are barely used, and this legacy logic is a huge challenge for maintainers, who are afraid to change the code they must maintain. In addition, the need for backward compatibility across iterations leaves a heavy historical burden on the server.
- Code redundancy: Because no standards were defined in the framework's early days and code checks were not strictly enforced during development, logic and constants were defined repeatedly, which also makes the project hard to maintain. For example, changing one constant requires making the same edit in several places while ensuring none is missed.
New Architecture Objectives
Based on the strengths and weaknesses of the original architecture, we set the following objectives for this upgrade:
- Service modularization
- Service generalization
- Pluggable sites
- Pluggable scenarios
- Standards and specifications
Terminology:
- Site: decouples the server from the platform, so that instead of existing to serve one platform, the server can provide the same services to multiple, mutually isolated platforms.
- Scenario: a concept defined for different service types; different scenarios have different management modes and processes.
The overall architecture
The overall architecture is divided into four parts: the web application layer, the interface layer, the service layer and the data layer. This unifies the entry point; single-point deployment makes releases more convenient, and independent deployment reduces the impact of any one release on the overall service:
- Web application layer: Includes Atom platform and other platform applications
- Interface layer: provides gateway services. Requests of the application layer are controlled and forwarded through the gateway
- Service layer:
- Service communication: Asynchronous communication uses MQ and RPC communication uses HTTP
- Business module: the core code, split into many small module applications
- Basic services: unified control of users and permissions
- Service management: improve the stability, robustness and flexibility of services
- Data layer: Core data store
The gateway, as the traffic entrance of the whole server, processes all traffic: it intercepts illegal requests, parses the login state and passes it downstream, and verifies interface permissions, handles timeout responses and so on, providing unified control and reducing pressure on downstream services.
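As a sketch of those gateway responsibilities (the types, session store and permission table below are invented stand-ins, not Atom's real API), the per-request pipeline might look like this:

```typescript
// Illustrative sketch of the gateway's per-request checks; the session store
// and permission table are in-memory stand-ins for the real services.

interface GatewayRequest {
  path: string;
  headers: Record<string, string>;
}

interface GatewayResult {
  status: number;
  user?: string; // parsed login state, passed downstream when forwarded
  forwarded: boolean;
}

// Assumed stand-ins for the real session store / permission table.
const sessions: Record<string, string> = { "token-123": "alice" };
const permissions: Record<string, string[]> = { alice: ["/api/pages"] };

export function handleRequest(req: GatewayRequest): GatewayResult {
  const token = req.headers["x-session-token"];
  const user = token ? sessions[token] : undefined;
  if (!user) {
    // Intercept illegal requests before they reach downstream services.
    return { status: 401, forwarded: false };
  }
  if (!(permissions[user] ?? []).includes(req.path)) {
    // Verify interface permissions centrally at the gateway.
    return { status: 403, user, forwarded: false };
  }
  // Attach the login state and forward to the downstream module.
  return { status: 200, user, forwarded: true };
}
```

Centralizing these checks is what lets the downstream modules stay free of authentication logic.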
Implementation
Plan, prepare, evaluate
Before formally entering upgrade development, the group discussed the necessity and feasibility of the architecture upgrade in meetings. The direct cause pushing us to upgrade was the new site and scenario requirements: implementing them on the original architecture would only pile more coupling onto already chaotic logic. The indirect cause, i.e. the real necessity of the upgrade, was to make the system modular, standardized and general, to make its logic clearer, and to improve the maintainability of the whole system.
After repeated discussion, the original system was segmented by function, then further segmented by generality on top of that. Supporting work for the new architecture was added, the workload and estimated time of each task were evaluated, and finally tasks were assigned.
Development
Modularization
Why modularize? As the platform grows, we want each part's functions to be more independent and clearly defined, to minimize the parts' influence on one another, and to operate and maintain each part separately, so that no single change affects everything at once.
The upgrade divided the project into 10+ modules based on functionality and generality: a module for compilation, a module for template management, a module for scheduled tasks, a gateway as the entrance, and so on.
Several generic services were separated out of the Atom system to serve both Atom and other systems.
The most troublesome part of splitting modules is cutting the associated logic. Stripping modules apart inevitably creates one problem: the same code appears repeatedly in different modules. To solve this, we moved such code into a shared utility NPM package: constants, TypeScript type definitions, permission mappings, Mongoose schema definitions, Salak plug-ins, utility methods and more.
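A minimal sketch of what such a shared package might export; the names below are illustrative, not the package's real contents:

```typescript
// Illustrative contents of the shared utility package.

// Constants: defined once, consumed by every service module.
export const PAGE_STATUS = {
  DRAFT: 0,
  PUBLISHED: 1,
  OFFLINE: 2,
} as const;

// Shared TypeScript type definitions.
export interface PageMeta {
  id: string;
  site: string;
  status: (typeof PAGE_STATUS)[keyof typeof PAGE_STATUS];
}

// Shared utility methods.
export function isPublished(page: PageMeta): boolean {
  return page.status === PAGE_STATUS.PUBLISHED;
}
```

With this in place, changing a constant means editing the package once and bumping the version, instead of hunting for every duplicated definition.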
Another question: in the original architecture, modules could call one another directly in code, so how do we "restore" that capability in the new architecture? To preserve decoupling, only a few functions with real-time requirements are called directly between modules via interfaces; the rest communicate through MQ message queues.
Here is an example of MQ communication: compilation. Compiling usually takes a long time, which would hurt service performance, and the compilation result does not need a synchronous response. Moreover, if every request hit the compilation module at once, the service would be under great pressure, so we decided to use a message queue for communication between the modules:
- The project module calls the publishing module directly through an interface to initiate publishing;
- The publishing module pushes an "I want to compile" message to the message queue;
- On receiving the message, the compilation module decides, based on its own load, whether it can start compiling; if not, it simply does not respond for now;
- Compilation status changes are likewise pushed as messages;
- Finally, the project module performs the appropriate processing when it receives a compile-status message.
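The steps above can be sketched with a tiny in-process bus standing in for the real MQ; the topic names and payload shapes are invented:

```typescript
// A tiny in-process bus standing in for the real MQ.
type Handler = (payload: Record<string, unknown>) => void;

class MessageBus {
  private handlers: Record<string, Handler[]> = {};
  subscribe(topic: string, handler: Handler): void {
    (this.handlers[topic] ??= []).push(handler);
  }
  publish(topic: string, payload: Record<string, unknown>): void {
    for (const h of this.handlers[topic] ?? []) h(payload);
  }
}

export const bus = new MessageBus();
export const log: string[] = [];

// Compilation module: picks up compile requests and pushes status back.
bus.subscribe("compile.request", (payload) => {
  log.push(`compiling ${payload.pageId}`);
  bus.publish("compile.status", { pageId: payload.pageId, ok: true });
});

// Project module: reacts to compile status without blocking on the result.
bus.subscribe("compile.status", (payload) => {
  log.push(`status ok=${payload.ok}`);
});

// Publishing module pushes an "I want to compile" message.
bus.publish("compile.request", { pageId: "p1" });
```

The point of the pattern is that the publishing module never waits on the compiler: it fires the message and moves on, and the project module reacts whenever status messages arrive.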
Generalization
As mentioned in the modularization effort, we extracted four generic service modules, independent of the Atom system, that can serve Atom as well as other systems. Generalizing modules was based on two considerations:
- Enrich the department's services and reduce repeated development of the same functionality
- Remove non-core code from Atom and make the system leaner
An accompanying question is worth thinking about: how do we decide whether a feature is worth extracting and generalizing? We should avoid the trap of believing that modularization means making each piece of the system as small as possible; splitting too finely is bound to increase the operations workload. When splitting modules, we consider whether the functions inside a module are complete and independent, and how much the department or company needs that common service, so as to truly achieve low coupling and high cohesion.
Standardization
At the code level, here’s a simple comparison:
| Comparison item | Old architecture | New architecture |
|---|---|---|
| Main language | JavaScript | TypeScript |
| Code checks | Not enforced | Required |
| Interface naming | Ad hoc | Unified |
| Interface output | All over the place | Unified |
As everyone on the front end knows, TypeScript is great: it gives us auto-completion, an optional type system, access to newer JavaScript features, and so on (see "Why TypeScript" for more). What explains the last three rows? The old architecture went from zero to one; the project was not planned at the start, and there was never enough time to clean the system up in the middle and later stages. The combined effect of time pressure and changing requirements was silted-up code.
To this end, in developing the new architecture we emphasized code standardization, ran code checks on every commit, and unified the various interfaces:
- Unified interface paths: in the old architecture, the path of a list interface might be `/xxx/list`, or `/xxx/xxxes`, and so on. In the new architecture we unified interface definitions around RESTful API rules, composing paths from resource nouns and using semantic HTTP methods;
- Unified parameter names: for example, the page-size parameter of a list interface might be called `pageSize` or `count`; we unified each such parameter under a single name and required that the convention be followed in development;
- Unified output: data is processed and filtered before being output to the front end, dropping irrelevant fields such as `_id` and `__v`; the output form is also unified, for example every `_id` in the output is replaced by a field named `id`.
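The output-unification rule can be sketched as a single transform; the function name is illustrative, and a real implementation might hang this on the model's serialization instead:

```typescript
// Unified output filter (illustrative name): drop Mongoose's internal `__v`
// and expose `_id` as `id`, so every interface returns the same shape.

interface RawDoc {
  _id: string;
  __v?: number;
  [key: string]: unknown;
}

export function toOutput(doc: RawDoc): Record<string, unknown> {
  const { _id, __v, ...rest } = doc;
  return { id: _id, ...rest };
}
```

Applying one such filter at the output boundary is what makes "unified output" enforceable rather than a convention each developer must remember.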
The benefit of code standardization is maintainability: developers can quickly locate the code behind any interface, and the front end has fewer interface idiosyncrasies to memorize.
Pluggable sites
As mentioned earlier, the immediate trigger for this architecture upgrade was the site and scenario requirements, and iterating on site requirements under the old architecture would only have added further coupling. To this end we added a site management module, added a site field to almost all data items, and carried the site parameter in almost all database queries. Thanks to this work, adding a site now only requires creating it through the site module and doing some initial configuration.
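One way to make "almost every query carries the site parameter" systematic is a small helper that every data access goes through; this is a sketch with invented names, modelling Mongoose filters as plain objects:

```typescript
// Illustrative helper: merge the caller's filter with the mandatory site
// condition so no database query can cross site boundaries by accident.

export interface SiteScoped {
  site: string;
}

export function scopedFilter<T extends object>(site: string, filter: T): T & SiteScoped {
  return { ...filter, site };
}
```

Funnelling queries through one helper also gives a single place to audit when a new site is added.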
Besides placing greater demands on Atom's functionality, the site concept also challenged the original permission architecture. Before the upgrade there was a single set of user permissions; to give each site its own permissions, there were only two approaches:
- Split the meaning of permissions (provide a separate set of permissions for each site)
- Add a layer of abstraction over user permissions (a user's permissions become multiple collections, switched per site)
Comparing the two: splitting the permission meanings is easier to understand and changes less code, but it greatly increases the difficulty of maintaining the permission table, amounts to adding a whole new permission set for every new scenario, and cannot be made pluggable. In the end we took the second approach, adding logic at the gateway layer that switches permission sets according to the site the user is accessing.
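A minimal sketch of the chosen approach, with invented data shapes: each user holds one permission collection per site, and the gateway selects the collection for the site being accessed:

```typescript
// Illustrative shape: a user's permissions become one collection per site,
// and the gateway picks the collection for the site being accessed.

export interface SiteUser {
  name: string;
  permissionSets: Record<string, string[]>;
}

export function permissionsFor(user: SiteUser, site: string): string[] {
  return user.permissionSets[site] ?? [];
}
```

Because the switch happens at the gateway, downstream modules keep seeing a single flat permission list, which is what makes new sites pluggable.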
Pluggable scenarios
A scenario is a dimension below the site. Several scenarios already exist: activities, channels, psychological tests, SNS and shops. Under the old architecture, adding a scenario required scheduling dedicated development, and the code would likely fill up with scenario-specific if-elses. To make scenarios easier to extend and maintain, we took the scenario-related code apart from the perspective of resource management.
Each Atom scenario has four types of resources: templates, projects, tags and permissions.
First, the structure of the project module's directory. The project module's code is organized around the strategy pattern: the business logic of each scenario is split into a separate file that is invoked by a scheduler, avoiding any mixing of logic between scenarios.
- The scheduler file is named `base_{resource}_service`
- A scenario strategy file is named `{scene}_{resource}_service`, with the scenario name in lowercase
- The common strategy file is named `common_{resource}_service`
When a user issues a query, the scheduler directly invokes the method in the strategy file matching the query conditions (in general, calling the strategy of a specific scenario directly is not allowed unless it is confirmed that no other scenario's data is involved). When the scheduler finds no strategy for the scenario, it falls back to the common_service logic by default, so every scenario strategy must inherit from common_service. Take the page management service as an example: in the `src/service/page` directory, the scheduler is base_page_service, the common logic is common_page_service, and the channel-page scenario logic is ch_page_service.
To abstract the methods shared by all scenarios, the common CRUD method interfaces of the services are placed in an `AbstractServiceClass` file. Following the `{scene}_{resource}_service` naming, the page resource's directory looks like this:

```
src
└── service
    └── page
        ├── base_page_service
        ├── common_page_service
        └── ch_page_service
```
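The dispatch described above can be condensed into a TypeScript sketch; the class names follow the article's naming, but the method bodies are invented:

```typescript
// common_page_service: the default logic every scenario falls back to.
export class CommonPageService {
  find(query: { scene?: string }): string {
    return `common find for ${query.scene ?? "all"}`;
  }
}

// ch_page_service: the channel-page scenario overrides only what differs.
export class ChPageService extends CommonPageService {
  find(_query: { scene?: string }): string {
    return "channel-specific find";
  }
}

// base_page_service: the scheduler selects the strategy by scene and falls
// back to the common service when no scenario strategy is registered.
export class BasePageService {
  private strategies: Record<string, CommonPageService> = {
    channel: new ChPageService(),
  };
  private common = new CommonPageService();

  find(query: { scene?: string }): string {
    const strategy = query.scene ? this.strategies[query.scene] : undefined;
    return (strategy ?? this.common).find(query);
  }
}
```

Adding a scenario then means registering one new strategy class rather than threading if-elses through shared code.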
Deployment
Data migration
Given the scale of the changes in this upgrade, the switch between old and new versions had to be careful. Besides extensive joint debugging between the front end and the server, we also migrated the data for compatibility: the main approach was a migration script that transformed the old data in several passes according to the needs of the new architecture and then wrote it into the new database.
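The core of such a migration script is a per-record transform from the old schema to the new one; here is a sketch with invented field names, where the added `site` field mirrors the site work described earlier:

```typescript
// Illustrative migration: each old record is transformed to the new schema
// (here, gaining the mandatory `site` field) before being written to the
// new database.

interface OldRecord {
  _id: string;
  name: string;
}

interface NewRecord extends OldRecord {
  site: string; // new architecture: every data item carries a site field
}

export function migrate(records: OldRecord[], defaultSite: string): NewRecord[] {
  return records.map((r) => ({ ...r, site: defaultSite }));
}
```

Keeping the transform pure makes it easy to dry-run the script against a copy of production data before the real cutover.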
Uninterrupted deployment
Under the monolithic architecture, every release and deployment of the service opened a gap of several minutes during which it was unavailable.
To avoid this, in the production environment we ensure that each module has at least two containers. During deployment, part of the containers are removed from load balancing; we then poll until those containers carry no traffic, perform the update, start the service and add it back to load balancing, and then repeat the same operation on the remaining containers. The benefit is that the service is never interrupted during deployment and the availability gap is avoided.
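The rollout loop can be sketched as follows; the load-balancer and container operations are simulated with plain objects and a step log:

```typescript
// Simulated zero-downtime rollout: drain each container, update it, then
// re-add it to load balancing before touching the next one.

export interface Container {
  id: string;
  activeRequests: number; // in-flight traffic still on this container
  version: string;
}

export function rollingDeploy(containers: Container[], newVersion: string): string[] {
  const steps: string[] = [];
  for (const c of containers) {
    steps.push(`remove ${c.id} from LB`);
    // Poll until the container has drained all in-flight requests
    // (here each check lets one request finish).
    while (c.activeRequests > 0) {
      c.activeRequests -= 1;
    }
    c.version = newVersion; // update and restart the service
    steps.push(`update ${c.id} to ${newVersion}`);
    steps.push(`add ${c.id} back to LB`); // traffic resumes on this container
  }
  return steps;
}
```

Because at least one container stays in the load balancer at every step, the service as a whole never goes dark.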
Operations
To avoid repeating the poor operations experience and project code management of the old architecture, we put together an operations document for the new architecture, covering quick onboarding, development, debugging and deployment in as much detail as possible.
We also added monitoring to the system, tracking the performance and availability of each interface.
Results
After this upgrade, the planned results were basically achieved:
- Clarity: logic combed through, redundancy removed, TS refactoring, ESNext
- Modularization: 10+ decoupled modules running independently; multiple communication modes including HTTP, MQ and the data layer
- Standardization: strict code conventions; unified interfaces; unified responses
- Generalization: 4+ generic, platform-independent modules; shared libraries, configuration, plug-ins, middleware and so on extracted
- Easy migration: one-click initialization; one-click, single-point, independent deployment; unified entrance
- Easy extension: new-site expansion capability; pluggable scenario expansion; about 95% of labor time saved
- Easy maintenance: logging added; one-click deployment; uninterrupted deployment
- Easy integration: complete Joi documentation; detailed records of interface changes; backward compatible wherever possible
Tools, methods and collaboration
Tools matter a great deal to a project's smooth progress, so we tried a variety of them in this upgrade.
To give project members a clear understanding of the modules they were responsible for and of how each module would be transformed, the team used flowchart tools to map out and divide up the modules of the old architecture, and to sketch the internal logic of each module in the new architecture.
For scheduling we used a Gantt chart: tasks were split by module, assigned to owners, and given planned timelines, with the overall schedule kept in sync. From the Gantt chart we could see clearly, day by day, how the project's resources were allocated and scheduled, and how the plan compared with reality, which helped us control the progress of the whole project.
The Gantt chart gave a preliminary division of the upgrade tasks; for finer-grained division we used the IssueBoard. The IssueBoard is like a simplified task kanban, but it was more than adequate for us: it links with Git commits, letting developers close the corresponding Issue with each commit.
Reflections
This upgrade also exposed some shortcomings, mainly in scheduling and expectations, and in early communication.
- Scheduling and expectations: The schedule was too optimistic during early planning and was never corrected during the upgrade. There were objective reasons: the team had to finish within a limited requirements window to avoid maintaining two versions at once, which meant spending more time each day than planned.
- Communication: When the server side was upgraded, the specific details were not communicated to the front end, and the upgrade was not fully backward compatible, causing some confusion and inconvenience for the front end during joint debugging.
References
- Atom: ling.jd.com/atom
- Salak: salakjs.github.io/docs/docs/z…
- RESTful API: www.ruanyifeng.com/blog/2014/0…
- io/notes/2020/…
Welcome to the Aotu Lab blog: aotu.io, or follow the Aotu Lab public account (AOTULabs), which pushes articles from time to time.