Author: Gu Dongyu (Network Development Manager of Scholarship)
This article is original; please credit the author and source when reposting.
Preface
While settling a debate inside the company between the back-end and front-end teams over how to handle error codes, I also had some in-depth conversations with peers, and found that people's understanding of error codes differs widely. So I decided to write this article to describe my own understanding of error codes and to offer some advice on handling them.
What is an error code
Before we talk about error code handling, let's reconsider what an error code is. An error code can be defined as a number (or a combination of letters and digits) that is associated with an error message and used to identify the various anomalies that occur in a system.
So what do error codes do for us?
- First, error codes tell us what has gone wrong in the system.
- Second, error codes should let us identify which system has the problem.
- Finally, error codes let us decide what message to show the customer.
To achieve these purposes, we need to put some constraints on how error codes are named and used. In the Hujiang API specification, each product line is assigned its own error code range, which is a very sensible decision. In fact, not only should different product lines have separate ranges, but each system within a product line should also have its own interval, which effectively prevents duplicate error codes.
For example, my product line was assigned the error code range -0x1400000 to -0x14FFFFF, which in decimal is the 8-digit range -20,971,520 to -22,020,095. I planned the codes around that range. The left four digits, 2098 to 2201, are the system codes available to the product line (2097 and 2202 are only partly covered by the range, so they are not used). The right four digits are allocated by convention: 0000-3999 indicates a service exception, 9000-9999 indicates a system-level exception, and the remaining range is reserved for later expansion.
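The layout above can be sketched in a few lines. This is only an illustration of the arithmetic: the function names `compose`, `decompose`, and `classify` are hypothetical and do not come from the Hujiang API specification.

```python
from typing import Tuple

# Range assigned in the example above (absolute values of the negative codes).
RANGE_START = 0x1400000  # 20,971,520
RANGE_END = 0x14FFFFF    # 22,020,095


def compose(system_code: int, detail: int) -> int:
    """Build a negative 8-digit error code from a 4-digit system code and a 4-digit detail code."""
    if not 2098 <= system_code <= 2201:
        raise ValueError(f"system code {system_code} is outside 2098-2201")
    if not 0 <= detail <= 9999:
        raise ValueError(f"detail code {detail} is outside 0000-9999")
    return -(system_code * 10_000 + detail)


def decompose(error_code: int) -> Tuple[int, int]:
    """Split an error code back into (system_code, detail)."""
    return divmod(-error_code, 10_000)


def classify(error_code: int) -> str:
    """Name the category encoded in the right-hand four digits."""
    _, detail = decompose(error_code)
    if detail <= 3999:
        return "service exception"
    if detail >= 9000:
        return "system-level exception"
    return "reserved"
```

With this layout, anyone reading a code such as -20980001 can tell at a glance that it belongs to system 2098 and is a service exception.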
It is worth mentioning how a new system applies for its system code. In my team, before a new system is created, its owner must apply in advance for the following:
- HTTP port
- Dubbo port
- System code
- JMX port
The available ranges for these values are divided up in advance, and no port may be used twice. The owner of each system is expected to take the initiative and register the ports and system code they want (for example, port 8888). This avoids system failures caused by duplicate ports and error codes during later deployments.
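The registration rule can be illustrated with a small in-memory register that rejects duplicates. This is a sketch under assumptions: the class `AllocationRegistry` and the system names `order` and `payment` are invented for the example, and a real team might just as well keep the register in a shared document or a database table.

```python
class AllocationRegistry:
    """Hypothetical register of ports and system codes; duplicates are rejected."""

    def __init__(self):
        self._ports = {}         # port -> "system/purpose"
        self._system_codes = {}  # system code -> system name

    def register_port(self, system: str, purpose: str, port: int) -> None:
        # Ports share one namespace, whether they serve HTTP, Dubbo, or JMX.
        if port in self._ports:
            raise ValueError(f"port {port} is already taken by {self._ports[port]}")
        self._ports[port] = f"{system}/{purpose}"

    def register_system_code(self, system: str, code: int) -> None:
        if code in self._system_codes:
            raise ValueError(
                f"system code {code} is already taken by {self._system_codes[code]}"
            )
        self._system_codes[code] = system


registry = AllocationRegistry()
registry.register_port("order", "http", 8888)
registry.register_system_code("order", 2098)
# registry.register_port("payment", "dubbo", 8888) would now raise ValueError.
```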
Error code handling
In a distributed microservice architecture, the call chain involved in completing a single transaction can be 7 to 8 microservices deep. Each microservice has its own error codes. When a remote call returns a failed response with an error code, the caller should handle it bottom-up: pass the error code and its context upward, so that the top layer of the request finally handles it.
This is easy to understand if we narrow our view to a monolithic architecture. A system is divided into layers such as Controller, Service, and DAO. Each layer may throw exceptions, but catching and handling them is usually done uniformly at the entry layer. The benefits are:
- First, more complete stack information is available.
- Second, the exception handling mechanism can be encapsulated in the framework layer.
- Third, the code becomes much simpler.
Anyone obsessed with clean code will break down at the sight of a screen full of try/catch blocks.
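A minimal sketch of this idea, assuming a hypothetical `ServiceError` exception and a hypothetical `handle_request` entry point (neither is taken from a specific framework): the lower layers only raise, and a single handler at the entry converts the exception into a response.

```python
class ServiceError(Exception):
    """Carries an error code plus context up the call stack (hypothetical type)."""

    def __init__(self, code: int, message: str, **context):
        super().__init__(message)
        self.code = code
        self.context = context


def dao_find_user(user_id):
    # DAO layer: raises, never catches.
    raise ServiceError(-20980001, "user not found", user_id=user_id)


def service_get_profile(user_id):
    # Service layer: no try/except here, the error propagates untouched.
    return dao_find_user(user_id)


def handle_request(handler, *args):
    """Entry layer: the only place where errors are caught."""
    try:
        return {"code": 0, "data": handler(*args)}
    except ServiceError as err:
        return {"code": err.code, "message": str(err), "context": err.context}
    except Exception:
        # Unknown failures map to an assumed system-level code (9000-9999 band).
        return {"code": -20989000, "message": "internal error", "context": {}}


response = handle_request(service_get_profile, 42)
```

In a real project the `handle_request` role is usually played by a framework hook, so business code contains no error-handling boilerplate at all.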
Error code and httpCode relationship
At Hujiang, systems communicate with each other over RESTful HTTP, so the Hujiang API specification ties httpCode and error codes together to some extent. I take a different view on this point.
We all know the benefits of programming to interfaces, so I won't repeat them here. In the same way, I believe remote calls should be designed against an interface as well; the biggest benefit is that the communication protocol can be switched smoothly. If we moved from the HTTP protocol to the Dubbo protocol, how would we carry httpCode across? For that reason, httpCode should be decoupled from the error code, and the error code should be abstracted as a common field in the message body.
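One way to picture this is a response envelope that carries the error code in the body, with thin adapters per protocol. The sketch below is an assumption-laden illustration: the `Result` type, the convention that `0` means success, and the `to_http`/`to_rpc` adapters are all hypothetical, and `to_rpc` only stands in for a Dubbo-style call rather than using a real Dubbo API.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class Result:
    """Protocol-independent envelope; the error code travels in the body."""
    code: int                   # 0 means success here (an assumption of this sketch)
    message: str = ""
    data: Optional[Any] = None


def to_http(result: Result):
    """HTTP adapter. This sketch always answers 200 and leaves the outcome to the body."""
    return 200, json.dumps(asdict(result))


def to_rpc(result: Result) -> dict:
    """Stand-in for an RPC adapter: the same envelope, returned as a plain object."""
    return asdict(result)


failure = Result(code=-20980001, message="user not found")
status, body = to_http(failure)
```

Because callers only ever inspect `code` in the envelope, switching the transport does not change how they detect or report errors.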
Project background
The front end and back end have always disagreed about how to handle error codes. In some teams the front end directly displays the error description returned by the back end, in some teams a middle tier converts error codes to descriptions, and in some teams the front end maintains the mapping from error codes to descriptions.
- When the description returned by the back end is used as is, the back-end system has to be changed and released every time interaction designers or product managers want to change what is shown to the customer. And when the same error code needs different descriptions on different front-end pages, the back end cannot help at all.
- Converting back-end error codes to descriptions in the middle tier is very painful for the middle-tier developers, because one middle tier can sit in front of many back-end systems, and any change in one of them may force the middle-tier developers to change their code as well.
- Having the front end maintain the mapping between error codes and descriptions is the most widely used of the three approaches, but it shares a pain point with the other two: whenever interaction or product colleagues want to change an error message, or the back end adds a new error code, front-end developers have to make the change and release again.
These inconsistent ways of handling error codes raise both communication costs and development costs. This article therefore proposes a unified way of handling error codes that streamlines the process and removes the pain points described above.
Project Solutions
Optimized error code handling process
When interaction or product colleagues need to adjust the description of an error code, they no longer have to agree on a release schedule with the technical team. Instead, they configure the error code directly on the error code operation platform.
Once the configuration is saved, the error code back office stores the mapping in the cache and the database, and the error code API system exposes it to pages on the external network.
Error code domain model
As each product manager and interaction designer has their own ideas about the style and prompt text of each page, the same error code may be presented differently on different pages. Therefore, at the beginning of the system design, this point was fully taken into account and the following domain model was established:
From the domain model above, it is easy to see that a front-end page can pass the following dimensions as input parameters when it requests the description of an error code:
1. Institution (e-school, CC, tools, scholarship…)
2. Scene (the specific page)
3. Internationalization (Chinese, English, Japanese)
With this multi-dimensional design, every page of every line of business can be configured individually, so that customers see a display tailored to that page.
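A lookup along these dimensions could be sketched as follows. The fallback order (most specific match first, then less specific ones, then a default message) is an assumption made for the illustration, as are the `describe` function, the sample keys, and the sample texts; the article does not spell out how the platform resolves a missing combination.

```python
DEFAULT_MESSAGE = "Something went wrong. Please try again later."

# (error_code, institution, scene, locale) -> description; None acts as a wildcard.
DESCRIPTIONS = {
    (-20980001, "cc", "login", "en"): "We could not find that account.",
    (-20980001, "cc", None, "en"): "Account not found.",
    (-20980001, None, None, "en"): "User not found.",
}


def describe(code: int, institution: str, scene: str, locale: str) -> str:
    """Return the most specific configured description, or the default message."""
    candidates = [
        (code, institution, scene, locale),  # exact page
        (code, institution, None, locale),   # any page of this institution
        (code, None, None, locale),          # any institution
    ]
    for key in candidates:
        if key in DESCRIPTIONS:
            return DESCRIPTIONS[key]
    return DEFAULT_MESSAGE
```

For example, `describe(-20980001, "cc", "login", "en")` returns the page-specific text, while an unknown error code falls through to `DEFAULT_MESSAGE`.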
Error code API security issues
When I started designing the system, the security of the API system worried me. After talking with colleagues in the security department, I learned that there was little to worry about, because error codes and their descriptions are meant to be shown to customers anyway. The final design therefore relies on two measures: first, requests are filtered by the security department's protection layer; second, the error code system itself applies blacklist and whitelist control to request sources. The network access relationship is shown in the following figure:
Other
From a technical point of view, building the error code platform is not actually difficult.
- In terms of performance, Codis (which can be degraded) plus the MyBatis cache (refreshed every 5 minutes) keeps the pressure on the database from becoming too great.
- In terms of reliability, when the front end's query parameters are incorrect, or product/interaction colleagues have not configured a matching error code, the error code platform returns default information, so customers never receive an unexpected prompt.
- In terms of scalability, the error code system is stateless and is going to use the Docker operations support provided by the Hujiang OCS team. Its application load monitoring and its support for scaling out and in let this kind of system cope well with different levels of traffic.
That is my understanding of error codes, together with my experience and thoughts from building a unified error code management platform in this project. I hope it offers some ideas on how to deal with error codes.
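The performance point can be pictured as a remote cache that is allowed to fail, backed by a local cache with a time-to-live in front of the database. The sketch below is an assumption about how such a degradation path might look, not the platform's actual wiring: `TtlCache`, `lookup`, and the `remote_get`/`db_load` callables are hypothetical stand-ins for Codis, the MyBatis cache, and the database query.

```python
import time


class TtlCache:
    """Tiny local cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh, no database access
        value = loader(key)      # expired or missing: reload from the database
        self._entries[key] = (now + self._ttl, value)
        return value


def lookup(key, remote_get, local_cache: TtlCache, db_load):
    """Try the remote cache first; if it is down, degrade to local cache plus database."""
    try:
        value = remote_get(key)
        if value is not None:
            return value
    except ConnectionError:
        pass  # degrade: the remote cache is unavailable
    return local_cache.get(key, db_load)


local_cache = TtlCache(ttl=300)  # 5-minute refresh, as in the text
```

Even with the remote cache down, the database is queried at most once per key every five minutes per instance, which is what keeps the access pressure bounded.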