Exception handling is an essential consideration in the design of any back-end service.

When an exception occurs, it is important to catch it immediately and to have enough information to locate the problem. That is what this article is about.

To start, ask yourself two questions:

  1. In production, when the database behind the back end goes down, are you notified and able to locate the problem right away, instead of waiting for user feedback and spending half a day tracking it down? (The operations team will, of course, be the first to know that the database is down.)
  2. An API in production has a problem. Can you gauge how critical the error is and resolve it based on the error report?

Exception collection

Exceptions generally occur in the following places:

  1. API/GraphQL layer. A middleware at the outermost layer of the API handles and reports errors centrally. Explicit exception catching is usually unnecessary in the business logic itself unless an exception needs special handling, such as rolling back after a failed database transaction.

    // Centralize exception handling in middleware
    app.use(async (ctx, next) => {
      try {
        await next();
      } catch (err) {
        ctx.body = formatError(err)
      }
    })
    
    // GraphQL also works
    new ApolloServer({
      typeDefs,
      resolvers,
      formatError
    })

    When an exception is caught in the business logic and then rethrown manually, make sure its context is not lost: keep the caught error on the new one in an originalError field.

  2. Scripts and other non-API code, such as configuration pulls, database migration scripts, and scheduled tasks.
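The catch-and-rethrow advice in item 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the constructor shape of `DatabaseError`, the `transfer` function, and the `tx` object with `run` and `rollback` methods are all hypothetical. Only the `originalError` field name comes from the text.

```javascript
// Hypothetical error class; only the field name originalError comes from the article.
class DatabaseError extends Error {
  constructor(message, originalError) {
    super(message);
    this.name = 'DatabaseError';
    this.code = 'DatabaseError';
    // Keep the low-level error so its message and stack stay available.
    this.originalError = originalError;
  }
}

// Hypothetical business function: run a transaction, roll back on failure,
// then rethrow a classified error that still carries the original one.
async function transfer(tx) {
  try {
    await tx.run();
  } catch (err) {
    await tx.rollback();
    throw new DatabaseError('transfer failed', err);
  }
}
```

The outermost middleware still catches the rethrown error, so the special handling stays local while reporting stays centralized.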

Besides exceptions that are caught explicitly, some exceptions may slip through, and these can cause the process to exit. Register process-level handlers as a last line of defense:

process.on('uncaughtException', (err) => {
  console.error('uncaught', err)
})

process.on('unhandledRejection', (reason, p) => {
  console.error('unhandled', reason, p)
})
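A slightly fuller variant of the handlers above might report the error and then exit, since the Node.js documentation advises against resuming normal operation after an uncaught exception. The sketch below is an assumption-laden illustration: `installLastResortHandlers`, `report`, and `exit` are hypothetical names, and the dependencies are injected only so the function can be exercised without terminating the current process.

```javascript
// Sketch: last-resort handlers. `proc` is expected to behave like `process`.
function installLastResortHandlers(proc, report, exit) {
  proc.on('uncaughtException', (err) => {
    report('uncaughtException', err);
    // The process state is unreliable after an uncaught exception,
    // so exit and let a supervisor restart the service.
    exit(1);
  });

  proc.on('unhandledRejection', (reason) => {
    // A rejection reason is not always an Error; normalize it before reporting.
    const err = reason instanceof Error ? reason : new Error(String(reason));
    report('unhandledRejection', err);
  });
}

// Possible wiring (not executed here):
// installLastResortHandlers(process, (kind, err) => console.error(kind, err), process.exit);
```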

API exception structuring

When an exception occurs at the API layer, it must be structured before it is sent to the client. Structured information makes exception reporting easier and lets the front end parse the error and display it appropriately.

FormatError below represents the structured information the API should return when an exception occurs.

The exception can be caught in one place, by an error-handling middleware at the outermost layer, and a formatError function in that middleware converts it into the structured form.

interface FormatError {
  code: string;
  message: string;
  info: any;
  data: Record<string, any> | null;
  originalError?: Error;
}

function formatError(error: Error): FormatError;
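The article gives only the signature of formatError. One possible body, written in plain JavaScript, is sketched below. Everything beyond the FormatError field names is an assumption: the `CodedError` base class, the `UnknownError` fallback code, and the `exposeOriginal` option are hypothetical. The original error is copied into a plain object because `JSON.stringify` on an Error instance yields `{}`.

```javascript
// Hypothetical base class for errors that already carry a code.
class CodedError extends Error {
  constructor(code, message, { info = null, data = null, originalError } = {}) {
    super(message);
    this.name = code;
    this.code = code;
    this.info = info;
    this.data = data;
    this.originalError = originalError;
  }
}

// Turn a thrown error into the FormatError shape.
// exposeOriginal is an assumed switch, for example NODE_ENV !== 'production'.
function formatError(error, { exposeOriginal = false } = {}) {
  const coded = error instanceof CodedError;
  const formatted = {
    code: coded ? error.code : 'UnknownError', // fallback code is an assumption
    message: error.message,
    info: coded ? error.info : null,
    data: coded ? error.data : null,
  };
  if (exposeOriginal) {
    const original = (coded && error.originalError) || error;
    // Copy the fields explicitly so they survive JSON serialization.
    formatted.originalError = { message: original.message, stack: original.stack };
  }
  return formatted;
}
```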

The fields of FormatError are described below.

code

Represents an error identifier used to classify errors such as invalid user input data (ValidationError), database exception (DatabaseError), and failed external service request (RequestError).

As a rule of thumb, I divide codes into the following categories:

  • ValidationError: invalid user input
  • DatabaseError: database problem
    • DatabaseUniqueError
    • DatabaseConnectionError
    • ...
  • RequestError: external service request failed
    • RequestTimeoutError
  • ForbiddenError
  • AuthError: unauthorized request for a resource that requires authorization
  • AppError: business problem
    • AppBadRequest
    • ...
  • ...

Data validation, database access, and external requests are usually handled by third-party libraries. In that case, derive the code from the Error that the library throws.
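Deriving a code from a library error could look like the sketch below. It assumes that the database driver exposes the PostgreSQL SQLSTATE on `err.code` and that network failures carry a Node.js system error code there. The `toErrorCode` function and the `source` label passed by the call site are hypothetical.

```javascript
// Sketch: derive the article's codes from third-party errors.
// `source` is a hypothetical label supplied by the call site.
function toErrorCode(err, source) {
  if (source === 'database') {
    // 23505 is PostgreSQL's SQLSTATE for unique_violation.
    if (err.code === '23505') return 'DatabaseUniqueError';
    // ECONNREFUSED is the Node.js system error for a refused connection.
    if (err.code === 'ECONNREFUSED') return 'DatabaseConnectionError';
    return 'DatabaseError';
  }
  if (source === 'request') {
    if (err.code === 'ETIMEDOUT') return 'RequestTimeoutError';
    return 'RequestError';
  }
  // Unclassified: the caller decides what to do with it.
  return null;
}
```

Passing the source explicitly avoids guessing whether a refused connection came from the database or from an external service.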

message

A human-readable error message. Human-readable does not necessarily mean that it can be shown in the front end: the human here is a developer, not a user. Neither of the following two messages is suitable for display in the front end as-is:

  1. connect ECONNREFUSED postgres.xiange.tech. There is no need to show a database connection failure in the front end.
  2. The email is required. This is an input validation message; it could be shown to users, but it must first be localized, for example into Chinese (internationalization).

Use the code to decide whether the front end may display the message sent by the back end. The code also lets the front end handle errors globally.
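On the front end, choosing what to display by code might be sketched like this. The `userMessages` table, the message texts, and the `info.fields` shape are hypothetical and serve only to illustrate the idea.

```javascript
// Hypothetical front-end lookup: which codes may be shown, and with what text.
const userMessages = new Map([
  ['ValidationError', (err) => `Please check these fields: ${err.info.fields.join(', ')}`],
  ['AuthError', () => 'Please sign in again.'],
]);

function toUserMessage(err) {
  const render = userMessages.get(err.code);
  // Codes without an entry (database or upstream failures) get a generic
  // message so developer-facing text never reaches the user.
  return render ? render(err) : 'Something went wrong. Please try again later.';
}
```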

info

Carries more detailed information that goes with the code:

  • When an outgoing request fails, you should at least know what the failed request looked like: method, params/body, and headers.
  • When user input fails validation, you should at least know which fields failed.

originalError

originalError is the original error behind the API exception. It usually contains more detailed context information.

originalError.stack is the stack trace of the error and helps locate the problem quickly when an exception occurs (although Node sometimes reports stack frames in files you have never seen before).

In development and test environments, including originalError in the API response helps locate problems quickly. In production, do not include originalError in the API response; the complete error information is available in the monitoring system.

You can use the following two APIs to optimize your stack trace:

Error.captureStackTrace(error, constructorOpt)
Error.prepareStackTrace(error, structuredStackTrace)

For details, see the V8 Stack Trace API
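As a small illustration of Error.captureStackTrace, the sketch below builds errors in a factory and passes the factory as constructorOpt, so that the factory frame and everything above it are omitted and the trace starts at the caller. The `createValidationError` and `checkEmail` functions are hypothetical.

```javascript
// Hypothetical factory that builds a validation error.
function createValidationError(message, fields) {
  const error = new Error(message);
  error.name = 'ValidationError';
  error.info = { fields };
  // Omit this factory (and any frames above it) from the stack trace,
  // so the first frame is the function that detected the problem.
  Error.captureStackTrace(error, createValidationError);
  return error;
}

// Hypothetical caller: this is the frame a developer wants to see first.
function checkEmail(input) {
  if (!input.email) {
    throw createValidationError('The email is required', ['email']);
  }
}
```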

data

The data returned by the interface. Should data be null when an API error occurs?

No. When the API reports an error, only some of the fields may have failed, and the remaining fields can still be returned normally. This is especially visible in GraphQL, where a response is an aggregation of fields.

HTTP status

When an error occurs while an API request is being processed, a status code of 400 or above should be returned, for example:

  • HTTP/1.1 400 Bad Request
  • HTTP/1.1 401 Unauthorized
  • HTTP/1.1 403 Forbidden
  • HTTP/1.1 429 Too Many Requests
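Choosing the status from the error code could be sketched as below. The mapping itself is an assumption, since the article does not say which code goes with which status, and so is the 500 fallback for failures that are not the client's fault. `codeStatusMap` and `statusFor` are hypothetical names.

```javascript
// Assumed mapping from the article's codes to HTTP status codes.
const codeStatusMap = new Map([
  ['ValidationError', 400],
  ['AppBadRequest', 400],
  ['AuthError', 401],
  ['ForbiddenError', 403],
]);

function statusFor(code) {
  // Unlisted codes (database or upstream failures) are treated as server faults.
  return codeStatusMap.get(code) ?? 500;
}

// In the error-handling middleware this might be used as:
// ctx.status = statusFor(body.code)
```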

Filtering

In local development there is usually no need to report exceptions to Sentry. Sentry provides a hook, beforeSend, for filtering exceptions before they are reported:

beforeSend?(event: Event, hint?: EventHint): Promise<Event | null> | Event | null;
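A minimal beforeSend that drops events in local development might look like the sketch below. Returning null from the hook tells Sentry to discard the event. The `createBeforeSend` factory and the use of an environment name such as NODE_ENV to detect local development are assumptions.

```javascript
// Sketch: build a beforeSend hook for a given environment name.
function createBeforeSend(env) {
  return function beforeSend(event, hint) {
    // Returning null drops the event; nothing is sent to Sentry.
    if (env === 'development') return null;
    return event;
  };
}

// Possible wiring (not executed here):
// Sentry.init({ dsn: process.env.SENTRY_DSN, beforeSend: createBeforeSend(process.env.NODE_ENV) })
```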

The monitoring system

Monitoring first requires a monitoring system, and I recommend Sentry. For details on how to deploy Sentry, see my previous article: How to Deploy Sentry.

You can also register directly with the official SaaS version: Sentry Pay. The personal free version has a 5K error reporting limit per month, which is enough for personal use.

Compared with a self-hosted deployment, using the SaaS version eliminates some day-to-day operations and maintenance work. Most importantly, the self-hosted version has functional limitations.

Here is the documentation for Sentry

Indicators

Besides the exception itself, exception monitoring should collect further indicators.

The most important goal of exception monitoring is to reconstruct the scene in which the exception was thrown.

  1. Level: Fatal, Error, or Warn. The level decides whether an alert email or text received on a Sunday makes you open your laptop to fix the bug. It can be derived from the code:

    const codeLevelMap = {
      ValidationError: 'warn',
      DatabaseError: 'error'
    }
  2. Environment: production or test, so that problems are found before users and testers find them. It can be read directly from the application's environment variables.

  3. Context: which API was requested, by which user, plus more detailed HTTP message information. Context can be reported directly through Sentry's API:

    Sentry.configureScope(scope => {
      scope.addEventProcessor(event => Sentry.Handlers.parseRequest(event, ctx.request))
    })
  4. User: which user triggered the API error.

  5. Code: makes error classification easier.

  6. Request_id: makes the request easy to trace and yields more debugging information, for example finding in ELK the SQL statements the API executed for this request.

    const requestId = ctx.header['x-request-id'] || Math.random().toString(36).slice(2, 11)
    Sentry.configureScope(scope => {
      scope.setTag('requestId', requestId)
    })
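The six indicators in the list can be gathered in one place before reporting. The sketch below is an illustration under assumptions: `collectIndicators` is a hypothetical helper, `ctx.state.user` is assumed to hold the authenticated user, the fallback level `error` and fallback environment `development` are guesses, and the `levels` table simply mirrors the article's codeLevelMap. Sentry itself names its warning level `warning`, so a `warn` value would need translating before being passed to Sentry.

```javascript
// Levels per code, mirroring the article's codeLevelMap.
const levels = new Map([
  ['ValidationError', 'warn'],
  ['DatabaseError', 'error'],
]);

// Sketch: collect the indicators from the request context, the environment
// variables, and the structured error.
function collectIndicators(ctx, env, error) {
  return {
    level: levels.get(error.code) || 'error', // assumed default level
    environment: env.NODE_ENV || 'development', // assumed default environment
    code: error.code,
    // Assumption: an auth middleware stored the user on ctx.state.user.
    user: ctx.state && ctx.state.user ? { id: ctx.state.user.id } : null,
    requestId: ctx.header['x-request-id'] || Math.random().toString(36).slice(2, 11),
    request: { method: ctx.method, url: ctx.url },
  };
}
```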

As can be seen above, the data collected for these indicators generally comes from two sources: the HTTP message and environment variables.