Why does log monitoring need best practices?

Good logging practices are critical to monitoring and troubleshooting Node.js servers. They help us track errors in our applications, identify performance optimization opportunities, and perform different types of analysis on the system (for example, in the case of outages or security issues) to make critical product decisions.

1. The choice of logging library — make good use of existing tools

Advantages and disadvantages of console.log

While console.log() has its uses, it is not an appropriate solution for logging in production applications: it lacks the features and configuration options that are essential to a good logging setup.

For example, the console methods do not support log levels such as WARN, ERROR, or DEBUG. While console.warn(), console.error(), and console.debug() exist, they are just functions that print to standard output or standard error without attaching any log severity.
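To make the difference concrete, here is a minimal sketch of what a leveled logger adds on top of `console`. Everything in it (the `LEVELS` table, the `log` helper) is illustrative, not a real library API:

```javascript
// Illustrative only: a tiny leveled logger. Unlike console.warn/console.error,
// it attaches a machine-readable severity and timestamp to every entry.
const LEVELS = { error: 0, warn: 1, info: 2, debug: 3 };

function log(level, message) {
  const entry = { level, message, timestamp: new Date().toISOString() };
  // Route severe entries to stderr, the rest to stdout, as real loggers do.
  const stream = LEVELS[level] <= LEVELS.warn ? process.stderr : process.stdout;
  stream.write(JSON.stringify(entry) + '\n');
  return entry;
}

log('warn', 'low disk space'); // stderr, tagged "warn"
log('info', 'server started'); // stdout, tagged "info"
```

Real libraries add much more on top of this (transports, formatting, filtering), which is the point of the comparison that follows.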

Four criteria for choosing a framework

There are four main considerations when choosing a logging library: ease of use when recording, formatting, message storage, and performance.

Because loggers are used heavily throughout a code base, a slow library can hurt your application's runtime performance. Library selection should therefore take performance into account and compare each candidate against the alternatives.

The current recommended log library

  • Winston – The most popular logging library, supports multiple transports. This allows us to easily configure the preferred storage location for logs.
  • Pino – The main attraction of Pino is its speed. In many cases, it claims to be five times faster than other frameworks.
  • Bunyan – Another rich logging framework that defaults to JSON output and provides CLI tools for viewing logs.
  • Roarr – Roarr is a different type of logger that works with Node.js and browsers.

This article uses Winston for its examples, because it is currently the most popular logging framework.

2. Log level – Quickly identify critical emergencies

The log level is the simplest and most effective way to distinguish the types of events in the system. If logging levels are used correctly in your application, it is easy to distinguish critical events that need immediate resolution from purely informational events.

Although logging systems use different names for their severity levels, the concept remains largely the same. The following are the most common logging levels, regardless of which framework you choose (in descending order of severity):

  • FATAL: Indicates a catastrophic condition – the application cannot recover. Logging at this level usually indicates the end of the program.
  • ERROR: Indicates an ERROR condition in the system that stops a particular operation, not the entire system. You can log at this level when a third-party API returns an error.
  • WARN: Indicates runtime conditions that are undesirable or abnormal, but not necessarily errors. For example, falling back to a backup data source when the primary data source is unavailable.
  • INFO: Information messages are purely informational. User-driven or application-specific events may be logged at this level. A common use of this level is to record interesting runtime events, such as service startup or shutdown.
  • DEBUG: Indicates the diagnosis information that may be required for troubleshooting.
  • TRACE: Capture every possible detail about the application’s behavior during development.
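The ordering above can be sketched in a few lines. The level names and the threshold check are illustrative; each library defines its own names and numeric values:

```javascript
// Severity levels from most to least severe (names vary by framework).
const SEVERITY = ['fatal', 'error', 'warn', 'info', 'debug', 'trace'];

// A message is emitted only if it is at least as severe as the
// logger's configured threshold.
function shouldLog(messageLevel, threshold) {
  return SEVERITY.indexOf(messageLevel) <= SEVERITY.indexOf(threshold);
}

shouldLog('error', 'info'); // true: errors pass an "info" threshold
shouldLog('debug', 'info'); // false: debug noise is filtered out
```

This is why setting a production threshold of INFO keeps DEBUG and TRACE chatter out of your logs while still surfacing everything that matters.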

3. Log output criteria — Efficient processing

The output format should ideally meet two requirements:

  • Easy for humans to read
  • Easy for machines to read

Easy for humans to read

A good log format makes it easy to understand what the output means and what information it contains. Ideally it includes not only the cause of an error, but also its parameters and the relevant upstream and downstream information.

Easy for machines to read

Machine readability matters because logs must be collected and uniformly formatted before they can feed machine-driven aggregation, automated alerting, and intelligent alerting.

Example

Take the Winston framework as an example:

const { createLogger, format, transports } = require('winston');

const logger = createLogger({
  format: format.combine(format.timestamp(), format.json()),
  transports: [new transports.Console({})],
});

It outputs entries like the following:

{"message":"Connected to DB!","level":"info","timestamp":"2021-07-28T22:04:08.758Z"}
{"message":"Payment received","level":"info","timestamp":"2021-07-28T22:04:08.758Z"}
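Because each line is self-describing JSON, downstream tooling can parse and filter it mechanically. A sketch (the log lines and timestamps here are illustrative):

```javascript
// Illustrative log lines in the JSON shape Winston emits.
const lines = [
  '{"message":"Connected to DB!","level":"info","timestamp":"2021-07-28T22:04:08.758Z"}',
  '{"message":"DB connection lost","level":"error","timestamp":"2021-07-28T22:05:01.123Z"}',
];

// Parse every line, then keep only error-level entries for alerting.
const errors = lines
  .map((line) => JSON.parse(line))
  .filter((entry) => entry.level === 'error');

console.log(errors.length); // 1
```

This same parse-then-filter step is what log aggregators and alerting pipelines do at scale, which is why a consistent machine-readable format is a precondition for them.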

4. Write descriptions — more valuable

Log entries should adequately describe the events they represent. Each message should be specific to the situation and should clearly explain what happened at the time. In an emergency, your log entry may be your only source of information to help you understand what’s going on, so it’s important to record this information correctly!

Here’s an example

request failed, will retry

The above message does not provide any help in the following areas:

  • Failed request information
  • Analysis of the causes of failure
  • Time interval before the request is retried

Here’s a better example

"POST" request to "https://example.com/api" failed. Response code: "429", response message: "too many requests". Retrying after "60" seconds.

The second message is much better: it provides enough information about the failed request, including the status code and response message, and it indicates that the request will be retried in 60 seconds. When all messages are written this descriptively, the logs become far easier to understand.
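One way to keep such messages consistent across a code base is a small formatting helper. This is a sketch; the helper and its parameter names are hypothetical, not from any library:

```javascript
// Hypothetical helper: assemble a descriptive failure message from
// the parts of a failed HTTP request.
function retryMessage({ method, url, status, statusText, retryAfterSeconds }) {
  return (
    `"${method}" request to "${url}" failed. ` +
    `Response code: "${status}", response message: "${statusText}". ` +
    `Retrying after "${retryAfterSeconds}" seconds.`
  );
}

retryMessage({
  method: 'POST',
  url: 'https://example.com/api',
  status: 429,
  statusText: 'too many requests',
  retryAfterSeconds: 60,
});
```

Centralizing the format this way also prevents different call sites from drifting into inconsistent wording.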

5. Add context to logs – Add key information

In addition to necessary description information, you can add some key context information to the log information, such as key data points, event timestamps, and related functions.

Example

For example, an order log may contain several key data information:

  • Session identifier
  • User name and ID
  • Product or transaction identifier
  • The current page of the user

Each of these data points can be used to analyze the user's flow through the ordering process. When an important event occurs, this data, automatically appended to the log output, can identify:

  • The situation that led to the event (for example, the user who experienced the event)
  • The page on which it happened
  • The transaction and product ID that triggered the event.

You can filter logs on these key fields to trace and reproduce a user's path through the system.
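The idea can be sketched independently of any library: bind the contextual fields once, then merge them into every entry. All names and values here are illustrative:

```javascript
// Bind per-request context once; every subsequent entry carries it.
function withContext(context) {
  return (level, message, fields = {}) => ({
    level,
    message,
    ...context,
    ...fields,
    timestamp: new Date().toISOString(),
  });
}

// Illustrative values for an order flow.
const reqLog = withContext({ sessionId: 'sess-42', userId: 'user-17', page: '/checkout' });
const entry = reqLog('info', 'Order submitted', { orderId: '1234' });
// entry now includes sessionId, userId, page, and orderId automatically.
```

Real loggers offer the same pattern through child loggers or default metadata, as the Winston example below shows.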

Winston example

Winston provides the ability to inject metadata globally via defaultMeta.

Inject service information globally:

const logger = createLogger({
  format: format.combine(format.timestamp(), format.json()),
  defaultMeta: {
    service: 'billing-service',
  },
  transports: [new transports.Console({})],
});

Output results:

{"message":"order 1234 has been successfully processed","level":"info","service":"billing-service","timestamp":"2021-07-29T10:56:14.651Z"}

6. Avoid recording sensitive information — safety first

Regardless of whether your industry is subject to strict compliance regulations (such as healthcare or finance), it is important to avoid including sensitive information in your logs.

Sensitive information includes personal ID numbers, addresses, passwords, credit card details, access tokens, and similar data. Because log messages are usually stored in plain text, this data can be exposed if the logs fall into the wrong hands. You must also be careful about what you record to avoid violating regulations (such as the GDPR) that apply in the countries or regions where your product operates.

Following the principle of data minimization, restricting which parts of the system can access this data helps avoid accidentally leaking it into logs. For example, credit card details should be visible only to the billing component of the system, sensitive data should be kept out of URLs, and it should be redacted wherever possible.

While not foolproof, a block list can also be used to prevent specific fields from reaching the logs.
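A minimal sketch of such a block list follows. The field names are illustrative, and production tools (for example, Pino's built-in `redact` option) also handle nested object paths, which this sketch does not:

```javascript
// Fields that must never appear in logs verbatim (illustrative list).
const BLOCKED = new Set(['password', 'creditCard', 'accessToken']);

// Return a copy of the fields with blocked values masked.
function redact(fields) {
  return Object.fromEntries(
    Object.entries(fields).map(([key, value]) =>
      BLOCKED.has(key) ? [key, '[REDACTED]'] : [key, value],
    ),
  );
}

redact({ user: 'alice', password: 'hunter2' });
// → { user: 'alice', password: '[REDACTED]' }
```

Applying such a filter in one central place (for example, a custom log format) is safer than trusting every call site to remember it.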

Reference: blog.appsignal.com/2021/09/01/…