The problem background

One morning, the operation classmate suddenly in the group feedback a lot of users to report the login problem. At first, it was thought that the Intranet interface service was abnormal, but the interface reported that no abnormal log was generated. That is to say, the abnormal request has not been dialed. So we log in to the server and filter the node.js service logs:

Based on logs, you can intuitively see the problem: DNS resolution fails

To compose

As a Node.js service with daily traffic of over 10 million, each request needs to resolve N Intranet interface domain names.

In normal times, if the DNS service is faulty or network jitter occurs, online services may become unavailable when both the Node.js service and Intranet interface service are normal

In this case, we need a layer of caching for DNS resolution on the Node.js server

First of all we need to be clear:!! Node.js itself does not cache DNS query results!!

Default DNS query scheme

Let’s first look at the default DNS lookup scheme:

The built-in HTTP module of Node.js uses dns.lookup() to lookup HTTP. Request ()

  • Request () -> net.createconnection () -> dns.lookup()
function lookupAndConnect(self, options) {
  // ...
  const lookup = options.lookup || dns.lookup;
  defaultTriggerAsyncIdScope(self[async_id_symbol], function() {
    lookup(host, dnsopts, function emitLookup(err, ip, addressType) {
      self.emit('lookup', err, ip, addressType, host);
      // ...
    });
  });
}
Copy the code

The options.lookup argument can be set by passing in either dns.resolve or a custom method that meets the requirements

The getaddrinfo function

The dns.lookup() method is called to the end, calling the underlying getAddrInfo () function.

In C/C++ code, the getAddrinfo function is called synchronously, so you need libuv to implement node.js asynchronous I/O through the thread pool

Note: Looking back, we can see that the default thread pool size is 4

This can be set using the UV_THREADPOOL_SIZE environment variable. Node.js V14 has a maximum of 1024

Problems that might arise

When a request takes a long time in the DNS query phase, the service processing speed is not matched by the number of requests because the default thread pool is too small. The longer the service runs, the more the backlog of requests and connections will increase

About the Default cache

  • !!!!!!!!! Node.js itself does not cache DNS query results!! , node.js requests the DNS Server for each domain name request

  • Using the DNS cache Note the expiration time of the cache

Implement DNS cache dependencies

lookup-dns-cache

Lookup -dns-cache is a very mature DNS cache library, but it is quite old

His idea was simple:

  1. The underlying query uses dns.resolve() instead of dns.lookup
  2. Through aMapThe cache has been resolvedhostnameinformation
  3. Avoid parallel DNS requests. Only one query request for the same hostname is executed at a time, using Map

dns.resolvedns.lookupThe difference between

You can see it in the official documents

  1. dns.resolveDo not usegetaddrinfo()
  2. dns.resolveIt is implemented asynchronously
  3. dns.resolveDo not parse the local hosts file

Details can be found at nodejs.org/dist/latest…

The sample code
const { resolve, lookup } = require('dns'); Lookup ('preview4.xx.xx.com', (err, address, family) => {console.log(' address: %j address family: IPv%s', address, family); // Address: "xxx.xxx.xx.xx" address family: IPv4}); resolve('preview4.xx.xx.com', (err, records) => { console.log(records); // undefined });Copy the code

Preview4.xx.xx.com is the domain name configured for the local host.

Because dns.resolve() does not use getAddrInfo (), the address resolved at this point is undefined

Avoid parallel request implementations

Use Map to cache the hostname being queried. After the query is complete, the Map is deleted

let task = this._tasksManager.find(key);

if (task) {
  task.addResolvedCallback(callback);
} else {
  task = new ResolveTask(hostname, ipVersion);
  this._tasksManager.add(key, task);
  task.on('addresses', addresses => {
    this._addressCache.set(key, addresses);
  });
  task.on('done', () => {
    this._tasksManager.done(key);
  });
  task.addResolvedCallback(callback);
  task.run();
}
Copy the code

Ttl-based caching

  1. throughdns.resloveMethods set upttl:trueLet the DNS query result return TTL value
  2. If the hostname is still in the cache and has not expired, return to the cache directly; otherwise, query
/** * @param {string} key * @returns {Address[]|undefined} */ find(key) { if (! this._cache.has(key)) { return; } const addresses = this._cache.get(key); if (this._isExpired(addresses)) { return; } return addresses; }Copy the code

cacheable-lookup

In practice, it is found that dns.resolve() cannot resolve the domain name configured in the local hosts. Using only lookup-dns-cache will cause an error in the local development environment.

After research, I found the cacheable- Lookup library, the author is the author of GOT Szmarczak.

As we can see from the commit record, the author is still keeping the library updated.

Contrast the lookup – DNS cache

async queryAndCache(hostname) { if (this._hostnamesToFallback.has(hostname)) { return this._dnsLookup(hostname, all); } let query = await this._resolve(hostname); if (query.entries.length === 0 && this._dnsLookup) { query = await this._lookup(hostname); if (query.entries.length ! == 0 && this.fallbackDuration > 0) { // Use `dns.lookup(...) ` for that particular hostname this._hostnamesToFallback.add(hostname); } } const cacheTtl = query.entries.length === 0 ? this.errorTtl : query.cacheTtl; await this._set(hostname, query.entries, cacheTtl); return query.entries; }Copy the code

If the resolve method does not resolve successfully, the lookup method will be used. This is more consistent with the hosts file change scenario in our local development environment

The library also provides functionality points such as TTL-based caching, blocking parallel requests, and so on

So we can solve our problem!

The solution

Cacheable -lookup is added to the cache for DNS resolution at all calls to interfaces on the Intranet