From the GraphQL N+1 Problem to a Walkthrough of the DataLoader Source Code

Preface

After playing with GraphQL for a while, it occurred to me that reading the DataLoader source code would be a good excuse to start writing articles again. I have a pile of ideas I want to write about, but laziness is hard to resist, and I have been busy learning a lot of new things lately.

DataLoader itself is not that complicated; what makes it good and powerful is the idea behind it. Every front-end engineer learns about the event loop at some point, but actually using the event loop mechanism to solve one of the most notorious shortcomings of GraphQL APIs, the N+1 problem, is not easy.

This article covers a fair amount of practical material, including:

Applying DataLoader to GraphQL resolvers
The DataLoader source code (the batching part)
A mini implementation of DataLoader
The DataLoader inside Prisma 2
Integrating DataLoader at the ORM layer and the framework layer

See the GitHub repository Dataloader-source-explore for the demos used in this article.

This article will help you sort out the root cause of the GraphQL N+1 problem, then fix it, review the event loop mechanism a little, and integrate DataLoader with an ORM or a framework. This is my first official post on GraphQL.

My original plan was to write a whole series of GraphQL articles, but it never got past the idea stage. The reason is that I am not used to explaining things in great detail at the introductory level, yet a getting-started series has to be written for readers starting from zero. So I am putting that off for now; maybe one day I will suddenly start writing the full series. This post, for example, came to me yesterday when I was browsing my GitHub feed and saw that an expert I follow had forked the DataLoader repository.

The GraphQL N+1 problem

This article should also be helpful if you have never touched GraphQL before: it will not get too complicated, and I will explain the APIs used along the way. Treat it as a first lesson to find out whether GraphQL interests you.

Suppose we have a scenario where we want to get the pet information of all users. With a RESTful API, we first need to get all the users (and their pet IDs), and then call the pet endpoint to get the pet information:

GET /users
[
  {
    "id": 1,
    "name": "aaa",
    "petsId": [1, 2]
  },
  {
    "id": 2,
    "name": "bbb",
    "petsId": [2, 3, 4]
  }
]

GET /pet/:id
[
  {
    "id": 1,
    "kind": "Cat"
  },
  {
    "id": 2,
    "kind": "Dog"
  }
  // ...
]

Number of requests: N+1. Number of database I/O operations: N+1.

If the /users endpoint returns user information without the petsId field, then N more requests to /user/:id are needed to get each user's details, and the total number of requests and database I/O operations reaches 2N+1.

With GraphQL, the query would normally look like this:

query {
  fetchAllUsers {
    id
    name
    pets {
      id
      kind
      age
      isMale
    }
  }
}

There is only one request now, right? But what about the number of database I/O operations?

It looks like only one, because the data comes back together. In fact, the number of database I/O operations is still N+1. Before moving on to the demo, here is what the GraphQL query above means:

query means that this operation only reads and does not write. It is one of the GraphQL operation types; the others are mutation and subscription, which represent write operations (think of the RESTful POST/PUT/DELETE methods) and subscription operations (think of WebSocket; in practice subscriptions are usually carried over WebSocket).
fetchAllUsers returns an object type that contains the basic attributes id and name as well as the object attribute pets. Inside pets, all the fields are basic attributes. In a large GraphQL API, object attributes can be nested tens of layers deep or more (this is a feature of GraphQL, but it can also cause serious performance problems, so it is common to limit the nesting depth and reject requests beyond a certain depth).
In a REST API we usually have Controllers that respond at the route level. The GraphQL counterpart is the Resolver, but it targets object types (ObjectType): each object type has its own resolver, such as fetchAllUsers and pets above. You can think of it this way: a resolver is a function for an object type.

Now suppose we have 100 users. The fetchAllUsers resolver returns an array of 100 users, and pets is an object attribute on each user. How many times does the pets resolver need to be called? Ideally once: the first query already gives us every user, so we could collect all the pet IDs and fetch all the pet information in one more query.

In practice, however, GraphQL first executes the pets resolver for user 1 and gets user 1's pets, then executes it for user 2, and so on, 100 times in total. Add the one query for all users and you have GraphQL's N+1 problem.
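The two call patterns can be compared with a small simulation. This is only a sketch: the in-memory `users` and `pets` arrays, `createDataSource` and the two `resolve...` functions are hypothetical stand-ins for a real database and real resolvers, and each method call on the data source counts as one database round trip.

```typescript
type User = { id: number; petsId: number[] };
type Pet = { id: number; kind: string };

// Hypothetical data: 100 users, each owning two pets.
const users: User[] = Array.from({ length: 100 }, (_, i) => ({
  id: i + 1,
  petsId: [i * 2 + 1, i * 2 + 2],
}));
const pets: Pet[] = Array.from({ length: 200 }, (_, i) => ({
  id: i + 1,
  kind: i % 2 === 0 ? "Cat" : "Dog",
}));

// Stand-in for a database that counts how many round trips it serves.
function createDataSource() {
  let calls = 0;
  return {
    async fetchAllUsers(): Promise<User[]> {
      calls++;
      return users;
    },
    async fetchPetsByIds(ids: number[]): Promise<Pet[]> {
      calls++;
      return pets.filter((pet) => ids.includes(pet.id));
    },
    getCalls: (): number => calls,
  };
}

// What happens by default: the pets lookup runs once per user.
async function resolveNaively(): Promise<number> {
  const db = createDataSource();
  const allUsers = await db.fetchAllUsers();
  for (const user of allUsers) {
    await db.fetchPetsByIds(user.petsId);
  }
  return db.getCalls();
}

// The ideal case: collect every pet ID first, then query once.
async function resolveBatched(): Promise<number> {
  const db = createDataSource();
  const allUsers = await db.fetchAllUsers();
  const petIds = allUsers.reduce<number[]>(
    (acc, user) => acc.concat(user.petsId),
    []
  );
  await db.fetchPetsByIds(petIds);
  return db.getCalls();
}
```

Under these assumptions `resolveNaively()` resolves to 101 round trips (1 + N) and `resolveBatched()` resolves to 2.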

The actual problem

Let’s look at an actual demo:

See source code: Demo1

Here I wrote a simple GraphQL server with Apollo-Server and simulated the data latency directly with a Promise, to keep the environment setup cheap. If you want to build a full GraphQL service, TypeGraphQL and the other Apollo GraphQL open source projects are recommended, as well as the larger demo I wrote: GraphQL-Explorer-Server.

Our GraphQL Schema definition looks like this:

const typeDefs = gql`
  type Query {
    fetchAllUsers: [User]
    fetchUserByName(name: String!): User
  }

  type User {
    id: Int!
    name: String!
    partner: User
    pets: [Pet]
  }

  type Pet {
    id: Int!
    kind: String!
    age: Int!
    isMale: Boolean!
  }
`;

gql parses the GraphQL schema string into a DocumentNode, which is GraphQL's node definition. The generated definition contains the Query, User and Pet types.

Then we write a simple mock data:

import chalk from "chalk";

// Resolves with `value` after 200 ms and logs which service method was called
const promiseWrapper = <T>(value: T, indicator: string): Promise<T> =>
  new Promise((resolve) => {
    setTimeout(() => {
      console.log(chalk.cyanBright(indicator));
      return resolve(value);
    }, 200);
  });

const mockService = (() => {
  const users: IUser[] = [];
  const pets: IPet[] = [];

  return {
    getUserById: (id: number) =>
      promiseWrapper(
        users.find((user) => user.id === id),
        `getUserById: ${id}`
      ),

    getUserByName: (name: string) =>
      promiseWrapper(
        users.find((user) => user.name === name),
        `getUserByName: ${name}`
      ),

    getUsersByIds: (ids: number[]) =>
      promiseWrapper(
        users.filter((user) => ids.includes(user.id)),
        `getUsersByIds: ${ids}`
      ),

    getAllUsers: () => promiseWrapper(users, "getAllUsers"),

    getPetById: (id: number) =>
      promiseWrapper(
        pets.find((pet) => pet.id === id),
        `getPetById: ${id}`
      ),

    getPetsByIds: (ids: number[]) =>
      promiseWrapper(
        pets.filter((pet) => ids.includes(pet.id)),
        `getPetsByIds: ${ids}`
      ),

    getAllPets: () => promiseWrapper(pets, "getAllPets"),
  };
})();

This mimics the Service layer that is normally called in Controller/Resolver, with the second parameter of promiseWrapper to help locate the currently invoked method.

Each object type needs its own resolver. To be more precise, each object type used in the root Query object (nested ones excepted) needs its own resolver. For example, Query.fetchAllUsers returns [User], so we need a dedicated User resolver, while User.pets returns [Pet] and does not need a dedicated Pet resolver. Instead, we can define the User.pets resolver directly. To start with:

const resolvers = {
  Query: {
    fetchUserByName(root, { name }: { name: string }, { service }: IContext) {
      return service.getUserByName(name);
    },
    fetchAllUsers(root, args, { service }: IContext) {
      return service.getAllUsers();
    },
  },
};

The User returned by fetchAllUsers still has no resolver for partner or pets, so a complete resolver map should look like this:

const resolvers = {
  Query: {
    fetchUserByName(root, { name }: { name: string }, { service }: IContext) {
      return service.getUserByName(name);
    },
    fetchAllUsers(root, args, { service }: IContext) {
      return service.getAllUsers();
    },
  },
  User: {
    async partner(user: IUser, args, { service }: IContext) {
      return service.getUserById(user.partnerId);
    },
    async pets(user: IUser, args, { service }: IContext) {
      return service.getPetsByIds(user.petsId);
    },
  },
};

Now that we have written the type definitions and the corresponding resolvers, we can start a GraphQL service based on Apollo-Server:

const server = new ApolloServer({
  typeDefs,
  resolvers,
  tracing: true,
  context: async () => {
    return {
      service: mockService,
    };
  },
  playground: {
    settings: {
      "editor.fontSize": 16,
      "editor.fontFamily": "Fira Code",
    },
  },
});

server.listen(4545).then(({ url }) => {
  console.log(chalk.greenBright(`Apollo GraphQL Server ready at ${url}`));
});

For those of you who have not used Apollo-Server, there is one thing to note here: The context attribute passed in during instantiation of ApolloServer is taken as the third parameter in the resolver, so we can get mockService in the Resolver.

Visit http://localhost:4545/graphql, then use the following query:

This interface comes from GraphQL Playground, a very powerful debugging tool for the GraphQL API

query {
  fetchAllUsers {
    id
    name
    partner {
      id
      name
    }
  }
}

The result should look like this:

Terminal print result:

As you can see, getAllUsers is now called once and getUserById is called five times, that is, the N+1 problem exists. How to solve it? That brings out the DataLoader.

The README of the DataLoader repository tells the story of its origins and history. It is no exaggeration to say that DataLoader is used in almost every GraphQL service in production, apart from practice projects and small projects with no performance pressure. There are some exceptions: solutions such as Hasura and PostGraphile take over the database directly.

DataLoader was originally written in Flow (as was GraphQL; Flow is Facebook's JavaScript type checker, comparable to TypeScript), but I have converted it to TypeScript for ease of reading. See Dataloader.ts.

The DataLoader constructor signature looks like this:

constructor(
  batchLoadFn: DataLoader.BatchLoadFn<K, V>,
  options?: DataLoader.Options<K, V, C>
);

We will focus on batchLoadFn, the batch function we pass in. Simply put, it is a function that retrieves a set of records for a set of IDs, like TypeORM's repository.findByIds() method.

You may already have guessed the idea behind DataLoader, or perhaps you spotted the root cause of the GraphQL N+1 problem right at the beginning: just collect the IDs of all the data that needs to be queried and resolve them together at the end. That is exactly what DataLoader does, which is why we need to pass in a batch query function at instantiation time.

Because DataLoader also has caching capabilities, and each instance takes up a chunk of memory, the best approach when there are multiple object types is to construct a new instance for each object type, like here:

context: async () => {
  return {
    service: mockService,
    dataloaders: {
      users: new DataLoader(async (userIds: Readonly<number[]>) => {
        console.log("Received User IDs");
        console.log(userIds);
        const users = await mockService.getUsersByIds(userIds as number[]);
        // Return the users in the same order as the requested IDs
        return users.sort(
          (prev, curr) => userIds.indexOf(prev.id) - userIds.indexOf(curr.id)
        );
      }),
      pets: new DataLoader(async (petIds: Readonly<number[]>) => {
        console.log("Received Pet IDs");
        console.log(petIds);
        const pets = await mockService.getPetsByIds(petIds as number[]);
        // Return the pets in the same order as the requested IDs
        return pets.sort(
          (prev, curr) => petIds.indexOf(prev.id) - petIds.indexOf(curr.id)
        );
      }),
    },
  };
},

Points to note:

After instantiation, there are two main methods on the DataLoader instance: load and loadMany. Here, load should be used for User.partner (a one-to-one relationship) and loadMany for User.pets (one-to-many).
DataLoader hands the i-th returned value to the caller that asked for the i-th key (and caches it under that key), so the batch function must return its values (users) in the same order as the input keys (userIds), one value per key. You can also customize how keys map to cache entries, which is not covered here.
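Sorting by `indexOf`, as the demo does, only lines results up with keys when every requested ID actually has a row. A more defensive way to satisfy the ordering requirement is to index the rows by key and then walk the keys. The helper below is a sketch and not part of DataLoader or the demo; the name `alignToKeys` and the `getKey` callback are assumptions made for illustration.

```typescript
// Reorders the rows returned by a batch query so that result[i] belongs to keys[i].
// A key with no matching row gets an Error in its slot, so only that load is rejected.
function alignToKeys<K, V>(
  keys: ReadonlyArray<K>,
  rows: ReadonlyArray<V>,
  getKey: (row: V) => K
): Array<V | Error> {
  const byKey = new Map<K, V>();
  for (const row of rows) {
    byKey.set(getKey(row), row);
  }
  return keys.map((key) => {
    const row = byKey.get(key);
    return row !== undefined
      ? row
      : new Error(`No row found for key ${String(key)}`);
  });
}

// The database returned rows out of order, and nothing for ID 2.
const rows = [
  { id: 3, kind: "Dog" },
  { id: 1, kind: "Cat" },
];
const aligned = alignToKeys([1, 2, 3], rows, (row) => row.id);
// aligned[0] is the Cat row, aligned[1] is an Error, aligned[2] is the Dog row
```

Inside a batch function this would be used as `return alignToKeys(petIds, pets, (pet) => pet.id);` in place of the `sort` call.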

Change the resolvers to fetch data through the dataloaders:

const resolvers = {
  // ...
  User: {
    async partner(user: IUser, args, { service, dataloaders }: IContext) {
      // return service.getUserById(user.partnerId);
      return dataloaders.users.load(user.partnerId);
    },
    async pets(user: IUser, args, { service, dataloaders }: IContext) {
      // return service.getPetsByIds(user.petsId);
      return dataloaders.pets.loadMany(user.petsId);
    },
  },
};

Run the same query again and look at the result:

Nice: getUsersByIds is called just once and all the data comes back.

Now try the pets field:

The 0 {"id":2, … } line is something I print inside DataLoader. In this demo we import the TypeScript version of DataLoader directly instead of using the npm package.

import DataLoader from "./dataloader";

The DataLoader source code

Now we can look at the DataLoader source code. Before we start, think about how you would implement DataLoader yourself.

Batch and Cache are the core functions of DataLoader. The best way to read TypeScript code is to start with its type declarations:

To keep it short, the code pasted below has the cache-related processing and the validation logic removed.

// class
constructor(batchLoadFn: BatchLoadFn<K, V>, options?: Options<K, V, C>) {
  this._batchLoadFn = batchLoadFn;
  this._batchScheduleFn = getValidBatchScheduleFn(options);

  this._maxBatchSize = getValidMaxBatchSize(options);

  this._batch = null;
}

A DataLoader is instantiated with batchLoadFn and options; the types to focus on are BatchLoadFn and Options:

export type BatchLoadFn<K, V> = (
  keys: Readonly<Array<K>>
) => Promise<Readonly<Array<V | Error>>>;

export type Options<K, V, C = K> = {
  batch?: boolean;
  maxBatchSize?: number;
  batchScheduleFn?: (callback: () => void) => void;
};

There is not much to say about most of these, but batchScheduleFn is a very important function: as the name says, it is the scheduling function.

In the constructor, this._batchScheduleFn = getValidBatchScheduleFn(options); calls getValidBatchScheduleFn to obtain the scheduling function for this instance:

function getValidBatchScheduleFn(
  options?: Options<any, any, any>
): (fn: () => void) => void {
  let batchScheduleFn = options && options.batchScheduleFn;
  if (batchScheduleFn === undefined) {
    return enqueuePostPromiseJob;
  }
  if (typeof batchScheduleFn !== "function") {
    throw new TypeError(
      `batchScheduleFn must be a function: ${batchScheduleFn}`
    );
  }
  return batchScheduleFn;
}

What this function does is very simple: if you pass in a scheduling function at instantiation time, it uses that one; otherwise it uses enqueuePostPromiseJob, which we will come back to later. As the name suggests, it enqueues a job that runs after the pending Promise jobs.
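To see why the choice of scheduling function matters, here is a sketch of two schedulers with the `(callback: () => void) => void` shape shown above. `createKeyCollector` is a hypothetical, stripped-down stand-in for the batching part of a loader, not DataLoader's own code; it only records which keys end up in the same batch.

```typescript
type BatchScheduleFn = (callback: () => void) => void;

// Dispatch on the next macrotask.
const nextMacrotask: BatchScheduleFn = (callback) => {
  setTimeout(callback, 0);
};

// Keep the batch open for `ms` milliseconds before dispatching.
const collectFor = (ms: number): BatchScheduleFn => (callback) => {
  setTimeout(callback, ms);
};

// Hypothetical stand-in for a loader: the first key of a batch schedules the
// dispatch, later keys join the batch until the dispatch has run.
function createKeyCollector(
  schedule: BatchScheduleFn,
  onDispatch: (keys: number[]) => void
): (key: number) => void {
  let pending: number[] | null = null;
  return (key) => {
    let batch = pending;
    if (batch === null) {
      const fresh: number[] = [];
      batch = fresh;
      pending = fresh;
      schedule(() => {
        pending = null; // later keys start a new batch
        onDispatch(fresh);
      });
    }
    batch.push(key);
  };
}

const eagerBatches: number[][] = [];
const windowedBatches: number[][] = [];
const addEager = createKeyCollector(nextMacrotask, (keys) => {
  eagerBatches.push(keys);
});
const addWindowed = createKeyCollector(collectFor(20), (keys) => {
  windowedBatches.push(keys);
});

addEager(1);
addWindowed(1);
setTimeout(() => {
  addEager(2); // the first eager batch has already been dispatched
  addWindowed(2); // still inside the 20 ms window
}, 5);
```

Once all the timers have fired, `eagerBatches` is `[[1], [2]]` while `windowedBatches` is `[[1, 2]]`: a longer window trades latency for larger batches.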

Next, look at the load method we used earlier:

type Batch<K, V> = {
  hasDispatched: boolean;
  keys: Array<K>;
  callbacks: Array<{
    resolve: (value: V) => void;
    reject: (error: Error) => void;
  }>;
};

load(key: K): Promise<V> {
  let batch = getCurrentBatch(this);
  batch.keys.push(key);

  const promise: Promise<V> = new Promise((resolve, reject) => {
    batch.callbacks.push({ resolve, reject });
  });

  return promise;
}

The load method is surprisingly simple, which was one of the first things that impressed me.

It calls getCurrentBatch to get the batch, adds the input key to the keys array, then creates a promise and pushes that promise's resolve and reject into the callbacks array.

If you are still confused at this point, don't worry: the next two functions should make everything clear.

function getCurrentBatch<K, V>(loader: DataLoader<K, V, any>): Batch<K, V> {
  let existingBatch = loader._batch;

  if (
    existingBatch !== null &&
    !existingBatch.hasDispatched &&
    existingBatch.keys.length < loader._maxBatchSize
  ) {
    return existingBatch;
  }

  let newBatch = { hasDispatched: false, keys: [], callbacks: [] };

  loader._batch = newBatch;

  loader._batchScheduleFn(() => {
    dispatchBatch(loader, newBatch);
  });

  return newBatch;
}

This function returns the current batch. It checks whether the instance already has a batch in progress (one that has not been dispatched and is not full); if so, the existing batch is returned. If not, a new one is created, marked as not yet dispatched (hasDispatched: false), and a function that calls dispatchBatch is handed to the scheduling function.

In other words:

The first call to load creates a new batch and attaches it to the instance; subsequent calls within the same batch get that same batch back. Once the batch is obtained, the key passed to load is added to batch.keys.
Only the first call to load within a batch passes dispatchBatch(loader, newBatch) to the scheduling function, which later starts executing the "tasks" in the current batch.
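The `existingBatch.keys.length < loader._maxBatchSize` check also means that a full batch is never reused: the next key opens a new batch. The function below is a simplified model of that grouping rule only (it ignores scheduling and dispatching); `groupIntoBatches` is a name made up for this sketch.

```typescript
// Models how keys from successive load() calls are grouped when maxBatchSize is set:
// a batch accepts keys until it is full, then the next key opens a new batch.
function groupIntoBatches<K>(
  keys: ReadonlyArray<K>,
  maxBatchSize: number
): K[][] {
  const batches: K[][] = [];
  let current: K[] | null = null;
  for (const key of keys) {
    if (current === null || current.length >= maxBatchSize) {
      current = [];
      batches.push(current);
    }
    current.push(key);
  }
  return batches;
}

const grouped = groupIntoBatches([1, 2, 3, 4, 5], 2);
// grouped is [[1, 2], [3, 4], [5]]
```

In the real loader each of these batches would get its own scheduled dispatchBatch call, so the batch function would be invoked three times here.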

Now we can look at enqueuePostPromiseJob, which is the real core of DataLoader's batch implementation:

let resolvedPromise;

let enqueuePostPromiseJob =
  typeof process === "object" && typeof process.nextTick === "function"
    ? function (fn) {
        if (!resolvedPromise) {
          resolvedPromise = Promise.resolve();
        }
        resolvedPromise.then(() => {
          process.nextTick(fn);
        });
      }
    : setImmediate || setTimeout;

This is easy to follow. In a Node.js environment it is equivalent to:

Promise.resolve().then(() => {
  process.nextTick(fn);
});

In a browser environment, setImmediate is used if it is supported, and setTimeout otherwise.

Only recent versions of Internet Explorer implement setImmediate; neither Gecko nor WebKit implements the API.

setImmediate can be approximated with setTimeout(fn, 0). See setImmediate.
Using setTimeout is equivalent to:
setTimeout(() => {
  fn();
});

With that in mind, let's move on with the logic; a later section covers the priority of these APIs in the Node.js event loop.

function dispatchBatch<K, V>(
  loader: DataLoader<K, V, any>,
  batch: Batch<K, V>
) {
  batch.hasDispatched = true;

  if (batch.keys.length === 0) {
    return;
  }

  let batchPromise = loader._batchLoadFn(batch.keys);

  batchPromise
    .then((values) => {
      // values[i] belongs to batch.keys[i] and settles batch.callbacks[i]
      for (let i = 0; i < batch.callbacks.length; i++) {
        let value = values[i];
        if (value instanceof Error) {
          batch.callbacks[i].reject(value);
        } else {
          console.log(`${i} ${JSON.stringify(value)}`);
          batch.callbacks[i].resolve(value);
        }
      }
    })
    .catch((error) => {
      failedDispatch(loader, batch, error);
    });
}

In dispatchBatch:

Mark the current batch as dispatched.
Call the batch query function passed in at instantiation time, with batch.keys (the keys that need to be fetched together) as its argument.
Iterate over the values returned by the batch query function (inside the promise) and call the resolve or reject stored in batch.callbacks in order, which settles the promise returned by each load call. This works because keys and callbacks are filled in the same order.
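One consequence of settling callbacks by index is that a batch function can fail a single key by returning an Error in that key's slot, without failing the rest of the batch. The sketch below isolates that step; `settleInOrder` and the hand-built `callbacks` array are illustrative stand-ins for what dispatchBatch and load do, not the library's code.

```typescript
type Callback<V> = {
  resolve: (value: V) => void;
  reject: (error: Error) => void;
};

// Settles callbacks[i] with values[i]: an Error instance rejects, anything else resolves.
function settleInOrder<V>(
  callbacks: Callback<V>[],
  values: ReadonlyArray<V | Error>
): void {
  callbacks.forEach((callback, i) => {
    const value = values[i];
    if (value instanceof Error) {
      callback.reject(value);
    } else {
      callback.resolve(value);
    }
  });
}

// Three pending "loads", registered in key order like batch.callbacks.
const callbacks: Callback<string>[] = [];
const pendingLoads = ["a", "b", "c"].map(
  () =>
    new Promise<string>((resolve, reject) => {
      callbacks.push({ resolve, reject });
    })
);

// The batch function found "a" and "c" but not "b".
settleInOrder(callbacks, ["A", new Error("b not found"), "C"]);

const outcomes = Promise.all(
  pendingLoads.map((load) =>
    load.then(
      (value) => `resolved ${value}`,
      (error: Error) => `rejected: ${error.message}`
    )
  )
);
// outcomes resolves to ["resolved A", "rejected: b not found", "resolved C"]
```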

Let's go over the logic from the beginning:

Calling load first calls getCurrentBatch to obtain the current batch.
getCurrentBatch returns the existing batch or creates a new one. For a newly created batch it calls _batchScheduleFn, which schedules dispatchBatch(loader, newBatch).
_batchScheduleFn falls back to enqueuePostPromiseJob when no scheduling function is passed in the options. enqueuePostPromiseJob picks a scheduling function based on the environment: under Node.js it uses process.nextTick() (wrapped in the then of an already resolved promise), and in a browser it uses setImmediate or setTimeout. You can also specify the scheduling function explicitly at instantiation time via options.batchScheduleFn.
Once load has the batch, it adds the current key to batch.keys, creates a promise, pushes that promise's resolve and reject into batch.callbacks, and returns the promise.
By the time dispatchBatch runs, load has been called several times and all the keys needed in this batch are in batch.keys, so the batchLoadFn passed in at instantiation can be called with those keys to get all the results at once. It then iterates over the results and resolves or rejects each promise.
After all the promises returned by load have been resolved or rejected, the batch ends.

The event loop in Node.js

We will only talk about the event loop in Node.js here, because enqueuePostPromiseJob has the same goal in every environment: ensuring that dispatchBatch is executed after all pending Promise jobs.

The event loop in Node.js is implemented by libuv, and each iteration is divided into several phases:

Initialization: run process.nextTick callbacks and microtasks
The event loop proper:
- timers: run expired setTimeout and setInterval callbacks, then check and complete all process.nextTick callbacks, then check and complete all microtasks
- I/O callbacks: run the callbacks of completed I/O operations, then check and complete all process.nextTick callbacks, then check and complete all microtasks
- idle and prepare: can be ignored
- poll:
  - Check for available callbacks (timers, I/O) and run them, then check and complete all process.nextTick callbacks, then check and complete all microtasks
  - When there are no callbacks, check for setImmediate callbacks. If there are none, block in this phase and wait for new event notifications; if there are some, go on to the check phase
  - When there are no outstanding callbacks, go on to the check phase
- check: run setImmediate callbacks, then check and complete all process.nextTick callbacks, then check and complete all microtasks
- close callbacks

Under Node.js, dispatchBatch is effectively scheduled like this:

promise.then(() => {
  process.nextTick(() => dispatchBatch(loader, newBatch));
});

It is therefore easy to see that a dispatchBatch scheduled this way does not run until the promise jobs already in the queue have finished. By then every load call made in that round has added its key to batch.keys, so when batchLoadFn is executed, all the keys needed for the batch query have been collected.
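The ordering can be checked with a few lines of Node.js. This is a sketch that assumes it runs under Node.js (it relies on the global `process`); the strings pushed into `order` are stand-ins for real load calls and for dispatchBatch.

```typescript
const order: string[] = [];

// Stand-in for what the first load() call schedules: a process.nextTick
// callback queued from inside a promise job.
Promise.resolve().then(() => {
  process.nextTick(() => {
    order.push("dispatch");
  });
});

// Stand-ins for resolvers that call load() from chained promise callbacks.
Promise.resolve()
  .then(() => {
    order.push("load(1)");
  })
  .then(() => {
    order.push("load(2)");
  })
  .then(() => {
    order.push("load(3)");
  });

setTimeout(() => {
  console.log(order.join(" -> "));
  // prints "load(1) -> load(2) -> load(3) -> dispatch"
}, 0);
```

Even though the nextTick callback is queued before any of the load stand-ins run, Node.js drains the whole microtask queue (including promise jobs queued along the way) before it goes back to the nextTick queue, so the dispatch comes last.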

In the browser this is even easier to understand: setTimeout schedules a macrotask while promise callbacks are microtasks, so dispatchBatch is guaranteed to run after all pending promise jobs.

The implementation of loadMany relies on the load method, which is not covered here.

A mini DataLoader

It is much easier to implement a simple DataLoader that only supports batching.

The mini version differs from the original in these main ways:

It uses process.nextTick (wrapped in the then of a resolved promise) directly as enqueuePostPromiseJob, without checking the environment.
A batch is represented by an array rather than an object; here it is called a task queue. Where the original uses hasDispatched to track the batch state, the mini version simply uses the length of the task queue: on the first call to load the task queue is empty, so the function that executes the queue is scheduled through enqueuePostPromiseJob.

The complete implementation is as follows:

type BatchLoader<K, V> = (
  keys: Readonly<Array<K>>
) => Promise<Readonly<Array<V | Error>>>;

type Task<K, V> = {
  key: K;
  resolve: (val: V) => void;
  reject: (reason?: unknown) => void;
};

type Queue<K, V> = Array<Task<K, V>>;

export default class TinyDataLoader<K, V, C> {
  readonly _batchLoader: BatchLoader<K, V>;

  _taskQueue: Queue<K, V>;

  constructor(batchLoader: BatchLoader<K, V>) {
    this._batchLoader = batchLoader;
    this._taskQueue = [];
  }

  load(key: K): Promise<V> {
    const currentQueue = this._taskQueue;

    // An empty queue means this is the first load() of a new batch
    const shouldDispatch = currentQueue.length === 0;

    if (shouldDispatch) {
      enqueuePostPromiseJob(() => {
        executeTaskQueue(this);
      });
    }

    const promise = new Promise<V>((resolve, reject) => {
      currentQueue.push({ key, resolve, reject });
    });

    return promise;
  }

  loadMany(keys: Readonly<Array<K>>): Promise<Array<V | Error>> {
    return Promise.all(keys.map((key) => this.load(key)));
  }
}

let resolvedPromise: Promise<void>;

function enqueuePostPromiseJob(fn: () => void): void {
  if (!resolvedPromise) {
    resolvedPromise = Promise.resolve();
  }

  resolvedPromise.then(() => process.nextTick(fn));
}

function executeTaskQueue<K, V>(loader: TinyDataLoader<K, V, any>) {
  // Save and empty
  const queue = loader._taskQueue;
  loader._taskQueue = [];

  // All keys are available here
  const keys = queue.map(({ key }) => key);
  const batchLoader = loader._batchLoader;

  const batchPromise = batchLoader(keys);

  batchPromise
    .then((values) => {
      queue.forEach(({ resolve, reject }, index) => {
        const value = values[index];
        value instanceof Error ? reject(value) : resolve(value);
      });
    })
    .catch((error) => {
      // If the batch function itself fails, reject every pending load
      queue.forEach(({ reject }) => reject(error));
    });
}

To make this easier to follow, you can map Task and Queue to batch.callbacks and Batch in the original implementation; a Queue is simply an array of Tasks, each carrying a key, a resolve and a reject.

The DataLoader in Prisma

If you haven't used Prisma before: Prisma calls itself a "next-generation ORM", and it is quite different from the ORMs you may have used. Take entity definitions as an example. With TypeORM and Sequelize, entities are defined in TS files as JavaScript/TypeScript objects (classes) and converted to table and column definitions by the ORM at runtime. A TypeORM entity might look like this:

import {
  Entity,
  PrimaryGeneratedColumn,
  Column,
  PrimaryColumn,
  Generated,
} from "typeorm";

@Entity()
export class User {
  @PrimaryGeneratedColumn()
  id!: number;

  @Column()
  firstName!: string;

  @Column({ nullable: true })
  lastName?: string;

  @Column({ nullable: true })
  age?: number;
}

Prisma works differently. You define the structure of the database in a .prisma file, run prisma generate to generate the Prisma Client, import the client in your code, and then use it for all your database operations.

A Prisma Schema might look like this:

// This is your Prisma schema file,
// learn more about it in the docs: https://pris.ly/d/prisma-schema

datasource db {
  provider = "sqlite"
  url      = env("SINGLE_MODEL_DATABASE_URL")
}

generator client {
  provider = "prisma-client-js"
  output   = "./client"
}

model Todo {
  id        Int      @id @default(autoincrement())
  title     String
  content   String?
  finished  Boolean  @default(false)
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt
}

The generated client structure looks like this:

Usage:

import { PrismaClient } from "./prisma/client";

const prisma = new PrismaClient();

async function createTodo(title: string, content?: string) {
  const res = await prisma.todo.create({
    data: {
      title,
      content,
    },
  });
  return res;
}

These examples are from Prisma-Article-Example, which reminds me that a few weeks ago I was going to write several articles on using Prisma (Prisma and GraphQL fit together well, after all). I really am the king of putting things off.

To solve the N+1 problem, a GraphQL API either uses DataLoader or uses a solution like Hasura/PostGraphile as the database layer. Using Prisma as the ORM also solves it, because Prisma has a DataLoader built in (I don't see it in v1; I think it was added in Prisma 2).

See runtime/DataLoader.ts.

The implementation there is also quite compact, a few hundred lines at most. The core idea is similar to the original DataLoader: it also uses process.nextTick to dispatch the batches. See the source code.

Here is the annotated code:

// Each Job carries these attributes; think of it as the Task in the mini implementation above
interface Job {
  resolve: (data: any) => void;
  reject: (data: any) => void;
  request: any;
}

export type DataloaderOptions<T> = {
  singleLoader: (request: T) => Promise<any>;
  batchLoader: (request: T[]) => Promise<any[]>;
  // The batch identifier: requests with the same identifier go into the same batch
  batchBy: (request: T) => string | null;
};

export class Dataloader<T = any> {
  batches: { [key: string]: Job[] };
  private tickActive = false;
  constructor(private options: DataloaderOptions<T>) {
    this.batches = {};
  }

  get [Symbol.toStringTag]() {
    return "Dataloader";
  }

  request(request: T): Promise<any> {
    // Get the identifier of the batch this request belongs to
    const hash = this.options.batchBy(request);
    if (!hash) {
      // If the request does not need batching, use singleLoader directly
      return this.options.singleLoader(request);
    }
    // If the batch is brand new, create a new namespace (this.batches[hash]) to store the jobs that will be executed together
    if (!this.batches[hash]) {
      this.batches[hash] = [];

      // make sure, that we only tick once at a time
      // Schedule the dispatch for later (the counterpart of enqueuePostPromiseJob)
      if (!this.tickActive) {
        this.tickActive = true;
        process.nextTick(() => {
          this.dispatchBatches();
          this.tickActive = false;
        });
      }
    }

    return new Promise((resolve, reject) => {
      // Add the job to the namespace of its batch
      this.batches[hash].push({
        request,
        resolve,
        reject,
      });
    });
  }

  private dispatchBatches() {
    for (const key in this.batches) {
      const batch = this.batches[key];
      delete this.batches[key];

      // only batch if necessary
      // this might occur, if there's e.g. only 1 findUnique in the batch
      // When the batch holds a single job, singleLoader is used
      if (batch.length === 1) {
        this.options
          .singleLoader(batch[0].request)
          .then((result) => {
            if (result instanceof Error) {
              batch[0].reject(result);
            } else {
              batch[0].resolve(result);
            }
          })
          .catch((e) => {
            batch[0].reject(e);
          });
      } else {
        // Use batchLoader
        this.options
          .batchLoader(batch.map((j) => j.request))
          .then((results) => {
            if (results instanceof Error) {
              for (let i = 0; i < batch.length; i++) {
                batch[i].reject(results);
              }
            } else {
              // Walk the results and resolve/reject each job
              for (let i = 0; i < batch.length; i++) {
                const value = results[i];
                if (value instanceof Error) {
                  batch[i].reject(value);
                } else {
                  batch[i].resolve(value);
                }
              }
            }
          })
          .catch((e) => {
            for (let i = 0; i < batch.length; i++) {
              batch[i].reject(e);
            }
          });
      }
    }
  }
}
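The main structural difference from the original DataLoader is batchBy: instead of one batch per loader, requests are bucketed by a string key, and each bucket is dispatched on its own. The sketch below shows only that bucketing step. The `FindRequest` shape and the choice of grouping by model name are assumptions made for illustration and do not reflect how Prisma computes its batch key.

```typescript
// Hypothetical request shape: look up one record of a given model by ID.
type FindRequest = { model: string; id: number };

// Hypothetical batchBy: requests for the same model may share a batch.
// Returning null would mean "do not batch this request".
const batchBy = (request: FindRequest): string | null =>
  request.id > 0 ? request.model : null;

// Buckets requests the same way `this.batches[hash].push(...)` does above.
function groupRequests(
  requests: FindRequest[]
): { [key: string]: FindRequest[] } {
  const groups: { [key: string]: FindRequest[] } = {};
  for (const request of requests) {
    const hash = batchBy(request);
    if (hash === null) {
      continue; // would be sent to the single loader instead
    }
    if (!groups[hash]) {
      groups[hash] = [];
    }
    groups[hash].push(request);
  }
  return groups;
}

const groups = groupRequests([
  { model: "User", id: 1 },
  { model: "Todo", id: 7 },
  { model: "User", id: 2 },
]);
// groups.User holds two requests and groups.Todo holds one
```

With this grouping, the two User lookups could be served by one batchLoader call, while the lone Todo lookup would fall back to singleLoader.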

DataLoader integration

TypeGraphQL-DataLoader

If we use TypeGraphQL, we no longer write typeDefs and resolvers by hand: when TypeGraphQL is used with Apollo-Server, the schema option passed in overrides any typeDefs and resolvers. Resolvers are also defined in a completely different way with TypeGraphQL; see GraphQL-Explorer-Server mentioned above. To use DataLoader in this case, you need to work at the ORM level (if you don't use an ORM, you can still define a DataLoader instance per object type in the context, as above).

The community already has a package that provides this capability: TypeGraphQL-Dataloader

It is also simple to use: put the @TypeOrmLoader decorator on the relation properties of your TypeORM entities, and inject the ORM method for getting the connection (such as TypeORM's getConnection) into ApolloServer. The relation definition is then looked up from the decorated target class:

const relation = tgdContext
  .typeormGetConnection()
  .getMetadata(target.constructor)
  .findRelationWithPropertyPath(propertyKey.toString());

It then picks the corresponding handler (handleToOne, handleToMany and so on) based on the relation type and on whether the decorated property is the owning side of the relation:

const handle =
  relation.isManyToOne || relation.isOneToOneOwner
    ? handleToOne
    : relation.isOneToMany
    ? option?.selfKey
      ? handleOneToManyWithSelfKey
      : handleToMany
    : relation.isOneToOneNotOwner
    ? option?.selfKey
      ? handleOneToOneNotOwnerWithSelfKey
      : handleToOne
    : relation.isManyToMany
    ? handleToMany
    : () => next();

Each handler uses a DataLoader subclass suited to its relation type, for example:

class ToOneDataloader<V> extends DataLoader<any, V> {
  constructor(relation: RelationMetadata, connection: Connection) {
    super(
      directLoader(
        relation,
        connection,
        relation.inverseEntityMetadata.primaryColumns[0].propertyName
      )
    );
  }
}

The overall effect is to intercept Resolver execution via TypeGraphQL's UseMiddleware.
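To make the "to-one" case more concrete, here is a minimal sketch of what a batch function for such a relation has to do: fetch all rows in one query, then return them in the same order as the keys it was given. It is not the library's directLoader. The in-memory userTable, findUsersByIds and batchLoadUsers are all made up for illustration, and the function is synchronous only to keep the sketch short (a real DataLoader batch function returns a Promise).

```typescript
interface User {
  id: number;
  name: string;
}

// Stand-in for a database table; a real loader would run one
// "WHERE id IN (...)" query through the ORM instead.
const userTable: User[] = [
  { id: 1, name: "Ada" },
  { id: 2, name: "Lin" },
  { id: 3, name: "Grace" },
];

let queryCount = 0;

// Hypothetical single round trip that returns rows in table order.
function findUsersByIds(ids: readonly number[]): User[] {
  queryCount++;
  return userTable.filter((u) => ids.includes(u.id));
}

// Batch function for a to-one relation: one query, then re-order the rows
// so that result[i] belongs to ids[i]; missing rows become Error entries.
function batchLoadUsers(ids: readonly number[]): Array<User | Error> {
  const rows = findUsersByIds(ids);
  const byId = new Map(rows.map((u) => [u.id, u] as const));
  return ids.map((id) => byId.get(id) ?? new Error(`User ${id} not found`));
}

// Keys arrive in resolver order, not in table order.
const loaded = batchLoadUsers([3, 1, 99]);
```

The re-ordering step matters because the database is free to return rows in any order, while each waiting caller is matched to its result purely by position.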

NestJS-DataLoader

This integration is quite different and closer to what we did at the start: create a new DataLoader instance for each object type.

The usage example below comes from NestJS-DataLoader:

// Create
@Injectable()
export class AccountLoader implements NestDataLoader<string, Account> {
  constructor(private readonly accountService: AccountService) { }

  generateDataLoader(): DataLoader<string, Account> {
    return new DataLoader<string, Account>(keys => this.accountService.findByIds(keys));
  }
}
// Registration as a provider is omitted

// Use
@Resolver(Account)
export class AccountResolver {

    @Query(() => [Account])
    public getAccounts(
        @Args({ name: 'ids', type: () => [String] }) ids: string[],
        @Loader(AccountLoader.name) accountLoader: DataLoader<Account['id'], Account>): Promise<Account[]> {
        return accountLoader.loadMany(ids);
    }
}

Does this look similar to what we started with? Since @Query (imported here from @nestjs/graphql, but doing exactly the same job as the one in TypeGraphQL) is essentially a way of defining a Resolver, the usage is basically the same.

In fact, TypeGraphQL-DataLoader also lets you supply your own DataLoader instance for each object type (or entity), which is the most flexible approach and the easiest one to optimize.

The source code is much simpler; you can go straight to the annotated version of NestJS-DataLoader.
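The idea of "one loader per object type, created fresh for each request" can be sketched roughly as follows. This is not NestJS-DataLoader's actual code: RequestLoaderRegistry, its get method and the factory map are made-up names, and the "loader" is reduced to a plain object so the sketch runs without any dependency.

```typescript
// A loader is reduced to whatever its factory returns (assumption for the sketch).
type LoaderFactory<T> = () => T;

// One registry is meant to live for exactly one request.
class RequestLoaderRegistry {
  private readonly instances = new Map<string, unknown>();
  private readonly factories: Record<string, LoaderFactory<unknown>>;

  constructor(factories: Record<string, LoaderFactory<unknown>>) {
    this.factories = factories;
  }

  // Create the loader lazily on first use, then reuse it within the request.
  get<T>(name: string): T {
    if (!this.instances.has(name)) {
      const factory = this.factories[name];
      if (!factory) {
        throw new Error(`No loader registered as "${name}"`);
      }
      this.instances.set(name, factory());
    }
    return this.instances.get(name) as T;
  }
}

// Usage: count how many loader instances get created.
let createdLoaders = 0;
const loaderFactories = {
  AccountLoader: () => ({ instanceNo: ++createdLoaders }),
};

const requestA = new RequestLoaderRegistry(loaderFactories);
const requestB = new RequestLoaderRegistry(loaderFactories);

const a1 = requestA.get<{ instanceNo: number }>("AccountLoader");
const a2 = requestA.get<{ instanceNo: number }>("AccountLoader");
const b1 = requestB.get<{ instanceNo: number }>("AccountLoader");
```

Keeping the instances per request matters because a DataLoader also caches what it has loaded, and that cache should not leak from one user's request into another's.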

conclusion

That's it for DataLoader. To review: the core mechanism is enqueuePostPromiseJob. By collecting the single-record queries (GetSingleUserById) issued in the same tick and turning them into one batched query (GetBatchUsersByIds), you can dramatically reduce database I/O and improve the performance of your GraphQL API.
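As a quick recap of that idea, here is a very small sketch that counts I/O calls. It is a simplification, not DataLoader itself: createBatcher is a made-up helper, and it defers the flush with a plain Promise microtask, whereas the real enqueuePostPromiseJob waits a little longer so that loads issued from promise callbacks land in the same batch too.

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (reason: unknown) => void;
}

// Hypothetical helper: collect keys requested in the same tick, load them at once.
function createBatcher<K, V>(batchFn: BatchFn<K, V>): (key: K) => Promise<V> {
  let queue: Pending<K, V>[] = [];

  async function flush(): Promise<void> {
    const jobs = queue;
    queue = [];
    try {
      const values = await batchFn(jobs.map((j) => j.key));
      jobs.forEach((j, i) => j.resolve(values[i]));
    } catch (e) {
      jobs.forEach((j) => j.reject(e));
    }
  }

  return function load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      // The first key of a tick schedules the flush; later keys just join the queue.
      if (queue.length === 0) {
        Promise.resolve().then(flush);
      }
      queue.push({ key, resolve, reject });
    });
  };
}

// Usage: three single loads in the same tick end up in one batched call.
let ioCalls = 0;
const getBatchUsersByIds = async (ids: number[]): Promise<string[]> => {
  ioCalls++;
  return ids.map((id) => `user-${id}`);
};

const loadUser = createBatcher(getBatchUsersByIds);
const demo = Promise.all([loadUser(1), loadUser(2), loadUser(3)]);
```

Once demo settles, ioCalls can be inspected to confirm that the three loadUser calls were served by a single getBatchUsersByIds execution.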

Finally, I think it's important to add that DataLoader doesn't necessarily improve the response speed of your GraphQL API. You can enable the ApolloServer tracing option to see how long each part of a request takes:

Since the demo in this article does not involve real database I/O, it cannot serve as a measurement here.

Without DataLoader, assuming N GetSingleUserById executions, I/O would look like this:

----->
----->
------>
---->

This is equivalent to several short I/O operations running in parallel.

With DataLoader, GetBatchUsersByIds is executed only once, and I/O might look like this:

--------------->

This is equivalent to a single, longer-running I/O operation.

So using DataLoader doesn't necessarily improve your API's RT. Significant RT improvements only show up once the data volume reaches a certain level, and at small data volumes it can even be counterproductive.

RT: Request Time, the time a request takes to complete.
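The trade-off can be put into a toy cost model. Every number below is invented purely for illustration and not measured from any database, and the model itself (a fixed round trip per query, a limited connection pool for the parallel case, a cheaper per-row cost inside a batched query) is an assumption rather than something taken from the demo.

```typescript
// All fields are assumptions of this toy model, in milliseconds unless noted.
interface CostModel {
  roundTripMs: number; // fixed cost of any query
  singleRowMs: number; // extra cost of fetching one row by id
  batchSetupMs: number; // extra fixed cost of building/parsing the IN (...) query
  batchRowMs: number; // extra cost per row inside the batched query
  poolSize: number; // how many single queries can really run at the same time
}

// N single queries run in "waves" limited by the connection pool.
function parallelSinglesMs(n: number, m: CostModel): number {
  const waves = Math.ceil(n / m.poolSize);
  return waves * (m.roundTripMs + m.singleRowMs);
}

// One batched query pays the round trip once, plus a per-row cost.
function batchedMs(n: number, m: CostModel): number {
  return m.roundTripMs + m.batchSetupMs + n * m.batchRowMs;
}

// Made-up numbers, chosen only to show that a crossover can exist.
const assumed: CostModel = {
  roundTripMs: 5,
  singleRowMs: 1,
  batchSetupMs: 2,
  batchRowMs: 0.2,
  poolSize: 10,
};

for (const n of [4, 200]) {
  const singles = parallelSinglesMs(n, assumed);
  const batch = batchedMs(n, assumed);
  console.log(`n=${n}: singles=${singles}ms, batch=${batch}ms`);
}
```

Under these assumed numbers the batched query loses for 4 keys and wins for 200, which matches the point above: whether DataLoader helps RT depends on the data volume, so it is worth measuring with real tracing before relying on it.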

Resource summary

GitHub homepage for Lin Budu
This article and related demo repository dataloader-source-explore
GraphQL-Explorer-Server
DataLoader
TypeGraphQL
Apollo-GraphQL
TypeGraphQL-DataLoader
NestJS-DataLoader
Prisma 2