The Flat Field

Dataloader

Request scoped caching is an incredibly common practice in back-end services. Dataloader is a way to implement request scoped caching, but it has a trick up its sleeve that makes it particularly useful for GraphQL.

Example

We'll start with a quick example of what Dataloader can do for us:

import DataLoader from "dataloader";

async function example() {
  // This is a batch loader function.
  // It accepts an array of one or more keys.
  // It returns an array of results of the same length as `keys`.
  // These results must be ordered the same as `keys` as well.
  const batchLoader = async (keys: readonly string[]): Promise<string[]> => {
    console.log("Loading...");
    return keys.map((key) => `Data for ${key}`);
  };

  // Instantiate a loader.
  // This will generally be request scoped.
  const loader = new DataLoader(batchLoader);

  // Load the same key twice.
  const response1 = await loader.load("test");
  const response2 = await loader.load("test");
}

example();
Loading...

To use dataloader you must first define a loader function. The loader function must adhere to the following:

  • It accepts a read-only array of key strings to identify the data that needs to be loaded
  • It collects the result data into an array which must be the same length as the keys, and in the same order as the keys
  • It returns the result data as a promise

These are hard rules. If there is no data for a given key, return null for it in the result array. And even if the results arrive in a different order than the keys, you must re-order them to match the key order before you return them.
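That re-ordering step comes up in every batch loader, so it is worth having a helper for it. Here is a small sketch (`reorderByKey` is my own illustrative helper, not part of the dataloader package):

```typescript
// Re-order rows fetched in arbitrary order to match the key order,
// substituting null for keys that matched nothing.
// `idOf` extracts the key from a row.
function reorderByKey<T>(
  keys: readonly string[],
  rows: readonly T[],
  idOf: (row: T) => string
): Array<T | null> {
  // Index the rows by id so the re-ordering is O(n) rather than O(n * k).
  const byId = new Map<string, T>();
  for (const row of rows) {
    byId.set(idOf(row), row);
  }
  // Walk the keys in order, filling in null where nothing was loaded.
  return keys.map((key) => byId.get(key) ?? null);
}
```

For example, `reorderByKey(["a", "b", "c"], [{ id: "c" }, { id: "a" }], (r) => r.id)` yields the "a" row, then null, then the "c" row.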

Back to the example at hand. We loaded the key test twice, but we only saw the Loading... message once because dataloader cached the result the first time we loaded it. That isn't what makes dataloader special though. After all, that is how every request scoped cache works. Let's take a look at a second example:

import DataLoader from "dataloader";

async function example() {
  const batchLoader = async (keys: readonly string[]): Promise<string[]> => {
    console.log("Loading...");
    return keys.map((key) => `Data for ${key}`);
  };

  const loader = new DataLoader(batchLoader);

  // Load two keys at the same time.
  await Promise.all([loader.load("test"), loader.load("test2")]);
}

example();
Loading...

In this case we are loading two different keys: test and test2. But since we are doing it in a single Promise.all call we still only see the Loading... message once. The call to the loader function receives both keys and returns both results. This happens because dataloader batches up any requests made in a single frame of execution, and then calls the loader function at the end of it. In JavaScript a single frame of execution is a single tick of the event loop.
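The batching trick itself can be sketched with a microtask. `MiniBatcher` below is a toy illustration of the idea (no caching, no error handling, and not dataloader's actual implementation): every `load` call in the current tick queues its key, and a single flush scheduled on the microtask queue hands the whole batch to the loader function at once.

```typescript
class MiniBatcher<V> {
  private queue: Array<{ key: string; resolve: (value: V) => void }> = [];

  constructor(
    private batchLoad: (keys: readonly string[]) => Promise<V[]>
  ) {}

  load(key: string): Promise<V> {
    return new Promise((resolve) => {
      // The first call in this tick schedules the flush;
      // later calls in the same tick just join the queue.
      if (this.queue.length === 0) {
        queueMicrotask(() => this.flush());
      }
      this.queue.push({ key, resolve });
    });
  }

  private async flush() {
    const batch = this.queue;
    this.queue = [];
    // The loader function's results must line up with the keys,
    // which is exactly the ordering rule described above.
    const results = await this.batchLoad(batch.map((item) => item.key));
    batch.forEach((item, i) => item.resolve(results[i]));
  }
}
```

Two `load` calls inside one Promise.all land in the same batch; a `load` in a later tick starts a new one.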

GraphQL

Data loaders are particularly useful when building GraphQL APIs. Because GraphQL supports field resolvers, a single request might load many different entities nested in one another. Let's look at an example query:

{
  dog {
    friends {
      owner {
        name
      }
    }
  }
}

This query gets the names of the owners of a dog's friends. Here is how it would be resolved without dataloader:

stateDiagram-v2
  direction LR
  state "Load dog" as ldog
  state "Load friend 1" as lfriend1
  state "Load friend 2" as lfriend2
  state "Load friend 3" as lfriend3
  state "Load owner 1" as lowner1
  state "Load owner 2" as lowner2
  state "Load owner 3" as lowner3
  [*] --> ldog
  ldog --> lfriend1
  ldog --> lfriend2
  ldog --> lfriend3
  lfriend1 --> lowner1
  lfriend2 --> lowner2
  lfriend3 --> lowner3

Resolving the query without dataloader

Here the dog had three friends, which means we queried the database at least seven times to resolve the query: once for the dog, three times for the friends, and three times for the owners. That might not seem like a ton, but the worst cases get a lot worse. What if the dog has 100 friends? What if the query drills down even further into the owner object? It's not hard to write a query that hits the database hundreds of times to resolve a single request. Now let's see how this would look with dataloader:

stateDiagram-v2
  direction LR
  state "Load dog" as ldog
  state "Load friends" as lfriend
  state "Load owners" as lowner
  [*] --> ldog
  ldog --> lfriend
  lfriend --> lowner

Resolving the query with dataloader

This will scale up much better to more complex queries.

Implementing Dataloader

Now let's take a deeper look at how to implement dataloader for a service that queries a relational database. First we'll take a look at how to batch load a single entity like owner in the example query:

import { Kysely, SqliteDialect, Selectable } from "kysely";
import Database from "better-sqlite3";

// The columns in the `owners` table.
interface OwnerTable {
  id: string;
  name: string;
}

// The tables in the database.
interface DatabaseSchema {
  owners: OwnerTable;
}

// Instantiate a query builder instance.
const db = new Kysely<DatabaseSchema>({
  dialect: new SqliteDialect({
    database: new Database("test.db"),
  }),
});

// Batch loader for a single `Owner` object.
const batchLoadOwners = async (keys: readonly string[]) => {
  // Build a query to get every owner that matches a key.
  const owners = await db
    .selectFrom("owners")
    .select(["id", "name"])
    .where("id", "in", keys)
    .execute();

  // Re-order the results to match the keys inserting null where there was no result.
  const results: Array<Selectable<OwnerTable> | null> = [];
  for (const key of keys) {
    results.push(owners.find((owner) => owner.id === key) ?? null);
  }

  // Return the results.
  return results;
};

To efficiently load more than one owner we select from the owners table, filtering by all of the ids that come in as keys. Then we re-order the results to match the key order, inserting null where a result is missing. That is the more straightforward case though. How do you load a list of entities, like friends in the above query? It would look like this:

import { Kysely, SqliteDialect, Selectable, sql } from "kysely";
import Database from "better-sqlite3";

// The columns in the `friends` table.
interface FriendTable {
  id: string;
  name: string;
  dog_id: string;
}

// The tables in the database.
interface DatabaseSchema {
  friends: FriendTable;
}

// Instantiate a query builder instance.
const db = new Kysely<DatabaseSchema>({
  dialect: new SqliteDialect({
    database: new Database("test.db"),
  }),
});

// A batch loader function for multiple `Friend` objects.
const batchLoadFriends = async (keys: readonly string[]) => {
  // Map the keys to queries that get all of the friends for a dog.
  // Each key gets one query, with that key's numeric index attached
  // to every result row so the rows can be grouped again later.
  const queries = keys.map((key, index) => {
    return db
      .selectFrom("friends")
      .select(["id", "name", "dog_id", () => sql.raw(`${index}`).as("index")])
      .where("dog_id", "=", key);
  });

  // Join the queries together with UNION ALL.
  let builder = queries.at(0);
  for (const query of queries.slice(1)) {
    builder = builder?.unionAll(query);
  }

  // Execute the query.
  const friends = (await builder?.execute()) || [];

  // Split up the results by key using the numeric index.
  const results: Array<Selectable<FriendTable>>[] = [];
  keys.forEach((_, index) => {
    results.push(friends.filter((friend) => friend.index === index));
  });

  // Return the results.
  return results;
};

We map every key to a query to get the friends for that key. That query includes a unique numeric index that will be used later. We don't want to execute these queries one at a time, so we use UNION ALL to join them together. From there it's just a matter of splitting up the results by index in the order of keys.
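A simpler alternative is worth mentioning (this is my own suggestion, not something the dataloader docs prescribe): run one query with `where("dog_id", "in", keys)` and group the flat rows back under their keys in JavaScript. You give up the per-key ORDER BY and LIMIT that the UNION ALL approach allows, but the grouping itself is trivial:

```typescript
// Illustrative row shape matching the `friends` table above.
interface FriendRow {
  id: string;
  name: string;
  dog_id: string;
}

// Group a flat result set into one array of rows per key, in key order,
// with an empty array for keys that matched nothing.
function groupByDogId(
  keys: readonly string[],
  rows: readonly FriendRow[]
): FriendRow[][] {
  // Seed the map so every key has a group, even an empty one.
  const groups = new Map<string, FriendRow[]>(
    keys.map((key): [string, FriendRow[]] => [key, []])
  );
  for (const row of rows) {
    groups.get(row.dog_id)?.push(row);
  }
  return keys.map((key) => groups.get(key) ?? []);
}
```

Note that for list-valued keys the "no data" result is an empty array rather than null, since every dog has a (possibly empty) list of friends.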

Conclusion

Dataloader is a powerful way to implement request scoped caching, and it's particularly helpful when building GraphQL APIs.