Serverless, GraphQL, and DynamoDB are a powerful combination for building websites. The first two are well-loved, but DynamoDB is often misunderstood or actively avoided. It’s often dismissed by folks who consider it only worth the effort “at scale.”
That was my assumption, too, and I tried to stick with a SQL database for my serverless apps. But after learning and using DynamoDB, I see the benefits of it for projects of any scale.
To show you what I mean, let’s build an API from start to finish — without any heavy Object Relational Mapper (ORM) or GraphQL framework to hide what is really going on. Maybe when we’re done you might consider giving DynamoDB a second look. I think it is worth the effort.
The main objections to DynamoDB and GraphQL
The main objection to DynamoDB is that it is hard to learn, but few people argue about its power. I agree the learning curve feels very steep. But SQL databases are not the best fit with serverless applications. Where do you stand up that SQL database? How do you manage connections to it? These things just don’t mesh with the serverless model very well. DynamoDB is serverless-friendly by design. You are trading the up-front pain of learning something hard to save yourself from future pain. Future pain that only grows if your application grows.
The case against using GraphQL with DynamoDB is a little more nuanced. GraphQL seems to fit well with relational databases partly because it is assumed by a lot of the documentation, tutorials, and examples. Alex Debrie is a DynamoDB expert who wrote The DynamoDB Book which is a great resource to deeply learn it. Even he recommends against using the two together, mostly because of the way that GraphQL resolvers are often written as sequential independent database calls that can result in excessive database reads.
Another potential problem is that DynamoDB works best when you know your access patterns beforehand. One of the strengths of GraphQL is that it can handle arbitrary queries more easily by design than REST. This is more of a problem with a public API where users can write arbitrary queries. In reality, GraphQL is often used for private APIs where you control both the client and the server. In this case, you know and can control the queries you run. With a GraphQL API it is possible to write queries that clobber any database without taking steps to avoid them.
A basic data model
For this example API, we will model an organization with teams, users, and certifications. The entity relational diagram is shown below. Each team has many users and each user can have many certifications.
Relational database model
Our end goal is to model this data in a DynamoDB table, but if we did model it in a SQL database, it would look like the following diagram:
To represent the many-to-many relationship of users to certifications, we add an intermediate table called “Credential.” The only unique attribute on this table is the expiration date. There would be other attributes for each of the tables, but we reduce it to just a name for each for simplicity.
Access patterns
The key to designing a data model for DynamoDB is to know your access patterns up front. In a relational database you start with normalized data and perform joins across the data to access it. DynamoDB does not have joins, so we build a data model that matches how we intend to access it. This is an iterative process. The goal is to identify the most frequent patterns to start. Most of these will directly map to a GraphQL query, but some may be only used internally to the back end to authenticate or check permissions, etc. An access pattern that is rarely used, like a check run once a week by an administrator, does not need to be designed. Something very inefficient (like a table scan) can handle these queries.
Most frequently accessed:
- User by ID or name
- Team by ID or name
- Certification by ID or name
Frequently accessed:
- All Users on a Team by Team ID
- All Certifications for a given User
- All Teams
- All Certifications
Rarely accessed
- All Certifications of users on a Team
- All Users who have a Certification
- All Users who have a Certification on a Team
DynamoDB single table design
DynamoDB does not have joins and you can only query based on the primary key or predefined indexes. There is no set schema for items imposed by the database, so many different types of items can be stored in a single table. In fact, the recommended best practice for your data schema is to store all items in a single table so that you can access related items together with a single query. Below is a single table model representing our data. To design this schema, you take the access patterns above and choose attributes for the keys and indexes that match.
The primary key here is a composite of the partition/hash key (pk) and the sort key (sk). To retrieve an item in DynamoDB, you must specify the partition key exactly and either a single value or a range of values for the sort key. This allows you to retrieve more than one item if they share a partition key. The indexes here are shown as gsi1pk, gsi1sk, etc. These generic attribute names are used for the indexes (i.e. gsi1pk) so that the same index can be used to access different types of items with different access pattern. With a composite key, the sort key cannot be empty, so we use “#” as a placeholder when the sort key is not needed.
Access pattern | Query conditions |
---|---|
Team, User, or Certification by ID | Primary Key, pk=”T#”+ID, sk=”#” |
Team, User, or Certification by name | Index GSI 1, gsi1pk=type, gsi1sk=name |
All Teams, Users, or Certifications | Index GSI 1, gsi1pk=type |
All Users on a Team by ID | Index GSI 2, gsi2pk=”T#”+teamID |
All Certifications for a User by ID | Primary Key, pk=”U#”+userID, sk=”C#”+certID |
All Users with a Certification by ID | Index GSI 1, gsi1pk=”C#”+certID, gsi1sk=”U#”+userID |
Database schema
We enforce the “database schema” in the application. The DynamoDB API is powerful, but also verbose and complicated. Many people jump directly to using an ORM to simplify it. Here, we will directly access the database using the helper functions below to create the schema for the Team
item.
const DB_MAP = {
TEAM: {
get: ({ teamId }) => ({
pk: 'T#'+teamId,
sk: '#',
}),
put: ({ teamId, teamName }) => ({
pk: 'T#'+teamId,
sk: '#',
gsi1pk: 'Team',
gsi1sk: teamName,
_tp: 'Team',
tn: teamName,
}),
parse: ({ pk, tn, _tp }) => {
if (_tp === 'Team') {
return {
id: pk.slice(2),
name: tn,
};
} else return null;
},
queryByName: ({ teamName }) => ({
IndexName: 'gsi1pk-gsi1sk-index',
ExpressionAttributeNames: { '#p': 'gsi1pk', '#s': 'gsi1sk' },
KeyConditionExpression: '#p = :p AND #s = :s',
ExpressionAttributeValues: { ':p': 'Team', ':s': teamName },
ScanIndexForward: true,
}),
queryAll: {
IndexName: 'gsi1pk-gsi1sk-index',
ExpressionAttributeNames: { '#p': 'gsi1pk' },
KeyConditionExpression: '#p = :p ',
ExpressionAttributeValues: { ':p': 'Team' },
ScanIndexForward: true,
},
},
parseList: (list, type) => {
if (Array.isArray(list)) {
return list.map(i => DB_MAP[type].parse(i));
}
if (Array.isArray(list.Items)) {
return list.Items.map(i => DB_MAP[type].parse(i));
}
},
};
To put a new team item in the database you call:
DB_MAP.TEAM.put({teamId:"t_01",teamName:"North Team"})
This forms the index and key values that are passed to the database API. The parse
method takes an item from the database and translates it back to the application model.
GraphQL schema
type Team {
id: ID!
name: String
members: [User]
}
type User {
id: ID!
name: String
team: Team
credentials: [Credential]
}
type Certification {
id: ID!
name: String
}
type Credential {
id: ID!
user: User
certification: Certification
expiration: String
}
type Query {
team(id: ID!): Team
teamByName(name: String!): [Team]
user(id: ID!): User
userByName(name: String!): [User]
certification(id: ID!): Certification
certificationByName(name: String!): [Certification]
allTeams: [Team]
allCertifications: [Certification]
allUsers: [User]
}
Bridging the gap between GraphQL and DynamoDB with resolvers
Resolvers are where a GraphQL query is executed. You can get a long way in GraphQL without ever writing a resolver. But to build our API, we’ll need to write some. For each query in the GraphQL schema above there is a root resolver below (only the team resolvers are shown here). This root resolver returns either a promise or an object with part of the query results.
If the query returns a Team
type as the result, then execution is passed down to the Team
type resolver. That resolver has a function for each of the values in a Team
. If there is no resolver for a given value (i.e. id
), it will look to see if the root resolver already passed it down.
A query takes four arguments. The first, called root
or parent
, is an object passed down from the resolver above with any partial results. The second, called args
, contains the arguments passed to the query. The third, called context
, can contain anything the application needs to resolve the query. In this case, we add a reference for the database to the context
. The final argument, called info
, is not used here. It contains more details about the query (like an abstract syntax tree).
In the resolvers below, ctx.db.singletable
is the reference to the DynamoDB table that contains all the data. The get
and query
methods directly execute against the database and the DB_MAP.TEAM....
translates the schema to the database using the helper functions we wrote earlier. The parse
method translates the data back to the from needed for the GraphQL schema.
const resolverMap = {
Query: {
team: (root, args, ctx, info) => {
return ctx.db.singletable.get(DB_MAP.TEAM.get({ teamId: args.id }))
.then(data => DB_MAP.TEAM.parse(data));
},
teamByName: (root, args, ctx, info) =>; {
return ctx.db.singletable
.query(DB_MAP.TEAM.queryByName({ teamName: args.name }))
.then(data => DB_MAP.parseList(data, 'TEAM'));
},
allTeams: (root, args, ctx, info) => {
return ctx.db.singletable.query(DB_MAP.TEAM.queryAll)
.then(data => DB_MAP.parseList(data, 'TEAM'));
},
},
Team: {
name: (root, _, ctx) => {
if (root.name) {
return root.name;
} else {
return ctx.db.singletable.get(DB_MAP.TEAM.get({ teamId: root.id }))
.then(data => DB_MAP.TEAM.parse(data).name);
}
},
members: (root, _, ctx) => {
return ctx.db.singletable
.query(DB_MAP.USER.queryByTeamId({ teamId: root.id }))
.then(data => DB_MAP.parseList(data, 'USER'));
},
},
User: {
name: (root, _, ctx) => {
if (root.name) {
return root.name;
} else {
return ctx.db.singletable.get(DB_MAP.USER.get({ userId: root.id }))
.then(data => DB_MAP.USER.parse(data).name);
}
},
credentials: (root, _, ctx) => {
return ctx.db.singletable
.query(DB_MAP.CREDENTIAL.queryByUserId({ userId: root.id }))
.then(data =>DB_MAP.parseList(data, 'CREDENTIAL'));
},
},
};
Now let’s follow the execution of the query below. First, the team
root resolver reads the team by id
and returns id
and name
. Then the Team
type resolver reads all the members of that team. Then the User
type resolver is called for each user to get all of their credentials and certifications. If there are five members on the team and each member has five credentials, that results in a total of seven reads for the database. You could argue that is too many. In a SQL database this might be reduced to four database calls. I would argue that the seven DynamoDB reads will be cheaper and faster than the four SQL reads in many cases. But this comes with a big dose of “it depends” on a lot of factors.
query { team( id:"t_01" ){
id
name
members{
id
name
credentials{
id
certification{
id
name
}
}
}
}}
Over-fetching and the N+1 problem
Optimizing a GraphQL API involves balancing a whole lot of tradeoffs that we won’t get into here. But two that weigh heavily in the decision of DynamoDB versus SQL are over-fetching and the N+1 problem. In many ways, these are opposite sides of the same coin. Over-fetching is when a resolver requests more data from the database than it needs to respond to the query. This often happens when you try to make one call to the database in the root resolver or a type resolver (e.g., members in the Team
type resolver above) to get as much of the data as you can. If the query did not request the name
attribute, it can be seen as wasted effort.
The N+1 problem is almost the opposite. If all the reads are pushed down to the lowest level resolver, then the team
root resolver and the members resolver (for Team
type) would make only a minimal or no request to the database. They would just pass the IDs down to the Team
type and User
type resolver. In this case, instead of members making one call to get all five members, it would push down to User
to make five separate reads. This would result in potentially 36 or more separate reads for the query above. In practice, this does not happen because an optimized server would use something like the DataLoader library that acts as a middleware to intercept those 36 calls and batch them into probably only four calls to the database. These smaller atomic read requests are needed so that the DataLoader (or similar tool) can efficiently batch them into fewer reads.
So, to optimize a GraphQL API with SQL, it is usually best to have small resolvers at the lowest levels and use something like DataLoader to optimize them. But for a DynamoDB API it is better to have “smarter” resolvers higher up that better match the access patterns your single table database it written for. The over-fetching that results in this case is usually the lesser of the two evils.
Deploy this example in 60 seconds
This is where you realize the full payoff of using DynamoDB together with serverless GraphQL. I built this example with Architect. It is an open-source tool to build serverless apps on AWS without most of the headaches of directly using AWS. Once you clone the repo and run npm install
, you can launch the app for local development (including a built-in local version of the database) with a single command. Not only that, you can also deploy it straight to production infrastructure (including DynamoDB) on AWS with a single command when you are ready.
Source: CSS-tricks.com