adrianhesketh.com

Single table pattern DynamoDB with Go - Part 1

This is part 1 of a 3 part series:

Introduction

The first time I took at look at DynamoDB, I could not understand why anyone would want to use it. I’d spent years using SQL Server, MySQL and MongoDB - databases with rich query capabilities. DynamoDB seemed grossly limited, difficult to understand, and oddly priced. In particular, the limitations on queries and the costs of reading every record in the database (scan operations) meant that I was concerned that I’d be unable to answer questions about data when I needed to by writing a reporting query.

However, since the introduction of automatic encryption at rest, pay-per-request pricing, DynamoDB streams and AWS Athena, it’s now become my preferred database for new solutions.

It’s still difficult to get started though. Introducing teams to DynamoDB can be a fraught experience. The familiarity of the “table” concept means that a lot of people start using it like they would in a relational database. This usually means having one table per entity, for example, a books table, and a categories table. However, this presents problems when trying to read the data back.

It doesn’t help that the various SDKs like the Java lead you down this path by using annotations on classes to mark which table the class represents and which properties are the hash and range key. The AWS AppSync system behaves in a similar way, each GraphQL resolver results in a new DynamoDB table.

Best practice use of DynamoDB typically uses a single table. It is explained in this article https://www.alexdebrie.com/posts/dynamodb-single-table/ (although I disagree with the section that argues that DynamoDB can’t be used effectively as a GraphQL backend, but that’s probably more about how flexible we each think a GraphQL endpoint should be).

Not only do new engineers have to work out how to build efficient designs for DynamoDB, but then they have to work out a good code structure for it. I thought I’d cover how I structure my code to make it easier to adopt the pattern.

As an example, I’ll model a permissions model based around Organisations. A User can register and create an Organisation. They can then invite other people to join the Organisation, and optionally assign them to be a member of a group within the Organisation. Inside each Organisation, admins can create Services and assign users to Groups.

This set of posts will provide a full example of a DynamoDB database layer written in Go that demonstrates how to:

Actions

Rather than starting with the data model, and relying on being able to write a complex query to retrieve the data, I start by thinking about the database actions I want to execute and plan the database structure around that.

These will likely change a little by the end of the project, but that’s OK. It’s more about having a good starting point.

Entities

From looking at the list of actions, I can see that the two top-level entities are the User entity, and the Organisation entity. My next step is to think about what data I’d like to store and retrieve using those actions and to define some types in my programming language to reflect those.

I’ve created a small User struct containing just the basic essentials, and a larger UserDetails struct that contains related data.

type User struct {
	// ID is the user's email address.
	ID         string
	FirstName  string
	LastName   string
	Phone      string
	CreatedAt  time.Time
}

type UserDetails struct {
	User
	Organisations []Organisation
	Invitations   []Invitation
}

type Invitation struct {
	Organisation Organisation
	InvitedAt    time.Time
	AcceptedAt   *time.Time
}

Then, I take the same approach for Organisation and OrganisationDetails. OrganisationDetails has a list of Group entities that can be used to map from group names into User entities, and a list of Service entities. Service entities also have a list of Group entities. This allows permissions to be set at both the Organisation level (e.g. owner / member), and at the Service level.

type Organisation struct {
	ID   string
	Name string
}

type OrganisationDetails struct {
	Organisation
	Groups   []Group
	Services []Service
}

type Service struct {
	ID     string
	Name   string
	Groups []Group
}

type Group struct {
	Name  string
	Users []User
}

Records

The underlying data storage will be handled using several types of database “Records”. There won’t be several tables, we’ll just use one DynamoDB table with several different types of record in it.

The “Entities” above don’t map exactly to how the data is stored in the database, because while they’re well structured to be returned as part of an API, they don’t necessarily match the underlying storage structure for the data. I don’t want to expose the internals of the database structure to my API users so I’ll create some private record types.

First, I plan out what the records will be. The main requirement for a record in a DynamoDB table is that it must define a partition key, and a sort key. I like to think of the partition key as being used to select the server cluster used to store a block of data, and the sort key as something that gets used to find data within that block. A more thorough description is in this blog post - https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/

One-to-many relationships

In DynamoDB, relationships such as one-to-many relationships are created by using the same partition key for the related data. Data is grouped under a partition key, then can be filtered and sorted by using the sort keys. To differentiate the multiple records within a single table, the partition and sort keys can include information about the type of record they are, but I also like to add an attribute to the record for this purpose.

To add a user to an organisation, we can just Put a userOrganisationRecord into the database without even checking if the user exists. It’s possible to do that check using DynamoDB transactions, but in this case, it’s not required.

To get, delete or update a single record, we need both its partition key and its sort key.

To retrieve a User, we can execute a DynamoDB Get operation to find a User with email test@example.com with the partition key user/test@example.com and the sort key user, but this won’t include the other records that would be needed to fully populate UserDetails - only the basic User information would be retrieved.

This is because UserDetails is represented by multiple records - a single userRecord, and multiple userOrganisationRecords that define the related organisations, so to bring back all of the data required to populate UserDetails, a DynamoDB Query operation must be used to bring back both the userRecord and the userOrganisationRecords in a single database operation. To query data, we need the partition key and can then either retrieve all of the data under the partition key, or can apply filtering to reduce the amount of data returned. It’s important to be aware of the DynamoDB limits on how much data is returned in a single operation (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html) and to use the QueryPages function if more than one page of data is required.

User record structure

Denormalisation as normal

In the UI, once we have the UserDetails, we can display a list of the Organisations to the user for them to select one to view / edit.

Rather than just storing the ID of the Organisation, the organisation name is included in the attribute list. This prevents having to make another round-trip to the database to retrieve organisation details. However, it also means that if the organisationName changes on the organisationRecord, then the userOrganisationRecord will be out of date.

There are several approaches here, one is to proactively change the userOrganisationRecord in your code (maybe even within a transaction) to keep everything 100% in sync (most expensive), the other is to wait until the user views the Organisation in the future and update the names if they’ve changed (cheapest, in terms of cost and operations), or to use DynamoDB Streams to trigger when records change and apply them asynchronously (might be easiest to code, because you only have to think about data synchronisation in one place). The correct decision will depend on how complex the data model is going to get, and how important it is for data changes to be immediately reflected.

Partition key selection

Given that all of the data for each organisation should be kept separate from others, it’s a good choice for being the partition key, using the sort key to organise data below it.

In a relational database, you might use a WHERE clause with some joins to limit the data returned by a query to what’s relevant for the user. However, in this design, we’ll get the record from the database, then check whether the user has access or not before displaying it to the user.

While a transaction could be used to check that a user has permission to access a record within DynamoDB, it would cost more to do this than simply checking the permissions once the data has been retrieved, because transactions use twice as many read units as consistent reads. It doesn’t make sense to optmimise for the least common case.

Users that have Organisation level membership are stored in organisationMemberRecords under the organisation’s partition key, just like the userOrganisationRecord above. To add a user to an organisation group requires inserting two records - a userOrganisationRecord and an organisationMemberRecord. These can be done in a single database API call using the BatchPutItem API call. If it’s really important that both writes succeed, then it’s also possible to use a DynamoDB transaction to make sure that either both records get inserted successfully, or none of the records are inserted if any of the operations fail.

Organisation record structure

The organisationMemberRecord includes a cache of user details. If the user record changes, then the data stored within the organisationMemberRecord will be stale.

For the use case I have in mind for this database, I don’t expect users to change their details very frequently at all - so I’m only going to update the cached data when the system needs to contact a User by sending them an email. This risks someone seeing stale data, but for the use case I have in mind, low cost is important. If it becomes a problem, I’ll deal with it later.

Each Organisation has multiple Services. This is achieved by a similar design to the User - the group membership record uses the same partition key as the group that it belongs to, with each record differentiated by having a sort key prefix that represents the type of data being stored. In this case, it’s just storing the name of the service. Assigment to a group within the service is done at the organisationMemberRecord.

Next

In Part 2 - Record design, I’ll start turning these designs into code.