adrianhesketh.com

Single table pattern DynamoDB with Go - Part 2

This is part 2 of a 3 part series.

Introduction

In Part 1 - Database design, I put together a design for an Organisation membership structure in DynamoDB, defining the Entities that represent Organisations and Users, and the Records that will be placed in the database.

In this post, I start to turn the design into code by creating types to represent the record structures.

Standard fields

I start by creating a type for all of the standard fields that will be in the DynamoDB record.

The standard fields are the keys (hash key and range key). I also add the name of the record type so that after the data is inserted into DynamoDB, it’s easy to work out what programming language type it is.

Adding a version number is useful if I change the structure of a type later - then I’ll know what the structure of the record should be when I read it back, or I can proactively modify old data to match the new format by scanning through the database, updating each of the old versions to match the new format.

To name the fields in DynamoDB, I tend to use JSON struct tags, because the DynamoDB library (https://docs.aws.amazon.com/sdk-for-go/api/service/dynamodb/dynamodbattribute/#Marshal) respects them anyway and json is shorter to write than dynamodbav. It’s not essential to do this, I could just leave the field names with the same casing as the Go names, but since I look at DynamoDB records most often in JSON format, it looks most consistent to me.

type record struct {
	ID         string `json:"id"`
	Range      string `json:"rng"`
	RecordType string `json:"typ"`
	Version    int    `json:"v"`
}
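To illustrate how the version field could be used when reading data back, here is a minimal sketch. It makes several assumptions: the standard library's encoding/json stands in for the DynamoDB marshaller so that the example runs without a database, and the userV0 and userV1 layouts (a single name field that was later split in two) are hypothetical and not part of the design in this series.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// record holds the standard fields, as defined in the post.
type record struct {
	ID         string `json:"id"`
	Range      string `json:"rng"`
	RecordType string `json:"typ"`
	Version    int    `json:"v"`
}

// userV0 is a hypothetical older layout that stored a single name field.
type userV0 struct {
	record
	Name string `json:"name"`
}

// userV1 is a hypothetical newer layout with separate name fields.
type userV1 struct {
	record
	FirstName string `json:"firstName"`
	LastName  string `json:"lastName"`
}

// decodeUser reads the standard fields first, then decodes the rest of the
// item according to the version it finds, upgrading old data on the way.
func decodeUser(item []byte) (userV1, error) {
	var std record
	if err := json.Unmarshal(item, &std); err != nil {
		return userV1{}, err
	}
	switch std.Version {
	case 0:
		var old userV0
		if err := json.Unmarshal(item, &old); err != nil {
			return userV1{}, err
		}
		// Split "First Last" into two fields; a missing surname stays empty.
		parts := strings.SplitN(old.Name, " ", 2)
		u := userV1{record: old.record, FirstName: parts[0]}
		if len(parts) == 2 {
			u.LastName = parts[1]
		}
		u.Version = 1
		return u, nil
	case 1:
		var u userV1
		err := json.Unmarshal(item, &u)
		return u, err
	default:
		return userV1{}, fmt.Errorf("unknown user record version %d", std.Version)
	}
}

func main() {
	old := []byte(`{"id":"user/test@example.com","rng":"user","typ":"user","v":0,"name":"Sarah Connor"}`)
	u, err := decodeUser(old)
	fmt.Printf("%+v %v\n", u, err)
}
```

The same switch could be run over the results of a table scan to rewrite old records in the new format.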

Record code

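Now, time to create the records. For each record type, I create a constant that stores the name of the record, a function that creates the hash key, a function that creates the range key, and a struct for the record itself.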

The constant to store the name ensures that the correct name is used throughout the code. Although it’s not really required (I could just use the name, e.g. “user” / “organisation”, everywhere it’s needed), it’s useful later when I need to compare the record I’ve just pulled from the database to the record types I have in my code, or for when it reaches a BI platform.

The hash key function creates the value for the record’s ID (partition key) field, and the range key function does the same for the range (sort key).

I have a simple rule:

Anything that’s in the hash or range key string should also be in the attributes of the DynamoDB record.

If you don’t do this, then you end up trying to parse the key to extract data from it. If the keys contain user input, then you’re potentially at risk of an injection attack. For example, if you use the / character as a separator, then you may need to make sure that the strings that make up the key don’t contain that character. I would probably use URL path escaping to do that, but it’s a complexity I’d only take on if the volume of the data was incredibly high and I needed to save the few bytes per record to reduce my storage costs.
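For reference, this is roughly what the URL path escaping approach could look like if the keys did have to be parsed. It’s only a sketch: joinKey and splitKey are hypothetical helpers, not functions from the repository, and they rely on url.PathEscape and url.PathUnescape from the Go standard library.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// joinKey builds a "/"-separated key, escaping each segment so that a "/"
// inside user input can't be mistaken for a separator.
func joinKey(segments ...string) string {
	escaped := make([]string, len(segments))
	for i, s := range segments {
		escaped[i] = url.PathEscape(s)
	}
	return strings.Join(escaped, "/")
}

// splitKey reverses joinKey, returning the original segments.
func splitKey(key string) ([]string, error) {
	parts := strings.Split(key, "/")
	for i, p := range parts {
		s, err := url.PathUnescape(p)
		if err != nil {
			return nil, err
		}
		parts[i] = s
	}
	return parts, nil
}

func main() {
	// The email contains a "/" to show that it survives the round trip.
	key := joinKey("user", "a/b@example.com")
	fmt.Println(key)
	parts, err := splitKey(key)
	fmt.Println(parts, err)
}
```

Storing the values as attributes, as the rule above suggests, avoids needing any of this.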

I also like to add a function that creates a record from the values of the entity, e.g. newUserRecord converts a User entity into its DynamoDB record representation.

const userRecordName = "user"

func newUserRecord(user User) userRecord {
	var ur userRecord
	ur.ID = newUserRecordHashKey(user.ID)
	ur.Range = newUserRecordRangeKey()
	// Populate the standard record type field so that the stored item states what it is.
	ur.RecordType = userRecordName
	ur.Email = user.ID
	ur.FirstName = user.FirstName
	ur.LastName = user.LastName
	ur.Phone = user.Phone
	ur.CreatedAt = user.CreatedAt
	return ur
}

func newUserRecordHashKey(email string) string {
	return userRecordName + "/" + email
}

func newUserRecordRangeKey() string {
	return userRecordName
}

type userRecord struct {
	record
	userRecordFields
}

type userRecordFields struct {
	Email     string    `json:"email"`
	FirstName string    `json:"firstName"`
	LastName  string    `json:"lastName"`
	Phone     string    `json:"phone"`
	CreatedAt time.Time `json:"createdAt"`
}

The same pattern repeats across the other record types. I tend to put the record types in the same file as the Store that uses them the most.

Local development setup

When I’m working with DynamoDB, I like to use a local DynamoDB to get started, for convenience. The usage instructions are severely offputting: you basically download a JAR file, make sure you’ve got Java installed, and run a command line (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.DownloadingAndRunning.html).

It makes it feel very “unfinished” to people that are using it for the first time. I prefer to use the Docker version, which is still less than ideal as a local experience - https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.Docker.html

To run the Docker image, the instructions tell you to run docker run -p 8000:8000 amazon/dynamodb-local, but that’s not really what you want to do, because it keeps the database in memory, trashing everything when you close it down.

My command line looks like this:

docker run -p 8000:8000 -v $(pwd)/dbstore:/dbstore amazon/dynamodb-local -jar DynamoDBLocal.jar -sharedDb -dbPath /dbstore

DynamoDB GUI

Not that it tells you in the console output (maybe it should?), but with it running, you can then connect to the DynamoDB shell at http://localhost:8000/shell. However, as of April 2020, that’s the most awful user interface in the world, so don’t do that.

The DynamoDB shell is bad for a few reasons. The interface itself is poorly organised, the workflow is confusing at best, and it relies on knowledge of old-fashioned callback JavaScript (no async/await etc.). It just puts new developers off, and if you do manage to make a complex query work in the shell, and you’re not using Node.js, you have to work out how to convert what you’ve just written into your programming language which is frustrating.

My strategy for working with DynamoDB is to use my own code to insert and retrieve data, and only use a GUI to check that it looks OK. Personally, I use https://github.com/Arattian/DynamoDb-GUI-Client - it’s free, cross-platform, works well and looks good. The downside is that it’s an Electron app, so it uses 150MB of RAM just to exist, the UI isn’t keyboard friendly and it shows records in a table (as does the DynamoDB Console), meaning that a lot of horizontal space is wasted.

If I was working at AWS, I’d be asking to spend some time improving the DynamoDB developer experience at all levels: the SDKs, the local environment, the shell, and the tools that use it.

Stores

With the records in place, and a local database to test against, it’s time to start storing some data.

I like to create a package called db and create a “Store” for each top level element. I start with a Put operation and a Get operation to make sure I can store and retrieve data as expected.

This starts by creating a type that contains a DynamoDB client (*dynamodb.DynamoDB) and the name of the table. I usually include a configurable time provider, so that I can switch it out easily during unit tests, i.e. mock the time function so that it returns a fixed time.Time and the output is predictable for tests.

// NewUserStore creates a new UserStore.
func NewUserStore(region, tableName string) (us UserStore, err error) {
	sess, err := session.NewSession(&aws.Config{Region: aws.String(region)})
	if err != nil {
		return
	}
	us.Client = dynamodb.New(sess)
	us.TableName = aws.String(tableName)
	us.Now = func() time.Time {
		return time.Now().UTC()
	}
	return
}

// UserStore stores User records in DynamoDB.
type UserStore struct {
	Client    *dynamodb.DynamoDB
	TableName *string
	Now       func() time.Time
}
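Swapping out the time provider in a test might look something like the sketch below. The store type here is a cut-down, hypothetical stand-in for UserStore that keeps only the Now field, and fixedClock, createdAt and testTime are names made up for the example.

```go
package main

import (
	"fmt"
	"time"
)

// store is a hypothetical stand-in for UserStore with only the time provider.
type store struct {
	Now func() time.Time
}

// newStore wires up the real clock, in UTC, as NewUserStore does.
func newStore() store {
	return store{Now: func() time.Time { return time.Now().UTC() }}
}

// createdAt reads the clock through the provider rather than calling time.Now.
func (s store) createdAt() time.Time {
	return s.Now()
}

// fixedClock returns a time provider that always reports t.
func fixedClock(t time.Time) func() time.Time {
	return func() time.Time { return t }
}

// testTime is an arbitrary fixed instant used in place of the real clock.
var testTime = time.Date(2020, time.January, 1, 0, 0, 0, 0, time.UTC)

func main() {
	s := newStore()
	s.Now = fixedClock(testTime) // replace the real clock, as a test would
	fmt.Println(s.createdAt().Equal(testTime))
}
```

Because every method reads the time via Now, anything the store stamps onto a record becomes deterministic once the provider is replaced.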

Now, I need a Put operation. This will upsert the record, overwriting any existing data.

func (store UserStore) Put(user User) error {
	ur := newUserRecord(user)
	item, err := dynamodbattribute.MarshalMap(ur)
	if err != nil {
		return err
	}
	_, err = store.Client.PutItem(&dynamodb.PutItemInput{
		TableName: store.TableName,
		Item:      item,
	})
	return err
}

Next, a simple Get operation. To map from the userRecord to the User entity, I’ve added a newUserFromRecord function.

func newUserFromRecord(ur userRecord) User {
	return User{
		ID:        ur.Email,
		FirstName: ur.FirstName,
		LastName:  ur.LastName,
		Phone:     ur.Phone,
		CreatedAt: ur.CreatedAt,
	}
}

To reduce the amount of typing, there are automated tools that map between types such as https://github.com/imdario/mergo but I have bad memories of runtime issues with tools like AutoMapper on C#, so I prefer to take the basic approach and test the mappings with unit tests.
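A mapping check of that kind could be as small as a round trip through both functions. The sketch below is self-contained, so it redeclares cut-down versions of the types: the User struct is an assumption based on the fields used in this post, userRecord is flattened for brevity, and roundTrip and sampleUser are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// User is assumed to look like this, based on the fields used in the post.
type User struct {
	ID        string
	FirstName string
	LastName  string
	Phone     string
	CreatedAt time.Time
}

// userRecord is a flattened, cut-down version of the record in the post.
type userRecord struct {
	ID        string
	Range     string
	Email     string
	FirstName string
	LastName  string
	Phone     string
	CreatedAt time.Time
}

// newUserRecord maps a User entity to its record form.
func newUserRecord(u User) userRecord {
	return userRecord{
		ID:        "user/" + u.ID,
		Range:     "user",
		Email:     u.ID,
		FirstName: u.FirstName,
		LastName:  u.LastName,
		Phone:     u.Phone,
		CreatedAt: u.CreatedAt,
	}
}

// newUserFromRecord maps a record back to a User entity.
func newUserFromRecord(ur userRecord) User {
	return User{
		ID:        ur.Email,
		FirstName: ur.FirstName,
		LastName:  ur.LastName,
		Phone:     ur.Phone,
		CreatedAt: ur.CreatedAt,
	}
}

// roundTrip converts a User to a record and back again.
func roundTrip(u User) User {
	return newUserFromRecord(newUserRecord(u))
}

// sampleUser has every field populated so that a dropped field is noticed.
var sampleUser = User{
	ID:        "test@example.com",
	FirstName: "Sarah",
	LastName:  "Connor",
	Phone:     "4476123456789",
	CreatedAt: time.Date(2020, time.January, 1, 0, 0, 0, 0, time.UTC),
}

func main() {
	fmt.Println(roundTrip(sampleUser) == sampleUser)
}
```

If a field is added to User but forgotten in either mapping function, the round trip no longer returns an equal value, provided the sample populates that field.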

The Get operation is quite simple, but the GetItem method on the DynamoDB client requires a really unwieldy way to define the key, so I usually create a single function to handle it. The DynamoDB key is a map (dictionary) of string keys (names) to *dynamodb.AttributeValue values. The dynamodb.AttributeValue type has fields such as S that must be populated when the value is a string.

func idAndRng(id, rng string) map[string]*dynamodb.AttributeValue {
	return map[string]*dynamodb.AttributeValue{
		"id":  {S: aws.String(id)},
		"rng": {S: aws.String(rng)},
	}
}

With that function out of the way, the Get method looks OK.

func (store UserStore) Get(id string) (user User, err error) {
	gio, err := store.Client.GetItem(&dynamodb.GetItemInput{
		TableName:      store.TableName,
		ConsistentRead: aws.Bool(true),
		Key:            idAndRng(newUserRecordHashKey(id), newUserRecordRangeKey()),
	})
	if err != nil {
		return
	}
	var record userRecord
	err = dynamodbattribute.UnmarshalMap(gio.Item, &record)
	user = newUserFromRecord(record)
	return
}

Now, to test it…

Tests

For database operations, I don’t think there’s any real substitute for integration tests, but for speed, and to be able to work offline, I carry out integration tests against a local DynamoDB. To allow each test to stand alone, I create a new DynamoDB table each time, then tear it down.

The various elements that could be tested independently are:

Database behaviour will cover the other two areas, so I’ll typically only write tests for those areas if I’m debugging an issue or having trouble writing the logic and need to work something out.

In my integration tests, I use a couple of functions to create a table, and destroy it. Note the use of the client.Endpoint field to instruct the DynamoDB client to use DynamoDB local (assuming you’re running it on the default port):

const region = "eu-west-1"

func createLocalTable(t *testing.T) (name string) {
	sess, err := session.NewSession(&aws.Config{Region: aws.String(region)})
	if err != nil {
		t.Fatalf("failed to create test db session: %v", err)
		return
	}
	name = uuid.New().String()
	client := dynamodb.New(sess)
	client.Endpoint = "http://localhost:8000"
	_, err = client.CreateTable(&dynamodb.CreateTableInput{
		AttributeDefinitions: []*dynamodb.AttributeDefinition{
			{
				AttributeName: aws.String("id"),
				AttributeType: aws.String("S"),
			},
			{
				AttributeName: aws.String("rng"),
				AttributeType: aws.String("S"),
			},
		},
		KeySchema: []*dynamodb.KeySchemaElement{
			{
				AttributeName: aws.String("id"),
				KeyType:       aws.String("HASH"),
			},
			{
				AttributeName: aws.String("rng"),
				KeyType:       aws.String("RANGE"),
			},
		},
		BillingMode: aws.String(dynamodb.BillingModePayPerRequest),
		TableName:   aws.String(name),
	})
	if err != nil {
		t.Fatalf("failed to create local table: %v", err)
	}
	return
}

The delete table function is simpler, because there’s less to define.

func deleteLocalTable(t *testing.T, name string) {
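	sess, err := session.NewSession(&aws.Config{Region: aws.String(region)})
	if err != nil {
		// Fail loudly rather than silently leaving the table behind.
		t.Fatalf("failed to create test db session: %v", err)
	}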
	client := dynamodb.New(sess)
	client.Endpoint = "http://localhost:8000"
	_, err = client.DeleteTable(&dynamodb.DeleteTableInput{
		TableName: aws.String(name),
	})
	if err != nil {
		t.Fatalf("failed to delete table: %v", err)
	}
}

With that in place, it’s possible to carry out an integration test. I end the test names with Integration to show that they make network calls, and also use the testing.Short() function in the test framework to skip them when go test is run with the -short flag. That prevents the tests from running when I save my files in my editor, because I usually have “test on save” and “cover on save” enabled.

To make sure that the test cleans up after itself, and deletes the table, I defer the deletion of the local table so that it runs at the end of the test.

func TestUserPutIntegration(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping integration test")
	}
	name := createLocalTable(t)
	defer deleteLocalTable(t, name)
	s, err := NewUserStore(region, name)
	if err != nil {
		// Stop here, the client is nil if the store could not be created.
		t.Fatalf("failed to create store: %v", err)
	}
	s.Client.Endpoint = "http://localhost:8000"
	u := User{
		ID:        "test@example.com",
		FirstName: "Sarah",
		LastName:  "Connor",
		CreatedAt: time.Date(2020, time.January, 1, 0, 0, 0, 0, time.UTC),
		Phone:     "4476123456789",
	}
	err = s.Put(u)
	if err != nil {
		t.Errorf("failed to create user: %v", err)
	}
}

It’s no good testing the Put if we can’t get the information back out again, so the next test tests both.

func TestUserGetIntegration(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping integration test")
	}
	name := createLocalTable(t)
	defer deleteLocalTable(t, name)
	s, err := NewUserStore(region, name)
	if err != nil {
		// Stop here, the client is nil if the store could not be created.
		t.Fatalf("failed to create store: %v", err)
	}
	s.Client.Endpoint = "http://localhost:8000"
	expected := User{
		ID:        "test@example.com",
		FirstName: "Sarah",
		LastName:  "Connor",
		CreatedAt: time.Date(2020, time.January, 1, 0, 0, 0, 0, time.UTC),
		Phone:     "4476123456789",
	}
	err = s.Put(expected)
	if err != nil {
		t.Errorf("failed to put user: %v", err)
	}
	actual, err := s.Get("test@example.com")
	if err != nil {
		t.Errorf("failed to get user: %v", err)
	}
	if diff := cmp.Diff(expected, actual); diff != "" {
		t.Error(diff)
	}
}

If I find that I want to see what the database ended up as, I can comment out the defer deleteLocalTable statement and run just the test I want (e.g. go test -run testname). This results in a new table in my test database server that I can inspect.

There are a lot more tests in the repository over at https://github.com/a-h/organisation

Next

In Part 3 - Store design, I cover more advanced data access, including use of QueryPages, BatchWriteItem and DynamoDB transactions.