adrianhesketh.com

Launch a Gemini capsule on AWS with the CDK

I got into the Gemini protocol [0] last year and ended up writing a server [1] and a browser [2] in Go.

But… I didn’t start hosting my blog using Gemini for a while because of the work involved in migrating the content over, and setting everything up. Last week, I got around to doing it, and I’ve made some scripts that you might find helpful to do the same.

Migrating from Markdown to Gemini

On HTTP, my blog is currently built using Hugo [3], a static site generator. I write the content in markdown (in vim) and use a Hugo template I wrote (called Brutalism, after the architectural style) to create the index page, turn the markdown files into HTML, add headers/footers, a bit of CSS, a lightbox for images, and syntax highlighting for code.

I decided that the way to go was to convert everything to the text/gemini format, and to write directly in text/gemini format from now on, and write a simple converter to turn that to Hugo style markdown.

I only use a few features of markdown, like paragraphs, images and tables, so I was able to write a fairly simple script to help me migrate everything across, with a few manual tweaks on the source markdown. The most complex thing to do was to write something to pull URLs out of paragraphs, leave markers behind (e.g. [1]) and drop them into Gemini’s link lines. Of course, after I’d written it and done everything, I found a tool [4] that looks like it does exactly what I needed, so maybe I could have saved some time!
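The link extraction step is easier to see in code. Here’s a minimal TypeScript sketch of the idea rather than the actual migration script: the function name paragraphToGemini and the exact output layout are assumptions made up for illustration, and it only handles simple inline links.

```typescript
// Convert one Markdown paragraph into text/gemini lines: the paragraph text
// with numbered markers, followed by one "=>" link line per extracted URL.
// `startIndex` lets the numbering continue across paragraphs in the same post.
function paragraphToGemini(
  paragraph: string,
  startIndex = 0
): { lines: string[]; nextIndex: number } {
  const links: string[] = [];
  let index = startIndex;
  // Matches inline Markdown links such as [Hugo](https://gohugo.io).
  // The optional leading "!" is captured so that images can be left alone.
  const inlineLink = /(!?)\[([^\]]+)\]\(([^)\s]+)\)/g;
  const text = paragraph.replace(
    inlineLink,
    (match: string, bang: string, label: string, url: string) => {
      if (bang === "!") {
        return match; // An image, not a link: keep it as it was.
      }
      const marker = `[${index}]`;
      links.push(`=> ${url} ${marker} ${label}`);
      index++;
      return `${label} ${marker}`;
    }
  );
  return { lines: [text, ...links], nextIndex: index };
}
```

Calling it with the paragraph "I use [Hugo](https://gohugo.io)." and a start index of 3 gives the text line "I use Hugo [3]." followed by the link line "=> https://gohugo.io [3] Hugo".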

Hugo’s great, because it handles things like building sitemaps, so I’ll probably stick with it. The plan is to write a converter from Gemini to Hugo markdown, although I’m also thinking about building a tiny site generator of my own, because that might be quicker than learning to use more Hugo features. For now, I’m just manually converting Gemini to Markdown, because it’s really easy to insert a couple of images and update the link format.

I got my script to migrate the markdown files to Gemini in a simple target folder structure. Each blog post is placed into its own directory, along with a metadata file that holds the “tags” that the blog post is under, its title, and the exact time that the post was published, which allows the posts to be ordered in an index.

/year/month/day/blog-post-name/
/year/month/day/blog-post-name/index.gmi
/year/month/day/blog-post-name/metadata.json
/year/month/day/blog-post-name/related-image.png

Once I had everything in a sensible directory structure, I wrote a script to “build” the output [5]. It searches for all the posts, loads in the metadata, and creates an output directory containing each post, but with a header and footer added, and an index file created to show all the posts.

The only clever bit at the moment is that the script adds a “previous” and “next” link at the bottom of each post. I might add a “related” feature that uses the tags to find content with the same tags at some point.
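To make the “previous” and “next” step concrete, here’s a rough TypeScript sketch. It isn’t the real build script [5]: the PostMetadata field names and the navigationLinks function are hypothetical, and the only things taken from above are that each post has tags, a title and a publish time.

```typescript
// Hypothetical shape of a post's metadata file; the field names are assumptions.
interface PostMetadata {
  title: string;
  tags: string[];
  published: string; // ISO 8601 timestamp, e.g. "2021-03-01T09:00:00Z".
  path: string; // e.g. "/2021/03/01/blog-post-name/".
}

// Return the footer link lines for each post, keyed by the post's path.
function navigationLinks(posts: PostMetadata[]): Map<string, string[]> {
  // Sort a copy, oldest first, so that "previous" means the earlier post.
  const ordered = [...posts].sort(
    (a, b) => Date.parse(a.published) - Date.parse(b.published)
  );
  const footers = new Map<string, string[]>();
  ordered.forEach((post, i) => {
    const lines: string[] = [];
    if (i > 0) {
      const previous = ordered[i - 1];
      lines.push(`=> ${previous.path} Previous: ${previous.title}`);
    }
    if (i < ordered.length - 1) {
      const next = ordered[i + 1];
      lines.push(`=> ${next.path} Next: ${next.title}`);
    }
    footers.set(post.path, lines);
  });
  return footers;
}
```

The first and last posts only get one link each, so the build step can append whatever lines come back without any special cases.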

Choosing a hosting provider

I wanted to find the cheapest, easiest way to run my Gemini server on the public Internet.

Last year, I tried out a bunch of services I’d never used before that are designed to host Docker containers, including Google’s cloud.run [6], heroku [7], fly.io [8], and AWS LightSail [9].

Heroku and cloud.run didn’t seem to support having a raw socket on the Internet, they seemed focussed on having web traffic, and having a load balancer in front, which wouldn’t work well with the client certificates and TLS handshake in my Gemini server.

Fly.io does support having a socket, and it has to be the slickest experience for running an app I’ve had in years, but for some reason, it has a restricted set of ports [10] that doesn’t include port 1965 for Gemini. Shame, because I’d have been able to run my Gemini capsule on the free tier.

I use AWS for my client work, but the vast majority of it is “serverless”, where I offload TLS, compute management and other details to AWS services like API Gateway, CloudFront, Lambda and S3. There’s no Gemini gateway (yet?), so I had to look for different solutions.

With client work, I can’t think of a time where I haven’t used an AWS load balancer or API Gateway as the entrypoint to applications. I’d normally go for a best practice design - public subnets containing a managed load balancer and NAT gateways for outbound Internet access, with private subnets for instances connected up to an autoscaling group for fault tolerance.

However, running a load balancer and multiple NAT gateways isn’t free, so it would bump the cost up to more than $50 a month, which I think is a silly amount to spend running a Gemini capsule. My HTTP blog is a static site on CloudFront and S3, with Lambda@Edge to do a few rewrites, and it costs less than $3 a month to run.

The relatively new AWS LightSail service seems to be aimed at the hobbyist crowd, and so it supports skipping some of the expensive (but more secure and fault tolerant) parts of the design, but it didn’t have the slick CLI experience of fly.io, and was more focussed on clicking around in a website. In terms of price, AWS LightSail would be $3.50 a month to run a raw server, while running Docker containers inexplicably costs twice as much - $7 per month.

AWS LightSail would definitely get the job done, but since I’m using my old friend AWS, I don’t want to just run any old server, I want to run a Graviton [11] processor, and LightSail doesn’t support that yet. Graviton is AWS’s custom chip design, based on ARM64. ARM processors are known for using less power than x64 processors, and AWS also charges less per hour for running a Graviton instance. I like both of those things.

At the time of writing, in eu-west-1, t4g (g for Graviton) is the cheapest instance you can run. The next cheapest is the t3a (a for AMD) which uses an AMD Epyc processor, then it’s the Intel variant.

(((0.0046*24)*365)/12) = $3.358 per month

Excellent, a saving of $0.14 per month over LightSail if I run it myself. If you don’t know much about AWS, and you don’t really want to, then I think you’re probably better off spending the extra $0.14 than following what I do…

Instance    vCPU  Memory   Cost per hour
t4g.nano    2     0.5 GiB  $0.0046
t3a.nano    2     0.5 GiB  $0.0051
t3.nano     2     0.5 GiB  $0.0057
t2.nano     1     0.5 GiB  $0.0063
t4g.micro   2     1 GiB    $0.0092
t3a.micro   2     1 GiB    $0.0102
t3.micro    2     1 GiB    $0.0114
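If you want to run the same monthly sum for every row, a few lines of TypeScript will do it. This is just the arithmetic from above applied to the hourly prices in the table; the monthlyCost name is made up for the example.

```typescript
// Average monthly cost of running one instance all year round:
// hourly price x 24 hours x 365 days / 12 months.
function monthlyCost(hourlyUsd: number): number {
  return (hourlyUsd * 24 * 365) / 12;
}

// Hourly prices copied from the table above.
const hourlyPrices: Array<[string, number]> = [
  ["t4g.nano", 0.0046],
  ["t3a.nano", 0.0051],
  ["t3.nano", 0.0057],
  ["t2.nano", 0.0063],
  ["t4g.micro", 0.0092],
  ["t3a.micro", 0.0102],
  ["t3.micro", 0.0114],
];

for (const [name, price] of hourlyPrices) {
  console.log(`${name}\t$${monthlyCost(price).toFixed(2)} per month`);
}
```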

Since I use AWS all the time, I’m really familiar with the tooling and everything else, so I decided to use the AWS CDK [12] to build everything I needed.

AWS design

First, I planned out what I’d need:

  • A “VPC” - a network to run my instance in. The network will be made up of public and private subnets. I’ll put my instance in the public subnet so that it can be accessed by its IP address on the Internet (subject to firewall rules).
  • An “Elastic IP” - a static IP address that I can assign to my virtual machine.
  • An “EC2 instance” - a virtual machine instance to run my server software.
  • An “Instance Role” - to grant permissions to my server to allow it to read from the bucket, and for the server to be managed remotely via the AWS management tools without having to enable SSH access.
  • A “Security Group” - firewall rules to enable network access to the instance, 1965 for Gemini.
  • An “S3 Bucket” - a cloud storage folder to backup my Gemini content and the TLS certificates for the server.
  • A DNS “A” record in “Route 53” to connect the “capsule.adrianhesketh.com” domain name to the IP address of my server.
  • A “User Data” script - a bash script that runs when my server starts up to configure it. This means I don’t need to log on to the server to configure it, and if I need to replace it, the script will just run again to configure it in the same way as it was.

AWS CDK setup

With the rough design in place, I could start building it out with CDK.

CDK lets you write your infrastructure design out in a programming language. It then converts your code into the AWS configuration format (CloudFormation) and pushes it to AWS where the configuration is executed - turning the configuration into “resources” like virtual machines, load balancers etc.

Resources are created by importing the libraries that create them, and configuring the fields. CDK has sensible defaults for most fields, so you usually find that you don’t need to configure very much.
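For orientation, the snippets in the rest of this post all sit inside the constructor of a stack class. The skeleton below is a sketch under a few assumptions: it uses the CDK v2 package layout (aws-cdk-lib), the class name GeminiCapsuleStack is made up, and the import aliases are simply chosen to match the names used in the snippets that follow.

```typescript
import * as fs from "fs";
import * as path from "path";
import * as cdk from "aws-cdk-lib";
import { Construct } from "constructs";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as iam from "aws-cdk-lib/aws-iam";
import * as route53 from "aws-cdk-lib/aws-route53";
import * as s3 from "aws-cdk-lib/aws-s3";
import * as s3Deployment from "aws-cdk-lib/aws-s3-deployment";

// Hypothetical stack name; the resources described below go in the constructor.
export class GeminiCapsuleStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Bucket, VPC, security group, instance, elastic IP and DNS record
    // are all created here, using `this` as their scope.
  }
}

// The app entry point creates the stack so that `cdk deploy` can find it.
const app = new cdk.App();
new GeminiCapsuleStack(app, "GeminiCapsuleStack");
```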

S3 bucket

The default for S3 buckets is unencrypted, so I’ve turned encryption on. I also want to make sure that in a moment of madness, I don’t make the bucket public, because it contains the private keys for the server, so I’ve added a few fields to block public access. Finally, I’ve made it so that I keep old versions for seven days in case I make a mistake and overwrite something I want to keep.

const bucket = new s3.Bucket(this, "content", {
  encryption: s3.BucketEncryption.S3_MANAGED,
  lifecycleRules: [
    { abortIncompleteMultipartUploadAfter: cdk.Duration.days(7) },
    { noncurrentVersionExpiration: cdk.Duration.days(7) },
  ],
  blockPublicAccess: {
    blockPublicAcls: true,
    blockPublicPolicy: true,
    ignorePublicAcls: true,
    restrictPublicBuckets: true,
  },
  versioned: true,
});

The very first time that the process runs, I want to ensure that the server has some certificates to use, so I included a deployment step that copies keys from a local “keys” directory up to a “keys” directory in the S3 bucket. After the first deployment, you can delete the local keys and remove this section if you like, because they’re backed up in the S3 bucket. It’s only if you delete the bucket that you’ll lose the keys.

new s3Deployment.BucketDeployment(this, "keysDeployment", {
  sources: [s3Deployment.Source.asset(path.join(__dirname, "../keys"))],
  destinationKeyPrefix: "keys",
  destinationBucket: bucket,
});

It’s possible to use a feature called SSM Parameter Store, which is designed to store secrets like keys, but I’m happy to store them in this bucket for this project.

I’ve designed the scripts to expect that Gemini content is stored inside a “content” directory in the S3 bucket. The CDK project prints out the name of the S3 bucket to make it easy to copy the data up.

new cdk.CfnOutput(this, "S3_CONTENT_LOCATION", {
  value: `s3://${bucket.bucketName}/content`,
});
new cdk.CfnOutput(this, "S3_KEYS_LOCATION", {
  value: `s3://${bucket.bucketName}/keys`,
});

To copy new content up to your capsule you can use the aws s3 sync command. By default it only uploads new and changed files. If you add the --delete flag, it will also delete files in the bucket that aren’t present in the local directory, so be careful with that.

aws s3 sync ./content s3://BUCKET/content

Network

With the content bucket done, it’s time to deploy the network. The default network is totally sensible, and exactly what I’d have written myself. It creates a network that spans multiple groups of data centres (“availability zones”) in a region, and has both a public and private subnet in each availability zone, a NAT gateway in the public subnet of each availability zone to allow anything in the private subnet to make outbound requests to the Internet, and a network routing table to route the traffic.

It’s amazing when you think about it. I can launch a well designed network across multiple data centres with just a line of code.

However, I don’t want to spend money on NAT gateways, so I just set it to zero to save the cash.

const vpc = new ec2.Vpc(this, "VPC", {
  natGateways: 0,
});

Time to create a set of firewall rules to allow traffic in to my Gemini server. I just need to enable inbound TCP on port 1965. The firewall is stateful, so it allows return traffic automatically.

const allowInboundGeminiSG = new ec2.SecurityGroup(
  this,
  "allowInboundGemini",
  { vpc: vpc, description: "Allow inbound Gemini on port 1965" }
);
allowInboundGeminiSG.addIngressRule(
  ec2.Peer.anyIpv6(),
  ec2.Port.tcp(1965),
  "Gemini"
);
allowInboundGeminiSG.addIngressRule(
  ec2.Peer.anyIpv4(),
  ec2.Port.tcp(1965),
  "Gemini"
);

These resources will be used a bit later.

Instance

Time to start the instance. First, I’ll configure a script to run on the box at startup. The code loads in the user-data.sh file and swaps the $BUCKET and $DOMAIN placeholders for the real bucket name and domain name.

const userData = fs
  .readFileSync(path.join(__dirname, "./user-data.sh"), "utf8")
  .replace(/\$BUCKET/g, bucket.bucketName)
  .replace(/\$DOMAIN/g, domainName);
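The chained replace calls above are all I needed, but if you end up with more placeholders, the same idea can be pulled into a small helper. This is an illustrative sketch only: renderTemplate is a made-up name, and unlike the code above it matches whole names made of capital letters and underscores, and leaves anything it doesn’t recognise alone. Using a function as the replacement also means a value containing a dollar sign is inserted literally.

```typescript
// Replace $NAME placeholders in a script template with the supplied values.
// Placeholders without a supplied value are left untouched, so ordinary shell
// variables and ${aws:...} style references in the script survive.
function renderTemplate(
  template: string,
  values: Record<string, string>
): string {
  return template.replace(/\$([A-Z_]+)/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match
  );
}
```

With this in place, the user data would be built with something like renderTemplate(script, { BUCKET: bucket.bucketName, DOMAIN: domainName }), where script is the contents of user-data.sh.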

The instance is a bit more complicated, but not much. The machineImage field sets which type of operating system I want to use. Because I’ve set the T4G instance class (G for Graviton, AWS’s custom ARM chip), I need to use an ARM64 operating system.

The instanceType is the size and type of the virtual machine.

The vpc and the vpcSubnets fields set where the machine is hosted. I’ve set it to be part of my custom VPC, and to sit inside the public subnet.

Next, I set the userData field, and set the size of the disk to 8GB, which is plenty for this. It’s set to delete when the instance is deleted, but that’s OK, because the point is to put everything into the S3 bucket, then pull it down into the instance to serve it.

const instance = new ec2.Instance(this, "geminiInstance", {
  machineImage: new ec2.AmazonLinuxImage({
    generation: ec2.AmazonLinuxGeneration.AMAZON_LINUX_2,
    cpuType: ec2.AmazonLinuxCpuType.ARM_64,
  }),
  instanceType: ec2.InstanceType.of(
    ec2.InstanceClass.T4G,
    ec2.InstanceSize.NANO
  ), // ARM processor, 2 vCPU and 0.5GB of RAM.
  vpc: vpc,
  vpcSubnets: {
    subnetType: ec2.SubnetType.PUBLIC,
  },
  userData: ec2.UserData.custom(userData),
  blockDevices: [
    {
      deviceName: "/dev/sda1",
      volume: {
        ebsDevice: {
          volumeType: ec2.EbsDeviceVolumeType.GENERAL_PURPOSE_SSD,
          deleteOnTermination: true,
          volumeSize: 8,
        },
      },
    },
  ],
  userDataCausesReplacement: true, // If the script changes, the instance will be destroyed and recreated.
});

Next, it’s time to give the instance access to do stuff.

// Enable the instance to be managed via Amazon's SSM product.
instance.role.addManagedPolicy(
  iam.ManagedPolicy.fromAwsManagedPolicyName("AmazonSSMManagedInstanceCore")
);
// Enable CloudWatch Agent (https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/install-CloudWatch-Agent-on-EC2-Instance.html).
instance.role.addManagedPolicy(
  iam.ManagedPolicy.fromAwsManagedPolicyName("CloudWatchAgentServerPolicy")
);
// Allow the instance to read content from the S3 bucket.
bucket.grantRead(instance.role);
// Allow the instance to be connected to via Gemini protocol.
instance.addSecurityGroup(allowInboundGeminiSG);
// Print out the instance ID so that we can connect to it.
new cdk.CfnOutput(this, "INSTANCE_ID", {
  value: instance.instanceId,
});
new cdk.CfnOutput(this, "INSTANCE_IP", {
  value: instance.instance.attrPublicIp,
});

User data

The user data script sets up the Linux operating system to run the Gemini server. It:

  • Installs the AWS SSM agent to enable remote login without needing SSH ports to be open on the instance.
  • Installs the CloudWatch agent to push the Gemini logs and system metrics (CPU, disk space etc.) to AWS CloudWatch, where they can be retained and analysed.
  • Installs the ARM64 build of the github.com/a-h/gemini server and sets up the log and content directory.
  • Downloads the server TLS keys and content from the S3 bucket.
  • Configures the Gemini server to start automatically at boot.
  • Sets up logrotate to delete log files after a few days to make sure the disk doesn’t fill up.

#!/bin/sh
# Install SSM agent to be able to log in remotely.
sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_arm64/amazon-ssm-agent.rpm
sudo systemctl restart amazon-ssm-agent

# Configure AWS Logs to push to CloudWatch.
sudo yum install -y amazon-cloudwatch-agent
sudo tee /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json > /dev/null << 'EOF'
{
        "agent": {
                "metrics_collection_interval": 60,
                "run_as_user": "root"
        },
        "logs": {
                "logs_collected": {
                        "files": {
                                "collect_list": [
                                        {
                                                "file_path": "/var/log/geminid/log.txt",
                                                "log_group_name": "geminid",
                                                "log_stream_name": "{instance_id}"
                                        }
                                ]
                        }
                }
        },
        "metrics": {
                "append_dimensions": {
                        "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
                        "ImageId": "${aws:ImageId}",
                        "InstanceId": "${aws:InstanceId}",
                        "InstanceType": "${aws:InstanceType}"
                },
                "metrics_collected": {
                        "cpu": {
                                "measurement": [
                                        "cpu_usage_idle",
                                        "cpu_usage_iowait",
                                        "cpu_usage_user",
                                        "cpu_usage_system"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ],
                                "totalcpu": false
                        },
                        "disk": {
                                "measurement": [
                                        "used_percent",
                                        "inodes_free"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "diskio": {
                                "measurement": [
                                        "io_time"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "mem": {
                                "measurement": [
                                        "mem_used_percent"
                                ],
                                "metrics_collection_interval": 60
                        },
                        "swap": {
                                "measurement": [
                                        "swap_used_percent"
                                ],
                                "metrics_collection_interval": 60
                        }
                }
        }
}
EOF
sudo systemctl enable amazon-cloudwatch-agent.service
sudo systemctl start amazon-cloudwatch-agent.service

# Create user.
sudo useradd geminid

# Start installation.
cd ~

# Install github.com/a-h/gemini server.
wget https://github.com/a-h/gemini/releases/download/v0.0.49/gemini_0.0.49_Linux_arm64.tar.gz
tar -xf gemini_0.0.49_Linux_arm64.tar.gz
sudo mv gemini /usr/bin/

# Create a log directory.
sudo mkdir -p /var/log/geminid
sudo chown geminid:geminid /var/log/geminid

# Create a content directory.
sudo mkdir -p /srv/gemini
sudo chown geminid:geminid /srv/gemini

# Create a config directory.
sudo mkdir -p /etc/gemini
sudo chown geminid:geminid /etc/gemini

# Download the server keys.
sudo aws s3 sync s3://$BUCKET/keys /etc/gemini

# Download the initial content.
sudo aws s3 sync s3://$BUCKET/content /srv/gemini

# Create the geminid systemd service.
# Note that it redirects the log output to /var/log/geminid/log.txt
# Later versions of systemd support appending to files, but at the time of writing, Amazon Linux 2 shipped with version 219 of systemd.
sudo tee /etc/systemd/system/geminid.service > /dev/null << 'EOF'
[Unit]
Description=geminid

[Service]
User=geminid
Group=geminid
Type=simple
Restart=always
WorkingDirectory=/srv/gemini
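ExecStart=/bin/sh -c '/usr/bin/gemini serve --domain=$DOMAIN --certFile=/etc/gemini/server.crt --keyFile=/etc/gemini/server.key --path=/srv/gemini >> /var/log/geminid/log.txt 2>&1'

# The Install section is needed for "systemctl enable" to start geminid at boot.
[Install]
WantedBy=multi-user.target
EOF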

# Start and enable geminid on startup.
sudo systemctl start geminid
sudo systemctl enable geminid

# Rotate the geminid logs.
sudo tee /etc/logrotate.d/geminid > /dev/null << 'EOF'
/var/log/geminid/*.txt {
        daily
        copytruncate
        missingok
        rotate 7
        notifempty
}
EOF

DNS

With an instance ready, I wanted to connect a DNS record up to allow my Gemini server to be accessed via capsule.adrianhesketh.com.

First, I created a static IP address to make sure we use the same IP address every time the server is destroyed and recreated. This avoids having to update DNS each time and risk outages.

const elasticIp = new ec2.CfnEIP(this, "elasticIp", {
  domain: "vpc",
  instanceId: instance.instanceId,
});

I already had a DNS zone set up for my domain “adrianhesketh.com”, so I looked up the “zone”:

const zoneName = "your_domain.com";
const hostedZoneId = "your_id";
const zone = route53.HostedZone.fromHostedZoneAttributes(
  this,
  "hostedZone",
  { zoneName, hostedZoneId }
);

Then I added an A record for the subdomain that points at the elastic IP attached to the server.

new route53.ARecord(this, "ARecord", {
  zone: zone,
  recordName: "capsule.your_domain.com",
  target: route53.RecordTarget.fromIpAddresses(elasticIp.ref),
});

With those parts in place, running cdk deploy deploys the stack and sets everything up.

Deploying new content

Once the infrastructure is in place, putting new content on the server can be done in two steps. First, copy the content to the S3 bucket:

# This overwrites changed files. Adding --delete also removes bucket files that are missing locally, so keep a backup.
aws s3 sync ./local_computer/path s3://BUCKET_NAME/content

Then instruct the Gemini server to update its content from the bucket. The $INSTANCE_ID value is printed out during CDK deploy, or you can see the instance ID in the AWS console.

aws ssm send-command \
  --region=eu-west-1 \
  --instance-ids $INSTANCE_ID \
  --document-name "AWS-RunShellScript" \
  --comment "Synchronise content" \
  --parameters commands="aws s3 sync s3://BUCKET_NAME/content /srv/gemini/" \
  --output text
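If you’d rather trigger the second step from a script than type it out, the arguments can be built in code. The sketch below is an assumption-laden example rather than part of the repo: syncContentArgs is a made-up name, and it passes --parameters in JSON form instead of the commands="..." shorthand used above.

```typescript
// Build the argument list for `aws ssm send-command` that asks the instance
// to pull the latest content down from the bucket.
function syncContentArgs(
  instanceId: string,
  bucketName: string,
  region: string
): string[] {
  const command = `aws s3 sync s3://${bucketName}/content /srv/gemini/`;
  return [
    "ssm",
    "send-command",
    "--region",
    region,
    "--instance-ids",
    instanceId,
    "--document-name",
    "AWS-RunShellScript",
    "--comment",
    "Synchronise content",
    // JSON form of the parameters: a list containing the one command to run.
    "--parameters",
    JSON.stringify({ commands: [command] }),
    "--output",
    "text",
  ];
}
```

The resulting array can be handed to a process runner without going through a shell, for example Node’s child_process.execFileSync("aws", args), which avoids any quoting problems.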

If you need to debug things, you can get shell access onto the box with:

aws ssm start-session --target ${INSTANCE_ID} --region=eu-west-1

Summary

All in all, launching my first “production” Graviton instance was fairly painless. I was surprised by a few things not being ready yet (I tried out using docker-compose and a few other Python things that just didn’t work, mostly due to crypto bindings [13]).

You can clone the repo and make your own capsule with CDK. Get the code at [14].