adrianhesketh.com

Verifying download hashes during Docker build

On a system I was working on, I got a security finding back with the comment:

Insecure Software Install Mechanism in Container Build

This was down to installing binary releases of software using curl inside a Dockerfile which was used to build a Docker image:

# Install Go.
RUN curl -L -o go1.17.2.linux-amd64.tar.gz https://golang.org/dl/go1.17.2.linux-amd64.tar.gz 
RUN rm -rf /usr/local/go && tar -C /usr/local -xzf go1.17.2.linux-amd64.tar.gz
ENV PATH "$PATH:/usr/local/go/bin"
ENV PATH "$PATH:/root/go/bin"

The risk is that if the download is compromised in some way (e.g. the server is hacked and the file is replaced with a malicious program), I wouldn’t know about it.

The remediation listed in the review was to check the cryptographic signatures on downloaded content to verify that the download matched the expected value.

Verifying hashes

Linux systems usually include the sha256sum program, which can calculate and verify the hashes of downloaded files.

macOS ships with shasum, which can be configured to use SHA-256 with an extra parameter (-a 256).
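Because the two platforms ship different commands, a small wrapper can pick whichever one is present. This is only a sketch: the function name sha256_of is hypothetical, and it assumes at least one of the two tools is on the PATH.

```shell
#!/bin/sh
# sha256_of is a hypothetical helper: print only the SHA-256 hex digest of a file.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    # sha256sum from GNU coreutils or BusyBox (typical on Linux).
    sha256sum "$1" | cut -d ' ' -f 1
  else
    # shasum (ships with macOS), switched to SHA-256 with -a 256.
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# Example: hash an empty file, whose SHA-256 digest is well known.
empty_file="$(mktemp)"
empty_hash="$(sha256_of "$empty_file")"
echo "$empty_hash"
rm -f "$empty_file"
```

Both tools print the digest followed by the file name, so the cut call keeps just the first field.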

Rather than verify each file one at a time, I decided to use a multi-stage Docker build [0] to split the Dockerfile into two sections.

To validate the hashes, I need a set of known good values, so I downloaded all the files and created a verification file on my Mac using shasum. This means that when a build server, or someone else downloads the file, they can be sure they’ve got the same file as I did.

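# List the files explicitly: a *.* glob would skip names without a dot, such as gpg.
shasum -a 256 gpg awscli-exe-linux-x86_64-2.4.14.zip go1.17.5.linux-amd64.tar.gz > past_hashes.txt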

Some projects list their expected hashes, rather than relying on you downloading the file first. For example, Go provides a SHA256 hash value alongside every download link.
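When a project publishes a hash, a single file can be checked against that value without building a hash file first. The sketch below assumes sha256sum is available (on macOS, shasum -a 256 -c plays the same role). The helper name verify_sha256 and the file download.bin are hypothetical, and the "published" value is simply the SHA-256 of the five bytes "hello", standing in for a real vendor hash.

```shell
#!/bin/sh
# verify_sha256 is a hypothetical helper, not part of any tool:
# it succeeds only if the file ($1) has the expected SHA-256 digest ($2).
verify_sha256() {
  # sha256sum -c expects lines of "digest, two spaces, file name".
  # With no file argument it reads that list from standard input.
  printf '%s  %s\n' "$2" "$1" | sha256sum -c >/dev/null 2>&1
}

# Stand-in for a download: a file containing the five bytes "hello".
workdir="$(mktemp -d)"
printf 'hello' > "$workdir/download.bin"

# Stand-in for the hash published next to the download link.
published="2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"

if verify_sha256 "$workdir/download.bin" "$published"; then
  match_result="verified"
else
  match_result="mismatch"
fi

# A wrong expected value (64 zeros) must be rejected.
wrong="$(printf '%064d' 0)"
if verify_sha256 "$workdir/download.bin" "$wrong"; then
  wrong_result="verified"
else
  wrong_result="mismatch"
fi

echo "$match_result $wrong_result"
rm -rf "$workdir"
```

In a Dockerfile, the same pipeline inside a RUN step would stop the build on a mismatch, because the step exits with a non-zero status.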

To validate the files in a directory against a file containing the previous “known good” hashes, use sha256sum with the -c parameter.

sha256sum -c past_hashes.txt
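To see what the check does when a file changes, here is a minimal sketch that records hashes, verifies them, then tampers with one file and verifies again. The file names and contents are made up for illustration, and it assumes sha256sum is installed.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
workdir="$(mktemp -d)"

# Two stand-in "downloads" (hypothetical names and contents).
printf 'tool archive\n' > "$workdir/tool.tar.gz"
printf 'signing key\n' > "$workdir/gpg"

# Record the known good hashes once.
(cd "$workdir" && sha256sum tool.tar.gz gpg > past_hashes.txt)

# Unchanged files: each line reports OK and the exit status is 0.
if (cd "$workdir" && sha256sum -c past_hashes.txt); then
  clean_check="pass"
else
  clean_check="fail"
fi

# Simulate a poisoned download by replacing one file's contents.
printf 'malicious payload\n' > "$workdir/tool.tar.gz"

# The changed file is now reported as FAILED and the exit status is
# non-zero, which is what makes a Docker RUN step fail the build.
if (cd "$workdir" && sha256sum -c past_hashes.txt); then
  tampered_check="pass"
else
  tampered_check="fail"
fi

echo "clean: $clean_check, tampered: $tampered_check"
rm -rf "$workdir"
```

Note that sha256sum -c only checks the files listed in the hash file, so a download that was never recorded there is not verified at all.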

Docker download section

The first section does all the downloads, and verifies them, while the second one uses all of the downloads.

Using a multi-stage Docker build keeps the download and verification code separate from the installation process.

I used alpine as a base for this section, but I could probably have used the same node:16 image I was using in the next section too.

FROM alpine:latest AS downloads

# Install curl.
RUN apk add --no-cache curl

# Download all the files.
# https://explainshell.com/explain?cmd=curl+-fsSLO+example.org
WORKDIR /downloads
RUN curl -fsSLO https://download.docker.com/linux/debian/gpg
RUN curl -fsSLO https://awscli.amazonaws.com/awscli-exe-linux-x86_64-2.4.14.zip
RUN curl -fsSLO https://go.dev/dl/go1.17.5.linux-amd64.tar.gz

# Record and print the hashes of the files we just downloaded, so they show up in the build log.
RUN find . -type f -exec sha256sum {} \; >> /downloads/current_hashes.txt
RUN cat /downloads/current_hashes.txt

# Compare to the file of past hashes we already created and fail the build if they don't match.
COPY past_hashes.txt /downloads
RUN sha256sum -c past_hashes.txt

Using the downloads

The next (and final) section of the Dockerfile copies the verified files out of the first section, using the COPY instruction with the --from=downloads parameter.

There’s no usage of curl in the second section of the Dockerfile at all.

Since the files are downloaded into the /downloads directory, they can be used from that location.

FROM node:16 
# Based on Debian buster.

COPY --from=downloads /downloads /downloads

# Install Go.
RUN rm -rf /usr/local/go && tar -C /usr/local -xzf /downloads/go1.17.5.linux-amd64.tar.gz

Full example

I’ve put together a full example at [1] that installs a number of tools including AWS CDK, Go and TypeScript onto a base image.

The example also uses the go install command to install specific commits of command line tools that are written in Go, so that a known version is always used. Pinning to a commit protects against a Git tag being re-pointed at different code.

https://github.com/a-h/aws-go-cdk-action/blob/main/Dockerfile [1]

Results

These few changes resolved the security finding without much of a headache, and the downloads were cached between multiple builds which saved time.

I’ll probably use this style from the start of any new project.