Haskell on Actions, Part 3
by @pbrisbin on May 21, 2021
This is the last post in a series about our Haskell projects on GitHub Actions.
In this post, we’ll talk about Docker-based deployment of a Haskell project. Really, this applies to any compiled project, where careful layers, multi-stage builds, and layer caching on CI is important.
- Building a simple project including caching concerns
- Automated releases of libraries or executables
- Docker-based deployments from Actions
Layer management & multi-stage builds
If you already understand Docker layers and multi-stage builds, you can safely skip this section.
Docker builds work in layers. Each step in a `Dockerfile` establishes a layer, which is effectively a snapshot of the file-system at that point. This has two consequences worth discussing here:
- If inputs to a layer (the layers before, any files being added) have not changed from a previous build, and the artifacts from that previous build are still present, it will not be built again
- If you add large files in one layer, and remove them in another, they still physically exist in the original layer and resulting image.
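To illustrate the second point, here is a hypothetical two-layer `Dockerfile` (the file name is made up for the example):

```dockerfile
FROM ubuntu:18.04

# Layer 1: adds the archive's full size to the image
COPY big-dataset.tar.gz /tmp/big-dataset.tar.gz

# Layer 2: hides the file from later layers, but does NOT reclaim
# the space -- layer 1's snapshot still contains it
RUN rm /tmp/big-dataset.tar.gz
```

This is why you often see a download, its use, and its cleanup chained into a single `RUN` instruction: everything happens within one layer, so the deleted bytes never make it into the image.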
What does this mean for your Haskell `Dockerfile`s?
First, you should be sure that slow layers (such as installing the compiler and dependencies) are only “busted” when they need to be. You don’t want a change to your main `App.hs` to cause re-installing GHC. Concretely, this means you should `COPY` in only the files that impact dependency installation, then only install dependencies, as distinct layers.
```dockerfile
RUN mkdir -p /src
WORKDIR /src

# As long as these files don't change
COPY stack.yaml package.yaml /src/

# This step won't re-run
RUN stack --no-terminal build --dependencies-only

# If these files change
COPY library /src/library
COPY executables /src/executables

# Only this (faster) step will re-run
RUN stack --no-terminal build \
  --pedantic \
  --ghc-options '-j4 +RTS -A64m -n2m -RTS' \
  --copy-bins
```
This could be made even more granular. Only `stack.yaml` informs GHC choice, so you could `COPY` that and install GHC separately, before proceeding to `package.yaml` and dependencies installation. However, there are additional complexities with that, such as `extra-deps` and `stack` refusing to do anything without a `package.yaml`. These complexities are solvable, but put this idea on the far side of diminishing returns for me.
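For the curious, that more granular version might look something like the following. This is an untested sketch; in particular, the placeholder `package.yaml` trick (to work around `stack` refusing to run without one) may need adjusting for your setup:

```dockerfile
# Sketch: install GHC in its own layer, keyed off stack.yaml alone
COPY stack.yaml /src/stack.yaml

# stack won't run without a package.yaml, so fake a minimal one,
# install GHC, then remove it again
RUN echo 'name: placeholder' > /src/package.yaml \
 && stack --no-terminal setup \
 && rm /src/package.yaml

# Dependency installation now only re-runs when these files change
COPY stack.yaml package.yaml /src/
RUN stack --no-terminal build --dependencies-only
```

With this split, a resolver bump still rebuilds everything, but editing `package.yaml` no longer re-installs GHC.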
Second, you’ll want to make use of multi-stage builds. In the “old days”, we would do this by building one image with a compiler tool-chain, then running another `docker build` in that image to produce executables for use in a slimmer image built from an entirely different `Dockerfile`. What a headache.

To solve this, recent Docker allows `FROM` commands to be named, and for there to be multiple of them. Each `FROM` begins a new stage from a clean base (and the final `FROM` establishes the resulting image), and you can use `COPY --from` to grab files from any previous stage’s layers.
```dockerfile
# Stage 1
FROM fpco/stack-build-small:lts-17.8 AS builder
# ...
RUN stack install

# Stage 2
FROM ubuntu:18.04
COPY --from=builder /root/.local/bin/my-exe /my-exe
CMD ["/my-exe"]
```
This can interact poorly with caching on CI: if you pull the most recently deployed image before building a new one, in an attempt to re-use cached layers, you’ll find none of the `builder` stage’s layers are cached. This makes sense in retrospect, because they don’t exist in the final image, by design. We’ll solve this when we talk about caching in our example Workflow.
Example
Taking all of the above into account, here’s a mildly abridged example of a typical Haskell `Dockerfile`:
```dockerfile
FROM fpco/stack-build-small:lts-17.8 AS builder
# ...
RUN mkdir -p /src
WORKDIR /src
COPY stack.yaml package.yaml /src/
RUN stack --no-terminal build --dependencies-only
COPY library /src/library
COPY executables /src/executables
RUN stack --no-terminal build \
  --pedantic \
  --ghc-options '-j4 +RTS -A64m -n2m -RTS' \
  --copy-bins

FROM ubuntu:18.04
# ...
COPY --from=builder /root/.local/bin/my-exe /my-exe
CMD ["/my-exe", "+RTS", "-N"]
```
You can go much further in the slim image game. Using something like Alpine as the runtime base is common, but can cause problems with missing shared libraries. More aggressive executable stripping is also common. Again, the complexities that brings are not worth the size savings for me. We find images using this approach typically weigh in around 50MB and we’ve had zero issues working with images of that size.
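As one concrete instance of the missing-shared-library problem: GHC-built executables typically link `libgmp` dynamically, so a minimal runtime base may need it installed. A sketch of what that could look like in the runtime stage (untested; whether you need it depends on how your binary was linked):

```dockerfile
FROM ubuntu:18.04
# Install runtime dependencies the stripped-down base may lack;
# libgmp10 is the usual suspect for dynamically-linked GHC binaries
RUN apt-get update \
 && apt-get install -y --no-install-recommends libgmp10 ca-certificates \
 && rm -rf /var/lib/apt/lists/*
COPY --from=builder /root/.local/bin/my-exe /my-exe
CMD ["/my-exe", "+RTS", "-N"]
```

Running `ldd` on the built executable will tell you exactly which shared libraries the runtime image has to provide.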
Docker Layer Caching on GitHub Actions
There are a few ways to attempt layer caching on GitHub Actions, but I’ve only found one that works: this one. It uses Buildx, which I’m not familiar with, but the key part is the `cache-*` options, particularly `mode=max`, which ensures all the layers from a multi-stage build are included.

Here is a full `ci.yml` using it:
```yaml
name: CI

on: push

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: freckle/stack-cache-action@main
      - uses: freckle/stack-action@main

  image:
    runs-on: ubuntu-latest
    steps:
      # For example, say you push to Dockerhub under the same org/name as this
      # repository itself
      - id: prep
        run: |
          tags=${{ github.repository }}:${{ github.sha }}
          echo "::set-output name=tags::$tags"
      - id: buildx
        uses: docker/setup-buildx-action@v1
      - uses: actions/cache@v2
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-image-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-image-
      - uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_PASSWORD }}
      - uses: docker/build-push-action@v2
        with:
          builder: ${{ steps.buildx.outputs.name }}
          cache-from: type=local,src=/tmp/.buildx-cache
          cache-to: type=local,mode=max,dest=/tmp/.buildx-cache-new
          push: true
          tags: ${{ steps.prep.outputs.tags }}
      # Avoids ever-growing cache hitting limits
      - run: |
          rm -rf /tmp/.buildx-cache
          mv /tmp/.buildx-cache-new /tmp/.buildx-cache
    outputs:
      tag: ${{ steps.prep.outputs.tags }}

  deploy:
    if: ${{ github.ref == 'refs/heads/main' }}
    needs: [test, image]
    # Most likely some AWS action to update an ECS task to:
    # ${{ needs.image.outputs.tag }}
```
This example uses DockerHub, but only the login step and `prep.outputs.tags` need to change if you use another registry, such as AWS ECR.
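For example, with ECR those steps might look something like the following sketch. The action versions, region, and the `my-repo` repository name are assumptions to fill in for your setup:

```yaml
      - uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - id: ecr
        uses: aws-actions/amazon-ecr-login@v1
      - id: prep
        run: |
          # "my-repo" is a placeholder for your ECR repository name
          tags=${{ steps.ecr.outputs.registry }}/my-repo:${{ github.sha }}
          echo "::set-output name=tags::$tags"
```

The rest of the `image` job (Buildx setup, caching, and `build-push-action`) stays the same.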
Bonus!
I hope you enjoyed this series on Haskell and GitHub Actions. As a parting gift, here are two other Actions-related projects we maintain and use:
- `stack-bump-lts-action`: find the latest LTS, update your `stack.yaml` if it differs, and commit with a message including details about the changed dependencies.
- `hackage-team`: maintain the maintainers for your team’s Hackage libraries to match a centralized list, automatically.