CloudFront Invalidations with Babashka

tl;dr bb is an excellent addition to a Clojurist’s toolbox.

In an earlier post, I described a git post-merge hook for syncing changes to S3. One of its limitations was that it didn’t invalidate the associated CloudFront cache. I wanted to rectify that since, for one thing, it’s nice to show a new entry in the entry listing after publishing. 😀

Here’s the revised hook:

#!/bin/bash
set -e
branch=$(git branch | grep '*' | sed -e 's/* //')
if [ "$branch" = master ]; then
  commit="${COMMIT:-HEAD}"
  changed=$(git show --name-only --format= "$commit" | grep -e '^www' || :) ❶
  if [ -n "$changed" ]; then
    echo Deploying…
    aws s3 sync --profile AWS profile "$(pwd)/www" s3://S3 bucket/ --delete
    invalidations=$(git show --name-status --format= "$commit" | \ ❷
                    bb -io '(->> *in* ❸
                                 (map #(re-matches #"^M\s+www(/.*)$" %))
                                 (keep second)
                                 (mapcat (juxt identity
                                               #(str/replace % #"/index.html$" "/")))
                                 distinct)')
    if [ -n "$invalidations" ]; then
      aws cloudfront create-invalidation --profile AWS profile \
                                         --distribution-id CloudFront distribution ID \
                                         --paths $invalidations
    fi
    echo Done.
  fi
fi

(I apologize for any horizontal scrolling you may encounter tracking down the number references. Still trying to figure out the best ways to present this stuff!) Here’s ❶:

git show --name-only --format= "$commit" | grep -e '^www' || :

That produces the changed files for a commit, restricted to those in the www directory, where all the content lives. That funky || : bit at the end is a shell idiom: the colon is a no-op command that always succeeds. When grep finds no matches, it exits with a non-zero status, and || : swallows it so that set -e doesn’t abort the script. In this case, exiting here would be fine, since the end result is the same (nothing gets deployed); but it’s sloppy practice. For the commit introducing this post, the output looks like this:

www/blog/2019/12/06/cloudfront-invalidations-with-babashka/css/tachyons.min.css
www/blog/2019/12/06/cloudfront-invalidations-with-babashka/index.html
www/blog/index.html
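To see the || : idiom in isolation, here is a minimal sketch that runs without a repository. The fake_git_output function and its file names are made up for illustration; they stand in for a git show --name-only call on a commit that touches nothing under www.

```shell
#!/bin/bash
set -e

# Hypothetical stand-in for `git show --name-only` output
# from a commit with no files under www.
fake_git_output() {
  printf 'README.md\nscripts/deploy.sh\n'
}

# grep exits with status 1 when nothing matches. `|| :` turns that into
# success, so `set -e` does not abort the script at this assignment.
changed=$(fake_git_output | grep -e '^www' || :)

if [ -z "$changed" ]; then
  status="nothing to deploy"
else
  status="deploying"
fi
echo "$status"
```

Removing the || : should make the script stop at the assignment instead of reaching the if.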

❷ is very similar. It comes right after the S3 sync, but before invalidating CloudFront’s cache. I want a different set of files: only the modifications. If they’re new, there’s nothing to invalidate.

git show --name-status --format= "$commit"

The only change is the --name-status option, which provides a single-letter code indicating what happened in the commit: A for additions, D for deletions, and, most important for my uses here, M for modifications. Here’s the output for the same commit as before:

A	www/blog/2019/12/06/cloudfront-invalidations-with-babashka/css/tachyons.min.css
A	www/blog/2019/12/06/cloudfront-invalidations-with-babashka/index.html
M	www/blog/index.html

So there’s one modification, the entry list. Let’s next look at how I whittle down those results to just that single relevant file. I take the output from the git command, and at ❸ I pipe it into a program called bb. That’s the Babashka executable, which is a li’l Clojure dialect specifically designed to be embedded in bash scripts. It’s compiled to native code using GraalVM, so it starts up lightning fast.

I confess I was a little skeptical of Babashka. If I want the power of Clojure in a script, why not just reach for something like Joker, the Go-based Clojure dialect? Indeed, that was my first thought.[1] But when I started porting over the script, it wasn’t a clear win to me: as one example, it was a relatively large amount of plumbing to get the output from running git commands. That was when the light bulb went on. Wouldn’t it be great if I could pipe git show right into Clojure? Let me break down the Clojure snippet:

(->> *in*
     (map #(re-matches #"^M\s+www(/.*)$" %))
     (keep second)
     (mapcat (juxt identity
                   #(str/replace % #"/index.html$" "/")))
     distinct)

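With the -i option, Babashka binds the lines read from stdin to the *in* variable, letting you work with them like any other Clojure sequence (as I’m doing here with the thread-last macro). I map over each line from git show and do a regular-expression match. I’m looking for two things at once: the line has to start with M, marking a modification, and the path has to sit under www, with everything after www captured as the path to invalidate.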

Here’s what the output looks like if I run just that bit of the pipeline so far:[2]

(nil nil ["M\twww/blog/index.html" "/blog/index.html"])

Because the first two files are additions, they don’t match at all and return nil. The last file, the only modification, is a vector of re-matches’s usual results: the entire matching string as the first element, followed by any capture groups. I’m only interested in the capture group, and (keep second) helps me get it. It sort of acts like map, except it also throws away any nil results.
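For comparison, the same match-and-capture step can be roughly sketched in plain shell with sed. This is not what the hook does; it only illustrates what the regular expression selects. The sample variable is an assumption that mimics two of the tab-separated git show --name-status lines shown earlier.

```shell
#!/bin/bash
# Assumed sample input: two tab-separated `git show --name-status` lines.
sample=$(printf 'A\twww/blog/2019/12/06/cloudfront-invalidations-with-babashka/index.html\nM\twww/blog/index.html\n')

# -n plus the trailing p prints only the lines where the substitution
# happened, which plays the role of dropping the nil results.
# \1 is the capture group: everything after www.
modified=$(printf '%s\n' "$sample" | sed -n 's|^M[[:space:]][[:space:]]*www\(/.*\)$|\1|p')
echo "$modified"
```

The sed version works, but the capture group and the filtering are tangled into one expression, whereas the Clojure pipeline keeps them as separate, readable steps.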

In early versions of this script, I ran into an issue where even though I was creating the invalidations with all the paths I thought I needed, the pages remained unchanged even after the invalidations completed. What gave? Even though the S3 bucket is configured to return a folder’s index.html by default, CloudFront doesn’t care about that: it only cares about the literal path strings. For every /index.html file I was expiring, I needed to clear out the corresponding / path as well.[3]

(juxt identity #(str/replace % #"/index.html$" "/")) returns a vector containing the unmodified path and the path with the index.html removed, then mapcat turns it back into a flat list of paths. I conclude with distinct because, if the path doesn’t end with /index.html, I’ll end up with two of the same thing. (For example, /example.html will return ["/example.html" "/example.html"].)
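The expand-then-deduplicate logic can also be sketched in shell, which makes it easy to try different inputs. The expand_paths function name and the sample paths are hypothetical; the hook itself does this step in Babashka.

```shell
#!/bin/bash
# Hypothetical helper: for each path on stdin, emit the path itself and the
# path with a trailing /index.html replaced by /, then drop duplicates while
# preserving order (the awk idiom prints a line only the first time it is seen).
expand_paths() {
  while IFS= read -r path; do
    printf '%s\n' "$path"
    printf '%s\n' "$path" | sed 's|/index\.html$|/|'
  done | awk '!seen[$0]++'
}

result=$(printf '/blog/index.html\n/example.html\n' | expand_paths)
echo "$result"
```

Here /blog/index.html expands to two entries, while /example.html is emitted twice and collapsed back to one by the deduplication step.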

Now that I’ve got the paths, I need to output them. By default Babashka outputs Clojure literals. For example, strings will be wrapped in quotes. But if you give it the -o option (as I do here), it will output the "raw" results one per line, without quotes. Generally, this is so bb can easily participate in a bash pipe chain. In this case, I just wanted the no-quotes behavior, since I’m just capturing the results for later use. Here’s what I get without the -o option:

("/blog/index.html" "/blog/")

Versus with the -o option:

/blog/index.html
/blog/
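The one-per-line form matters because the hook passes $invalidations to --paths unquoted, so the shell splits it into separate arguments. A small sketch of that splitting, using a hypothetical count_args helper and assuming the default IFS:

```shell
#!/bin/bash
# Same shape as the -o output above: one path per line.
invalidations=$(printf '/blog/index.html\n/blog/\n')

# Hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: the shell word-splits on the newline, giving one argument per path.
unquoted=$(count_args $invalidations)
# Quoted: the whole value arrives as a single argument containing a newline.
quoted=$(count_args "$invalidations")

echo "unquoted=$unquoted quoted=$quoted"
```

The unquoted form relies on paths containing no spaces or glob characters; paths that do would need more careful handling, such as a bash array.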

I want to make a quick note about an aspect of Babashka’s developer ergonomics. Note that I was able to use clojure.string/replace without needing to explicitly alias the namespace; it was pre-aliased to str, as you often see in idiomatic Clojure. Given the context in which Babashka is intended to be used, this is a delightful way to improve using it.

It was fun finding a place to put Babashka to use. I think it’ll be one of those tools where, now that I’ve found one place to use it, more and more opportunities will present themselves. How our tools shape our very thoughts!

  1. And I hope to have a chance to talk more about Joker someday!
  2. And without the -o option for readability.
  3. (It’s entirely possible I need to do this for no-trailing-slash paths too, but I haven’t run into this yet.)

Tools Used

aws-cli: aws-cli/1.16.290 Python/3.7.5 Darwin/19.0.0 botocore/1.13.26
Babashka: 0.0.37
bash: 3.2.57(1)-release (x86_64-apple-darwin19)
git: 2.24.0