The Readers of Skywalker

tl;dr I try to get a handle on Clojure’s custom data readers.

Clojure has described its syntax as a superset of Extensible Data Notation (EDN). I want to take a slightly closer look at the extensible part of that equation.

One of the interesting things EDN allows is tagged literals, for providing some extra semantic meaning on top of the bare literals supported directly. You may have seen this in Clojure with stuff like #uuid "617fd7b8-8551-4c4e-9fde-acb9260f19ae" or #inst "2019-12-20T23:53:00.785-00:00"; those are tagged literals representing a UUID and a Java Date (or instant in Clojure parlance) respectively. It’s an interesting idea: an EDN consumer can simply ignore tagged literals it doesn’t recognize and wouldn’t know what to do with.
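To make that concrete, here's a minimal sketch of reading the two built-in tags with clojure.edn. The names uuid-val and inst-val are just mine for illustration.

```clojure
(require '[clojure.edn :as edn])

;; #uuid and #inst are built in, so no reader configuration is needed.
(def uuid-val (edn/read-string "#uuid \"617fd7b8-8551-4c4e-9fde-acb9260f19ae\""))
(def inst-val (edn/read-string "#inst \"2019-12-20T23:53:00.785-00:00\""))

(class uuid-val) ; java.util.UUID
(class inst-val) ; java.util.Date
```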

Let’s say I’ve got the following code:

src/sw/core.clj

(ns sw.core)

(def episodes
  {; others elided for, um, brevity
   :iv   {:title "A New Hope"}
   :v    {:title "The Empire Strikes Back"}
   :vi   {:title "Return of the Jedi"}  
   :vii  {:title "The Force Awakens"}
   :viii {:title "The Last Jedi"}
   :ix   {:title "The Rise of Skywalker"}})

In this hypothetical domain, maybe I’d like some shorthand for referring to individual episodes. That is, I’d like to be able to write (:title #sw/ep :ix) [1], and have it return "The Rise of Skywalker".

If I fire up a REPL and submit #sw/ep :ix for evaluation, Clojure will whine that there is “No reader function for tag sw/ep”. If I try again with read-string, I don’t fare much better: (read-string "#sw/ep :ix") gives me the exact same error. Fortunately, now I can start plugging into Clojure’s mechanisms for dealing with this behavior. There’s a dynamic var called *data-readers* that I can re-bind to handle missing tags. It looks like this:

(require 'sw.core)
(binding [*data-readers* {'sw/ep sw.core/episodes}]
  (:title (read-string "#sw/ep :ix")))
"The Rise of Skywalker"

That’s progress! But I don’t want to have to re-bind *data-readers* every time I read a tagged literal. At the REPL, where the var already has a thread-local binding, calling (set! *data-readers* {'sw/ep sw.core/episodes}) will make it stick. After that, calling (read-string "#sw/ep :ix") works, and so does a straight-up #sw/ep :ix!
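The episodes map works as a reader only because Clojure maps are themselves functions of their keys. Any one-argument function will do. Here's a rough sketch with a hypothetical read-episode function (my own invention, not something from sw.core) that complains about unknown episodes instead of quietly returning nil. The trimmed-down episodes map is redefined here just to keep the snippet self-contained.

```clojure
(def episodes
  {:iv {:title "A New Hope"}
   :ix {:title "The Rise of Skywalker"}})

;; Hypothetical reader function: it receives the already-read form
;; that follows the tag (here, the keyword :ix).
(defn read-episode [k]
  (or (get episodes k)
      (throw (ex-info "Unknown episode" {:episode k}))))

(binding [*data-readers* {'sw/ep read-episode}]
  (:title (read-string "#sw/ep :ix")))
;; => "The Rise of Skywalker"
```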

There’s yet another trick to make this still easier. Clojure will look for data_readers.clj files in the root of the classpath and make those available. (I made reference to it in my post on tools.deps injections.) We just take the map we were setting *data-readers* to, and save it in a file.

src/data_readers.clj

;; Plain symbols, no quote: this file is read as data, not evaluated.
{sw/ep sw.core/episodes}

Now when I kick up a REPL, I don’t have to muck about with *data-readers*—though I do have to require the namespace with the handler functions. (Unless I set up an injection, of course! 😉)

Interestingly, the data_readers.clj stuff doesn’t work if I’m using the clojure.edn namespace. Once I’m trying to read #sw/ep off the wire, I’m back where I started. The solution is similar. Once again I provide a map of custom readers, but I don’t have to re-bind a var. It looks like this:

(require '[clojure.edn :as edn])
(edn/read-string {:readers {'sw/ep sw.core/episodes}} "#sw/ep :ix")
{:title "The Rise of Skywalker"}
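For contrast, here's a small sketch showing that clojure.edn pays no attention to *data-readers*, even when it's bound. The episodes map and the result var are stand-ins I made up so the snippet runs on its own.

```clojure
(require '[clojure.edn :as edn])

(def episodes {:ix {:title "The Rise of Skywalker"}})

;; clojure.edn only consults its :readers option (plus the built-in tags),
;; so this binding has no effect and the read still fails.
(def result
  (binding [*data-readers* {'sw/ep episodes}]
    (try
      (edn/read-string "#sw/ep :ix")
      (catch RuntimeException e
        (.getMessage e)))))

;; result holds the error message rather than an episode map
```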

Part of the beauty of EDN’s tagged literals is that it’s possible to ignore stuff I don’t know how to handle. But it’s also nice, when that data continues on to somebody else, to pass it along in case they know how to deal with it. Everything I’ve shown so far errors out in the absence of reader config. Alex Miller writes about using core’s tagged-literal function as the default reader function, which is a good way to play nice.
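Before wiring it in, here's a quick look at what tagged-literal produces on its own. The var name unknown is just for illustration.

```clojure
;; tagged-literal builds a value that remembers both its tag and its form.
(def unknown (tagged-literal 'movie/director :jj))

(:tag unknown)            ; movie/director
(:form unknown)           ; :jj
(tagged-literal? unknown) ; true
(pr-str unknown)          ; "#movie/director :jj"
```

Because it prints back out exactly as it was read, whoever receives the data next can still apply their own reader for the tag.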

In read-string-land, it looks like this:

(set! *default-data-reader-fn* tagged-literal)
(let [raw "{:episode #sw/ep :ix, :director #movie/director :jj}"
      parsed (read-string raw)]
  (println (get-in parsed [:episode :title]))
  (pr-str parsed))
; => The Rise of Skywalker
"{:episode {:title \"The Rise of Skywalker\"}, :director #movie/director :jj}"

And in clojure.edn/read-string, where the significant change is the :default option:

(set! *default-data-reader-fn* tagged-literal)
(let [raw "{:episode #sw/ep :ix, :director #movie/director :jj}"
      ;; Quote only the tag symbol so sw.core/episodes evaluates to the map itself.
      parsed (edn/read-string {:readers {'sw/ep sw.core/episodes} :default tagged-literal} raw)]
  (println (get-in parsed [:episode :title]))
  (pr-str parsed))
; => The Rise of Skywalker
"{:episode {:title \"The Rise of Skywalker\"}, :director #movie/director :jj}"
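As a sanity check on the "play nice" claim, here's a small sketch of a round trip: read with :default tagged-literal, print, and read again. The names raw and parsed mirror the ones above, but this snippet only uses the unknown #movie/director tag so it runs without any custom readers.

```clojure
(require '[clojure.edn :as edn])

(def raw "{:director #movie/director :jj}")
(def parsed (edn/read-string {:default tagged-literal} raw))

;; Printing and re-reading yields an equal value, so the unknown tag survives.
(= parsed (edn/read-string {:default tagged-literal} (pr-str parsed)))
;; => true
```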

Pretty interesting, though I’m not sure what I’ll do with this yet. I had fun getting to see Skywalker a bunch, though. 😀 May the Force be with you!


  1. The sw/ prefix is a namespace; EDN reserves un-namespaced tags for its own use.

Tools Used

Clojure CLI
1.10.1.492