You work for me, Computer.

By Brandon Bloom

Rarely Reversible

React.js, with its “IMGUI”-inspired rendering model, is exploding in popularity. After my talk at Clojure/West, several folks asked me about two seemingly separate discussion topics: Two-way Bindings and Cursors.

Among those asking me about data binding, there are a few campaigning for a “spreadsheet-like” method of defining user interfaces. On one hand, they’re right to seek a direct-manipulation experience for building artifacts in a fundamentally visual and interactive medium. However, the code-level approach of using spreadsheet-like “equations” is fundamentally flawed.

Meanwhile, folks using Om in practice have expressed their frustrations with cursors. A few suggested using something more principled like Haskell’s lenses. While this may be useful in some situations, it’s fundamentally flawed for the same underlying reasons.

Both designs share a flaw born of a common desire: To automatically map user input back to data sources. When there’s a 1-to-1 mapping from data sources to user interfaces, this is appropriate. However, it’s not sufficient for the general case. In fact, it’s not sufficient for the common case.

Transformations of source data into views beyond trivial editable fields are almost never reversible or equational.

To substantiate my position, let me demonstrate just how difficult it is to write reversible relationships with some examples in the domain of grammars.

Let’s say you have this simple grammar:

A ::= a+

The production A is formed of 1 or more a characters.

An infinite number of substitutions will satisfy the rule.

a -> A
aaaaaa -> A

Here, the direction of the arrow is critically important.

How can you reverse this production rule?

??? <- A

The only universal way to convert directed rules into reversible relations is to never discard “information”.

For example, we could parameterize A by a number.

aaaa <-> A(4)

Alternatively, you can specify a general principle for lossy reversal rules, like “always produce the smallest value that would satisfy the rule”.

a <- A
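
To make the difference concrete, here is a minimal Clojure sketch (function names are hypothetical). The parameterized form keeps the count, so it round-trips; the “smallest value” principle discards it, so a round trip loses information.

(defn parse-lossless [s]
  ;; "aaaa" -> [:A 4]; the count is preserved
  [:A (count s)])

(defn expand-lossless [[_ n]]
  ;; [:A 4] -> "aaaa"
  (apply str (repeat n \a)))

(defn parse-lossy [_]
  ;; "aaaa" -> :A; the count is discarded
  :A)

(defn expand-smallest [_]
  ;; :A -> "a", the smallest value satisfying the rule
  "a")

(expand-lossless (parse-lossless "aaaa")) ;=> "aaaa", round-trips
(expand-smallest (parse-lossy "aaaa"))    ;=> "a", information lost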

However, this falls flat on its face if you introduce alternation:

AB ::= a | b

a -> AB
b -> AB

??? <- AB

Neither a nor b is “smaller”, so you need a new principle to reverse by. In a traditional context free grammar, | doesn’t imply order, but in a PEG, for example, you have ordered choice. You could say that you reverse to the first choice which satisfies the rule. Let’s use / as the ordered choice operator.

AB ::= a / b

a -> AB
b -> AB

a <- AB
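
Here is a hedged Clojure sketch of that principle (names hypothetical): reversal walks the ordered alternatives and picks the first one whose expansion satisfies the rule.

(def ab-alternatives ["a" "b"])   ; AB ::= a / b, in order

(defn satisfies-ab? [s]
  (contains? #{"a" "b"} s))

(defn reverse-ab []
  ;; reverse to the first ordered choice that satisfies the rule
  (first (filter satisfies-ab? ab-alternatives)))

(reverse-ab) ;=> "a"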

Even in this simplified context, we have barely begun to scratch the surface of potential problems with reversibility. Constructs such as binders or loops will cause the reversal principles to explode in complexity.

Grammars offer pretty simple and well defined operations. However, practically every new operation introduces new reversal principles. Once real world business logic gets involved, things get hairy quickly.

In the context of user interfaces, these problems are magnified so much that they become practically unrecognizable. However, if you zoom in on individual cases in your own applications, you’ll spot this inherent complexity.

A robust UI solution should not attempt to build upon a foundation of automatic reversibility.

Unsound and Incomplete

The academic programming language community spends lots of time studying and building type systems. It’s widely considered that soundness is the mark of a well designed type system. A sound type system never permits an incorrectly typed program to pass type checking. An ideal type system would also be complete; it would never reject a correctly typed program because it was unable to prove its correctness. If you’re not familiar with this terminology, Ben Karel offers an illuminating analogy to help you.

Recently, a flurry of new, statically typed languages have entered the public stage. Among these are Google’s Dart, as well as Microsoft’s TypeScript. One thing that these languages have in common is that they have unsound type systems. Cue the horrified gasps of the type theorists!

Let’s ignore the topic of completeness for a moment, since most critics do so themselves (I think “unsound” just sounds scarier than “incomplete”). Surely Google and Microsoft employ some smart folks who have followed the latest research in type systems, so why, in 2014, are we still building languages that have unsound type systems? Well, I’ll let the Googlers and Microsofties explain themselves. In short, it’s difficult to maintain soundness in feature-rich languages. Even when possible, it may cause an astronomical increase in complexity. Achieving soundness requires trade-offs against completeness, complexity, toolability, and more. Never mind the fact that being fully sound isn’t even that useful to begin with!

Let’s step back and examine why we even bother with type systems at all. In many cases, static analysis can dramatically reduce programmer errors and opens up significant optimization opportunities. Programming languages that are amenable to formal static analysis are highly correlated with programming languages that are amenable to informal static analysis, i.e. by programmers! Type systems have proven to be a particularly useful category of static analysis; so much so, that many programmers can’t imagine building software without a type checker on hand at all times.

Sadly, type systems fundamentally can’t catch all kinds of programming mistakes, nor will they magically improve the time or space complexity of your algorithms. You’ll have to write tests and benchmarks either way, to say nothing of validating your designs with stakeholders! Still, there are other reasons to have a good type system, like enforcement of type safety (which is a different thing than “type correctness”). However, safety can also be enforced by dynamic means and dynamic mechanisms such as capabilities can enforce security far beyond that of static safety checks. You might even want to intentionally forego safety in order to achieve maximum performance. Furthermore, there’s lots of other kinds of static analysis, both formal and informal, that can be used to find bugs or speed your programs up.

Assuming we want a tool that finds as many errors as possible, we should be willing to forego completeness, instead preferring to have our static analysis be overly cautious, warning us of programs that might be incorrect. We can always suppress such warnings after informal analysis (or alternative formal analysis!). Assuming we’re realistic about the fact that no static analysis tool will ever catch all errors, we should also be willing to tolerate unsoundness in order to spend our mental complexity budgets more wisely. Assuming we want a tool that independently improves with time, we should not create a cyclic dependency between languages and type systems.

There is a class of such tools. They’re called linters.

Slurp & Spit

Does nobody teach fopen anymore?

Many programmers start with “Hello World”. Shortly after that, it’s “Hello $NAME”, with a string read off stdin. But soon, budding engineers get tired of reminding their computer of their name each and every time their code runs. When I was first learning to program, file IO was an early requirement. We were taught how to read and write “data files” using little more than fopen, fscanf, fprintf, and fclose.

Fast forward five to ten years; you find yourself writing your 10 millionth SQL query, wishing for simpler times. Spend some more time with the NoSQL database du jour and the humble fopen function will be but a distant memory. Until that one fateful day arrives where you’ve got a relatively simple program and encounter the need for equally simple durability. Five years ago, you’d have cracked your knuckles and hacked out a pair of “save” and “load” functions. Today, you add a dependency on your favorite database driver, switch to the shell, type createdb myapp and then dutifully begin defining a lovely schema. Of course, now you need to either rework your models to conform to some horrid ORM layer, or write save/load-style “hydrate” and “dehydrate” methods anyway.

Now, at ten years and a day, you decide it’s finally time to learn that hip new programming language that everybody is talking about. You’ve got your book out, you’ve rocked through all your favorite technical interview puzzles, and you’re ready to put together a little web service for personal or small group use. If this was yesterday, you’d know exactly what dependency to add and how to proceed, but that was yesterday. Today, you do some Googling or hop into IRC looking to find out what’s popular.

Why don’t you even consider fopen?

But files aren’t “Web Scale”!

Is that really true? And do you really care? Should you really care?

The answer to all of these questions is “No”. Files can easily be “web scale”. As of 2013, Hacker News is still running as a single process, on a single core, of a single server, backed by a directory structure of simple data files. Nearly 2 million page views are served daily. See this thread and this submission for details.

The bottom line on this subject: 1) you’re going to need significant cleverness to scale any service, regardless of whether you use a database or the file system directly; 2) you probably won’t need to scale hugely anytime soon anyway (being crushed by traffic is a Good Problem To Have, so let’s worry about the Ghost Town problem first); and 3) you can always refactor later.

Data Literals: print, read, spit, and slurp

This article was prompted by a recurring discussion started by newcomers to the Clojure IRC channel. Having learned the basics, folks want to hit the ground running with stateful and durable services. Coming from a background in some big framework or some hairy application, these folks ask those suggestive questions about databases, driver libraries, and the like. Often, upon interrogation, they only have a couple kilobytes or megabytes of total data to be stored. In such a situation, the advice is always the same: Put that data into an atom, then use spit and slurp to save and load.

(def db (atom {...}))

(defn save-data []
  (spit "somefile" (prn-str @db)))

(defn load-data []
  (reset! db (read-string (slurp "somefile"))))
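
A hedged usage sketch, building on the functions above (keys are hypothetical): mutate the atom as usual, persist on demand, and restore on startup.

(swap! db assoc-in [:users "alice"] {:visits 1})
(save-data)   ; writes the whole value to "somefile"
(load-data)   ; reads it back, e.g. after a restart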

Because Clojure encourages the use of printable, readable, “pure” data, your save and load functions are virtually free! If you’re not familiar with Clojure, then consider working with pure JSON in Node.js:

var fs = require('fs');

var db = {...};

function saveData() {
  fs.writeFileSync("somefile", JSON.stringify(db));
}

function loadData() {
  db = JSON.parse(fs.readFileSync("somefile"));
}

Things aren’t quite as easy in languages lacking data literals, but nearly every popular language has some kind of automatic serialization library. However, even if you do need to write your own save and load functions, it’s a pretty straightforward, if somewhat tedious, process.

Even—or especially—experienced programmers are surprised by just how far this brain-dead-simple durability scheme will go.


One other objection to files is that it’s difficult to ensure an application enjoys the same guarantees that a well-tested ACID database affords. Never mind the fact that the majority of Rails applications suffer from dozens of consistency bugs because most developers forget to wrap related updates in transactions; it is true that incorrectly implemented file IO can cause catastrophic data loss.

If you plan to deploy spit and slurp into production, you’d be advised to write to a temporary file and utilize an atomic rename. This ensures that a failure during writing won’t corrupt your database. See man rename for more.

function saveData() {
  fs.writeFileSync("somefile.tmp", JSON.stringify(db));
  fs.renameSync("somefile.tmp", "somefile");
}

Just this one little tweak and this bad boy is production ready. Clojure programmers can use Java’s File.renameTo method. See below.

Remember to configure your backups!

Don’t Stop The World

Experienced Node.js programmers likely cringed at the “Sync” suffix on the above file operations. These operations will block while data is being read or written. In addition to yielding clearer example code, synchronous operations do not require coordination. If your application only needs to serve a handful of users, you can still write 100 megabytes of data in less than half a second. Even if you write every change to a file on every request, your ten users might not even notice your blocking IO. As you scale up, you’ll need to either split your data files up and/or start writing files asynchronously.

Asynchronous means coordination. Coordination means locking or queues. This stuff really isn’t as scary as it sounds, but it will have to wait; this post has already gotten far too long. However, I don’t want to lead you down the wrong path, so I should mention that, unlike Node.js, synchronous writes on the JVM will not block other requests. You practically don’t have a choice but to be asynchronous. If two requests both call your save function at the same time, the resulting race condition can lead to incremental data loss. Luckily, Clojure’s agents and persistent data structures provide for a super quick fix:

(import 'java.io.File)

(def save-agent (agent nil))

(defn save-data []
  (send-off save-agent
    (fn [_]
      (spit "somefile.tmp" (prn-str @db))
      (.renameTo (File. "somefile.tmp") (File. "somefile")))))
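
A hedged usage note: send-off queues each write on the single save-agent, so concurrent calls to save-data are serialized instead of racing. Before shutting down, you can flush pending writes and release the agent thread pool:

(save-data)
(await save-agent)   ; blocks until queued saves have completed
(shutdown-agents)    ; lets the JVM exit once the agent pool is done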

Different web servers across all the different languages have varying concurrency support and configuration defaults. Whether or not you use file IO, you should be aware of the concurrency story for your platform and its impact on your application.

Code Density

I’ve heard from many folks who have had difficulty learning functional programming. In attempting to complete seemingly simple tasks, even the most experienced imperative and object oriented programmers fall flat on their faces when first attempting to program in a functional style. Yet, experienced functional programmers rave about their productivity and their small, beautiful codebases. Are these functional programmers simply that much smarter? Or is there some other force at work?

Even in a familiar language, a small, but powerful library might be described as “write-only code” by those who do not grok the fundamental ideas. Powerful ideas produce powerful programs, but powerful programs don’t always shed light on their underlying powerful ideas. Functional programming is a powerful idea.

A lot of businesses just need somebody to bang out some pretty vanilla solutions to some pretty vanilla problems. If you’re in one of these businesses (and most of us are), then your team’s resident curmudgeon is right: Clever is not a good thing. Familiar and understood beats simple and powerful ideas every time.

However, if you’re looking to up your game and want to tackle bigger problems, you need more powerful tools. If you’ve tried to read some Lisp or ML or Haskell but found your head swimming, or if you’ve tried to fight your way through Project Euler with an unfamiliar language but got stumped by tasks you aced in freshman year, don’t panic. There’s a good explanation for why these things seem harder. Those functional programmers really aren’t smarter than you. They’re just a little more persistent.

The key issue is one of code density. Powerful ideas can succinctly express solutions to tough problems. The problem isn’t necessarily any easier, nor is expressing the solution necessarily any more difficult. There is simply more meaning per line of code, so reading and writing each line of code takes proportionally longer. If you’ve jumped into a 100,000+ lines-of-code Java project and found your way around like a champ, you shouldn’t feel ashamed by jumping into a 10,000 line Clojure project and being horribly lost. Those 10,000 lines of code might take you five times as long to read per line. These numbers are only backed by anecdotal evidence, but that’s a hypothetical 50% reduction in total code reading time!

When you start writing some object-oriented code, you spend a bunch of time creating classes and defining the shape of your domain. There is some delay between when you begin typing and when you run head first into the problem at hand. During that delay, you have a lot of time to think about the problem. While your fingers are hard at work, your mind is too. Your brain needs time for solutions to bake. If you encounter the hard problems as soon as you start typing, you need to walk away from your desk!

Functional programming is a particularly compelling powerful idea because it captures the essence of a lot of computational problems. There is a great deal of complexity that is hidden away in mutable state, implicit context, and abstract interfaces. If you’re used to the constant drip of progress afforded by stitching together class hierarchies, you suffer immediate withdrawals when faced with all of the hidden moving parts laid bare in an individual function’s signature. Suddenly, you need to solve many more problems at once, but you’ve become accustomed to having those problems spread out across a wider cross-section of your program.

It takes longer to get started with a powerful idea because you must first understand and internalize that powerful idea. Take a deep breath, lean back in your chair, close your eyes, and think through the problem. Untangle the mess in your head, maybe explore a bit in the REPL, then come back and try the problem again. You might find that you spend half as much time thinking as you used to spend typing.

SemVer: A Technical Solution to a Social Problem

“Semantic Versioning”, commonly called SemVer, is a specification for assigning and reasoning about software version numbers. In short, it codifies some common practices regarding $MAJOR.$MINOR.$REVISION style version numbers, including guarantees about API compatibility and dependency resolution.

Beautifully simple semantics, outlined by a clear and concise specification. What’s not to like? Technically, SemVer is very strong. However, statistically speaking, software developers cannot be trusted to maintain the semantics promised by SemVer. A small handful of individuals and projects use SemVer, or something like it, to good effect. The rest can’t be trusted not to introduce major bugs or breaking changes with each and every revision.

Every engineer who has ever worked on a project with more than a handful of external dependencies has had the horrifying experience of deploying some small dependency upgrades only to have their entire house of cards come crashing down. As a result, mature organizations tend to maintain their own package repositories, vendor all dependencies, or otherwise carefully scrutinize and control each and every external component.

Packaging systems such as Ruby’s Bundler and Node’s NPM often default to or otherwise encourage SemVer. Depending on a pessimistic version constraint, such as ~> 1.2, is tantamount to saying “I trust that the maintainer of this project both understands SemVer and is capable of enforcing its guarantees.” Sadly, this is rarely, if ever, true.

Versioning is a social contract. A maintainer makes a promise regarding API stability and versioning policy. Consumers make a judgement call regarding the veracity of the maintainer’s promise. If the README file says “This project uses SemVer” and the project’s maintainer is Tom Preston-Werner, then you can trust that pessimistic version constraint in your dependencies file.

However, the README might say “Releases are only packaged for long-term support versions. Please use the stable branch for new production systems.” In such a case, you might want to consider depending directly on that branch or even on a particular commit hash. Speaking of branches and commits, this calls to mind a far preferable default for packaging and dependency resolution systems: refspecs. It would be great if the SemVer spec was updated to include policies for defining branch and tag names in your version control system of choice. Instead of depending on ~> 1.2, you could depend on branches/v1.2.x. This grants maintainers greater flexibility in making social contracts. It also grants consumers greater control over how much trust they give to maintainers.

Trivial Cycles in Object Oriented Programming

I’m not against objects. I’m against programming in a style oriented by objects.

One of the largest causes of big-ball-of-mud architectures is unnecessary cyclic dependencies. Unfortunately, object-oriented languages allow programmers to trivially create cyclic dependencies without warning or immediate repercussions. Over time, these cycles become a significant source of complexity and corresponding suffering.

The problem can be summarized in a single line of pseudocode:

person.address.residents
This innocuous-looking expression embodies a pleasant interface, but its implementation hides an insidious cyclic relationship between the Person class and the Address class. Worse than that, this cyclic relationship is extended transitively to any other related classes. Consider the following diagram with an additional Company class that may also have an address.

[Diagram: Person / Address / Company ball-of-mud]

Any time you add a type to this graph, it is instantly tangled-up with everything else. Compare this with this functional variant:

[Diagram: Person / Address / Company untangled]

Now you can analyze, compile, and reason about a subset of this graph. Reasoning about any particular node only requires reasoning about the transitive closure of its dependencies. However, in the presence of ubiquitous cycles, the transitive closure is equivalent to the entire graph.
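
As a minimal Clojure sketch of the untangled variant (all names hypothetical), the data lives in plain maps and each function depends only on the shape of the map it receives, so no class-level cycle ties Person, Address, and Company together:

;; person.clj: knows nothing about addresses or companies
(ns example.person)

(defn full-name [person]
  (str (:first-name person) " " (:last-name person)))

;; address.clj: can be analyzed and tested without loading person.clj
(ns example.address)

(defn address [entity]
  ;; works for a person, a company, or anything with an :address key
  (:address entity))

(defn residents [address]
  (:residents address))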

Unfortunately, we’ve now lost the convenient noun.verb or noun.component syntax. Instead, our expressions are inside out. Here it is in Ruby or Python syntax:

residents(address(person))
And here it is in Lisp syntax:

(residents (address person))

Luckily, we can recover the pleasant ordering with the help of a macro. Here is an example of Clojure’s threading macro:

(-> person address residents)

In the absence of macros, we can use a binary operator. Here’s an example of F#’s “pipeline” operator:

person |> address |> residents

Both Scala and C# provide tools for directly emulating the dotted access syntax. Respectively, implicit conversions and extension methods leverage the type system to perform dispatch to non-members in a syntactically identical way.

It is unfortunate that it is so easy to create cycles. A small sin at first, but they quickly add up and really start to hurt as a system grows larger.


A while ago, I started blogging again. Google’s Blogger isn’t great for code snippets, so I contorted a few posts to fit the formatting. A few posts never got written at all. I wanted to get my blog into GitHub Flavored Markdown and under version control. I’m not a super huge fan of Jekyll, but in the interest of getting back to content, I went ahead and did the simplest possible thing that could work.

So here’s my new blog.

It’s built on Jekyll via Octopress, with a theme by Lucas Lew, and hosted on GitHub Pages.

I ported the majority of the posts from HTML to Markdown via a pile of Vim Macros. Then, I set up a tiny Ruby app on Heroku to redirect old URLs. It was an extremely tedious process, but I’m pretty happy with the result!

Fipp: Fast Idiomatic Pretty-Printer for Clojure

I decided to create a new pretty printer in the spirit of “Data All The Things!” And it’s fast too!

Fipp, the Fast Idiomatic Pretty-Printer for Clojure, is a pretty printer with linear runtime and bounded space requirements. Unlike clojure.pprint’s side-effectual API, Fipp is configured by pretty print documents that are similar to Hiccup HTML templates. It’s based on some fancy pants research pretty printers from the Haskell world.
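
For a quick feel of the API, here is a hedged sketch based on Fipp’s README (the exact option names may vary by version):

(require '[fipp.edn :as fipp])

;; Pretty-print an EDN value to *out*, wrapping at 40 columns.
(fipp/pprint {:numbers (vec (range 10))
              :nested  {:a [1 2 3] :b "strings too"}}
             {:width 40})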

You can see my implementation and lots more notes in the Fipp repository on GitHub.

Factjor: Concatenative Clojure

I’ve released another small Clojure project: Factjor

Head on over to GitHub for a tiny intro in the README. Lots more explanation and context will follow in a future post, after I release the project that motivated Factjor.

The Node.js REPL Is Broken

Not only is the Node.js REPL broken, but so are the stock REPLs for Python and Ruby too! In fact, for many of you, every single REPL you’ve ever used has been fundamentally flawed. Moreover, even if you used a fancy replacement REPL, such as UltraREPL, IPython, or Pry, you’d only be mitigating the problem: JavaScript, Python, and Ruby are all fundamentally flawed languages from a REPL-lover’s perspective.

Now, I realize that this is a bold and, thus far, unsubstantiated claim. Python and Ruby are revered for interactive, dynamic development and Firebug’s JavaScript console changed the game for web apps. So with so many happy REPL users, what are these fundamental flaws? Since it is the worst offender, let me pick on JavaScript for a moment…

Node’s stock REPL provides a fragile and awkward notion of an evaluation context. A fresh REPL is configured with an evaluation context as if by both with(context) and .call(context) to enable unqualified access to context variables, as well as introspection of the this context. This, however, is a convenient illusion. If you evaluate var x = 1; this, you’ll note that x is added to this. If you were to use var inside a with block, you wouldn’t get that same behavior. The top-level inside the REPL is somewhat magical.

CommonJS Modules provides the blueprint for Node’s module system. The key idea is that Plain Old JavaScript Objects can be used to describe modules and their exports. Simple, but perhaps too simple. Consider if you were working on some function f in module m. You boot up your trusty REPL, evaluate var f = require('m').f; and try out some corner cases. Uh oh, f has a bug, so you modify src/m.js and then you… promptly restart your REPL! The require function doesn’t offer any sort of {reload: true} option, and even if it did, you’d still be left with a reference f pointing to an old version of the function. That might not be a big problem if you have just one function, but what if you have several objects and functions in a larger system? The too-simple plain-old-JavaScript-objects approach results in the copying of references and assigning those copies to local names. It is extremely difficult to reason about changes. Even if you knew to cheat by reaching inside of require.cache with a delete, you’d still be better off restarting your REPL to preserve your sanity. That’s a real shame if you had a bunch of interesting test cases lying around in memory, or if there is an expensive initialization computation you need to wait through each time.

Now that I’m done picking on JavaScript, I’ll point out that the situations in both Python and Ruby aren’t much better. Python doesn’t have a notion of a current module, although it does provide the builtin locals() function, as well as lets you call code.interact(locals={...}) to start a nested sub-REPL with something akin to Node’s magical top-level. Ruby fares marginally better by providing a reasonably well behaved implicit self, which can be shadowed in a nested sub-REPL via an argument to the confusingly named irb method. Python’s modules have a similar issue to plain old JavaScript objects with from m import f, and Ruby’s metaprogramming facilities add more than enough complexity to make REPL-coding hair-raising.

It doesn’t have to be this way. There are languages that are much better suited to iterative REPL development. If you’ve been following my blog, then this is where you’d expect me to praise Clojure. I’ll save that for a future post in which I’ll discuss what is really necessary for a successful REPL. And while Clojure’s REPL is far better than any other I’ve ever used, it has its own shortcomings to discuss as well.


Florian said…

Note that you’re lamenting for the most part that python, ruby and JS can’t do “hot-reload”.

This has nothing to do with an interactive prompt.

Hot reloading never really does work whatever you do (it doesn’t “just work” in smalltalk and any lisp dialect either). The first reason is that even if you can magically substitute all references to everything with the right version (which smalltalk and most lisp dialects can) it still leaves in-memory data corrupt (you haven’t written an in-memory migration script have you?).

The situation is somewhat worse for any language that allows references to everything (like python, ruby and JS) and is not side-effect free. In those environments hot-reloading becomes all but impossible, and although that’s a bit sad, the very way those languages work also makes them pleasant to use unless you want a hot reload.

Brandon Bloom said…

Being able to “hot-reload” a file is nice, but the failing is in the manner by which names are resolved. Aliasing imports into a local variable lacks a level of indirection; this precludes hot-swapping.

This has nothing to do with an interactive prompt.

These are design decisions that impact the usability of interactive prompts.

you haven’t written an in-memory migration script have you?

Sometimes I do migrate my in memory data structures… I enjoy having that option.

devdazed said…

I agree that the Node REPL is by far the worst offender, but I think the largest point should have been that the REPL decides when an error makes it crash. For instance, if you run:

throw new Error('foo');

All is well and it prints a stacktrace; however, if you run:

process.nextTick(function(){ throw new Error('foo') });

It will cause the entire REPL to crash, and thus you need to start from square one. It’s this type of inconsistent behavior that makes the Node REPL infuriating.

IMO, the REPL should never crash and if there is anything a user can do to make it crash (aside from sending a kill -9 command) then it is fundamentally flawed.

Another thing is that you can cause a JS OOM error by simply creating a tight loop that prints to the console.

Additionally, standard interrupt signals (such as Ctrl+C) won’t break such a tight loop, so your only choice is to kill the entire application.