Good API Design: Part 4
Monday, September 28, 2009 at 01:02PM
In Part 3 of this series, I outlined how API designers can use the type system to make APIs hard to misuse. In this fourth installment, I'll talk about ways to make APIs comprehensible and digestible, while still retaining power.
To motivate this discussion, I'll draw your attention to a real-life library whose design made it an instant hit with developers, and brought the original developer worldwide fame.
jQuery
jQuery is a JavaScript library for performing DOM manipulation and analysis. Born from dissatisfaction with the APIs of 2005, jQuery was designed to get a lot done with very little.
The API has a small surface area — small enough to fit onto a single cheat sheet. Yet it's powerful enough to construct everything from simple one-page personal websites to complex Web 2.0 applications.
jQuery doesn't have users. It has fans, or should I say, zealots. It's a cranberry-orange scone with light sugar glazing, baked to a golden brown and served with a piping hot cup of dark roasted coffee. That's how good it is.
jQuery versus W3 DOM
Let's say we want to insert a 'div' element before every first paragraph beneath all h1 tags. In jQuery, this is trivial:
$('h1').each(function() { $(this).find('p:first').before('<div class="sep"></div>'); });
Using the W3 interface for the DOM, the code is bulky and intrusive, obscuring its purpose with its own unwieldiness. In fact, you're going to have to take my word for this, because I'm in no mood to write the parallel W3 DOM code for the above right now (readers: this is your cue!).
Here's a more complicated example courtesy of Brian Reindel that counts down the number of characters the user is allowed to type (like Twitter):
var countdown = {
    // Recompute the remaining character count and update the display.
    init: function() {
        countdown.remaining = countdown.max - $(countdown.obj).val().length;
        if (countdown.remaining < 0) {
            // Over the limit: truncate the text to the maximum length.
            $(countdown.obj).val($(countdown.obj).val().substring(0, countdown.max));
            countdown.remaining = 0;
        }
        $(countdown.obj).siblings(".remaining").html(countdown.remaining + " characters remaining.");
    },
    max: null,
    remaining: null,
    obj: null
};
var iCount; // interval handle shared by the focus and blur handlers
$(".countdown").each(function() {
    $(this).focus(function() {
        // The limit is encoded in the class name, e.g. "limit_140_".
        var c = $(this).attr("class");
        countdown.max = parseInt(c.match(/limit_[0-9]{1,}_/)[0].match(/[0-9]{1,}/)[0], 10);
        countdown.obj = this;
        iCount = setInterval(countdown.init, 1000);
    }).blur(function() {
        countdown.init();
        clearInterval(iCount);
    });
});
Coding the above with bare metal JavaScript would take many times the amount of code.
jQuery is clean and simple, and there's a reason it is #1 when it comes to lightweight JavaScript libraries. But what is that reason?
Power/Conciseness
Good APIs are very powerful, but not overwhelmingly complicated. They have a small surface area — or at least, small enough so that a single developer can acquaint herself with the whole API without much effort.
Yet, there is a tension between simplicity and power. If your API exposes exactly 1 function, then the API can't do very much (assuming the function doesn't take an enormous number of parameters). On the other hand, if your API exposes 10 million functions, it can probably do a lot, but no one will ever use it.
As API designers, we want both: simplicity and power. How do we get it? In a word, composability.
Composability
Many good APIs achieve both power and conciseness by being composable. Sometimes, this property is called by another name, like "functional-style" or "domain-specific language". But the heart is always the same: the API provides the client with Lego-like building blocks that can be rearranged and snapped together to solve new problems.
Good APIs go beyond just being composable to offering high usability and covering common use cases — topics I'll return to at the end of this post. But composability is how they achieve richness while preserving comprehensibility.
In many poorly-designed APIs, each intention of the client is mapped to a separate method invocation. To illustrate, consider an API for comparing strings.
Certainly, the client will want a method for seeing if two strings are exactly equal, so we can start with a method equalsExactly to satisfy that need. The client will also want a method to see if two strings are equal ignoring case, so we can add another method equalsIgnoreCase. The user may also want to see if one string is "approximately equal" to another, for purposes of correcting spelling mistakes, so we can add another method equalsApproximately. And so forth, until we have compiled a large collection of methods.
The problem with this approach is that it greatly increases the surface area of our API. To use our API effectively, the client needs to know about all 20 of our functions for equality. Many of these methods are not used frequently, so looking at them will slow the client down ("brain clutter").
The problem gets worse as the API gets larger. Before long, you have the monolithic framework so common in Java, which consists of hundreds of classes, each with dozens of methods. Bulky, awkward, joy-killing, and brain-draining, such behemoths have inspired fear and dread in many a developer, rather than the delight that jQuery inspires.
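To make the contrast concrete, here is a minimal sketch of how the string-comparison API might be factored into building blocks instead of one method per intention. Everything in it (the StringMatch class, the normalizer constants, and equalAfter) is hypothetical and invented for illustration; it is one possible factoring, not the design of any real library.

```java
import java.util.Locale;
import java.util.function.BiPredicate;
import java.util.function.UnaryOperator;

// Hypothetical building blocks for string equality (illustrative names only).
final class StringMatch {
    private StringMatch() {}

    // Each normalizer solves one small problem.
    static final UnaryOperator<String> IGNORE_CASE = s -> s.toLowerCase(Locale.ROOT);
    static final UnaryOperator<String> TRIMMED = String::trim;
    static final UnaryOperator<String> COLLAPSE_SPACES = s -> s.replaceAll("\\s+", " ");

    // Builds an equality test that applies every normalizer to both sides,
    // in order, and then compares the results exactly.
    @SafeVarargs
    static BiPredicate<String, String> equalAfter(UnaryOperator<String>... steps) {
        return (a, b) -> {
            String left = a;
            String right = b;
            for (UnaryOperator<String> step : steps) {
                left = step.apply(left);
                right = step.apply(right);
            }
            return left.equals(right);
        };
    }
}
```

With these pieces, a case-insensitive comparison that also ignores surrounding whitespace is StringMatch.equalAfter(StringMatch.TRIMMED, StringMatch.IGNORE_CASE).test(a, b). A client who needs a new flavor of equality writes one more normalizer and snaps it in, and the API surface stays at a single method.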
To help us inspire delight in our clients, let's take a look at the ingredients that make up jQuery's composability.
Ingredients of Composability
Concisely, composability refers to a factoring of an API such that clients can construct solutions to their specific problems by assembling solutions to smaller problems.
jQuery exposes two kinds of building blocks: CSS selectors, and the jQuery abstraction itself.
Look closely at the following expression:
$('#foo input[type="text"]').hide();
This code hides all text input fields located beneath the element with id 'foo'.
At first glance, you might not see the building blocks of which I speak. Look closer, however, and you'll see them staring right back at you. Where? Inside the string passed to the $ function! That's right, the first place we see building blocks in jQuery is in the selector string passed to jQuery's selector engine.
By combining different selectors, clients of the jQuery API can select anything in the DOM, with minimal effort. In the W3 DOM (prior to the recently introduced Selectors API, which would not exist if it weren't for jQuery and similar libraries), if you wanted to find all text input fields located under the element with id 'foo', you'd be looking at quite a few lines of code. In jQuery, you're looking at a few characters in a string.
The W3 API provides a getElementsByTagName() function, a highly-specific function for retrieving all elements by tag name. This function can't be combined with any others or used for any other purpose. Meanwhile, jQuery allows you to get elements by tag name ($('foo')), but it allows you to combine the tag name condition with an arbitrary number of other conditions, so you can find the exact elements you are interested in ($('foo > bar .page a:first')).
See the difference? One-shot methods that solve a single problem for you, versus building blocks that allow you to solve arbitrarily complex problems.
The second level of composability in jQuery is the jQuery object itself. jQuery allows a fluent, method-chaining style of programming, where sets of elements can be refined or transformed to other sets of elements:
$('.foo').parent().parent().find('.priority').click(
function() { $(this).addClass('high'); }
);
The above snippet of functionality could have been implemented in a single method — one of perhaps a thousand or more. Instead, jQuery allows us to assemble our own solutions to problems by snapping together the blocks of functionality that it provides.
Composable AND Usable
Composable APIs have the potential to be easy to use. In particular, because they have smaller surface area, they are easier to master. Instead of memorizing lots of classes and methods, you memorize a few building blocks, which you use to solve all the problems you encounter.
That doesn't mean composable APIs are automatically user-friendly. Unless you watch out, your composable API may end up suffering from one of these two common issues:
- The glue for wiring building blocks together is bulky, requiring repetitious boilerplate.
- Common use cases must be wired together manually.
Let's look at one composable API from Java that falls prey to both pitfalls.
Java File IO API
PHP has dead-simple functions for reading and writing the contents of a file: file_get_contents($file) and file_put_contents($file, $contents). These functions are a newbie's paradise: there's no confusion over what they do or how you use them. However, that simplicity comes with a price: the functions can't read or write data incrementally, they can't be used for reading from or writing to a socket, they don't offer conditional buffering, etc. PHP has other functions for those use cases.
Java takes a different approach. It provides core interfaces for input and output called InputStream and OutputStream, respectively. There are implementations for sockets and others for files. Then there are decorators (some of which ship with Java, others of which are third-party), which add chunks of functionality like buffering, auto-flushing, text reading/writing, and so forth.
The Java API is composable, and as a result, very flexible — you can use the same interfaces and building blocks to solve just about any IO problem. However, because the glue between building blocks is unwieldy, and the common use cases have no shortcut, the API is frequently a source of dread and confusion for developers new to the language.
Let's look at some Java code that efficiently reads a file into a buffer:
ByteArrayOutputStream out = new ByteArrayOutputStream();
InputStream in = new BufferedInputStream(new FileInputStream(file));
byte[] buffer = new byte[1024];
int count = 0;
// read() returns -1 once the end of the file is reached
while ((count = in.read(buffer)) >= 0) {
    out.write(buffer, 0, count);
}
in.close();
byte[] bytes = out.toByteArray();
Compare this to file_get_contents(). Which do you prefer?
The Java API is composable, but it's a pain to use: wiring building blocks together is awkward (new BufferedInputStream(new FileInputStream(file)), plus separate imports!), and the API provides no shortcuts for common use cases.
We can do better. We can get both composability and user-friendliness. The next section shows the way.
A Better Java File IO API
We need to trim the glue and provide for common use cases. Here's one way we can accomplish these goals:
1. Build common stream decorators into the base classes, along with a simple mechanism for adding more decorators (this will simplify wiring and cut down on boilerplate);
2. Add some convenience methods that make common tasks easy.
For (1), I'd suggest extensions along the following lines:
public abstract class InputStream {
    ...
    public <T extends InputStream> T with(Class<T> decoratorClass) {
        // Use reflection to create decorator class and return it:
        ...
    }
    public BufferedInputStream buffered() {
        return with(BufferedInputStream.class);
    }
    public AutoFlushingInputStream autoFlushing() {
        return with(AutoFlushingInputStream.class);
    }
    public AutoClosingInputStream autoClosing() {
        return with(AutoClosingInputStream.class);
    }
    ...
}
This change lets us string together common combinations without much effort:
InputStream logStream = new FileInputStream(logFile)
.buffered()
.autoClosing();
Or even add our own decorators:
InputStream logStream = new FileInputStream(logFile)
.buffered()
.autoClosing()
.with(NewlineConvertingInputStream.class);
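The extension above requires changing InputStream itself, which clients can't do. As a rough approximation that runs on the stock JDK, the same chaining can be sketched with a small wrapper. FluentInput and its methods (of, with, buffered, stream) are hypothetical names made up for this sketch; the only real API it relies on is that decorators such as BufferedInputStream and DataInputStream have a public constructor taking an InputStream.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;

// Hypothetical wrapper: java.io.InputStream has no with() or buffered() methods.
final class FluentInput {
    private final InputStream stream;

    private FluentInput(InputStream stream) {
        this.stream = stream;
    }

    static FluentInput of(InputStream stream) {
        return new FluentInput(stream);
    }

    // Wraps the current stream in any decorator class that has a public
    // constructor taking a single InputStream, located via reflection.
    FluentInput with(Class<? extends InputStream> decoratorClass) {
        try {
            return new FluentInput(
                decoratorClass.getConstructor(InputStream.class).newInstance(stream));
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "No usable (InputStream) constructor on " + decoratorClass.getName(), e);
        }
    }

    // Shortcut for the most common decorator.
    FluentInput buffered() {
        return with(BufferedInputStream.class);
    }

    // Unwraps the fully decorated stream.
    InputStream stream() {
        return stream;
    }
}
```

With this in place, FluentInput.of(new FileInputStream(logFile)).buffered().with(DataInputStream.class).stream() reads much like the chained calls above. The reflective lookup trades compile-time checking for brevity, so a decorator without a matching constructor fails only at run time.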
OutputStream would get the same treatment. To deal with (2), I propose adding helper methods to File, InputStream, and OutputStream. Something along these lines:
- File.getContents: Returns the contents of the file as a byte array.
- File.getContentsString: Returns the contents of the file as a string.
- File.putContents: Replaces the contents of the file with the given byte array.
- File.putContentsString: Replaces the contents of the file with the given string.
- File.newInput: Returns a new input stream.
- File.newOutput: Returns a new output stream.
- File.newIO: Returns a new random access stream.
- InputStream.readLine: Reads data until the first newline or EOF, returns null if no more data is available.
- InputStream.readAll: Returns a byte array containing the remaining contents of the stream, and automatically closes the stream.
- InputStream.readAllString: Returns the remaining contents of the stream as a string, and automatically closes the stream.
- OutputStream.writeLine: Writes a line of data to the file.
Note that I do not advise mixing file abstractions with file system abstractions, but Java already does this, so we may as well continue the pattern.
With these changes, reading a file into memory is now trivial:
byte[] contents = file.newInput().buffered().readAll();
// OR
byte[] contents = file.getContents();
In fact, most operations are trivial and can be expressed in a few lines of code.
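Since File and InputStream can't be modified from the outside, here is a minimal sketch of the same two shortcuts as static helpers. The IoShortcuts class and its method names are hypothetical; wrapping IOException in UncheckedIOException is a choice made for this sketch to keep call sites short, not something the proposal above specifies.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class; these methods are not part of java.io.
final class IoShortcuts {
    private IoShortcuts() {}

    // Stand-in for the proposed InputStream.readAll:
    // reads to EOF, then closes the stream.
    static byte[] readAll(InputStream in) {
        try (InputStream source = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int count;
            while ((count = source.read(buffer)) >= 0) {
                out.write(buffer, 0, count);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for the proposed File.getContents,
    // assembled from the same building blocks.
    static byte[] getContents(File file) {
        try {
            return readAll(new BufferedInputStream(new FileInputStream(file)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The shortcut is nothing more than the earlier read loop packaged once, which is the point: the building blocks stay available for the unusual cases, and the common case costs one call. Later JDK releases added shortcuts in this spirit, such as Files.readAllBytes and InputStream.readAllBytes.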
In some sense, our API is less "pure": it combines into a single API functions that are not applicable to all use cases (e.g. InputStream.readLine doesn't make sense for binary files), and it even violates a common design principle by making a superclass depend on concrete subclasses. It does so in a harmless way, however, because the superclass has no knowledge of, and does not depend on, the implementation details of the subclasses.
Sacrificing some purity for higher usability is a common pattern in composable, highly-usable APIs. For example, jQuery always exposes all methods, even when they don't make sense on the underlying jQuery set (e.g. most DOM elements don't have a value, but jQuery still exposes the val() method), and almost all jQuery getter methods operate on the first element of the set (which is impure, but turns out to greatly simplify client code).
hParse
I want to leave you with a real-life example of a composable, user-friendly API. This is a library called hParse written for the Haxe programming language. The library enables developers to quickly write parsers for text files.
Rather than documenting the library itself, I'll just show you all the code necessary to parse JSON objects using the hParse API:
package hParse.grammars;
import hParse.HParse;
class JSON extends Grammar {
public static var JSON_LITERAL:String = "";
public static var JSON_STRING :String = "";
public static var JSON_NUMBER :String = "";
public static var JSON_OBJECT :String = "";
public static var JSON_ARRAY :String = "";
public static var JSON_VALUE :String = "";
public function new(name:String = "") {
super(name);
var lcurly = token("{");
var rcurly = token("}");
var lbracket = token("[");
var rbracket = token("]");
var colon = token(":");
var trueT = token("true");
var falseT = token("false");
var nullT = token("null");
var comma = token(",");
var literalP = symbol(
JSON_LITERAL,
trueT.orElse(falseT).orElse(nullT)
);
var string = symbol(
JSON_STRING,
stringLiteralD()
);
var number = symbol(
JSON_NUMBER,
float()
);
var object = symbol(JSON_OBJECT);
var array = symbol(JSON_ARRAY);
var value = symbol(
JSON_VALUE,
string
.orElse(number)
.orElse(literalP)
.orElse(object)
.orElse(array)
);
var pair = string.then(colon).then(value);
var memberList = (pair.then(repeatP(comma.then(pair), 0))).orElse(empty());
var valueList = (value.then(repeatP(comma.then(value), 0))).orElse(empty());
object.bindTo(lcurly.then(memberList).then(rcurly));
array.bindTo(lbracket.then(valueList).then(rbracket));
start = value;
}
}
If you know anything about parsers and the JSON format, then you'll be able to quickly understand what the above does.
The building blocks the library provides are simple parsers that can be easily combined to form more complicated parsers. And the API is so composable, the JSON grammar class above is actually a parser (a grammar extends a parser), which means grammars can themselves be embedded in other grammars.
The hParse library could have been designed to provide vast collections of methods that parse strings, numbers, bracketed regions, and so forth. But not only would this have greatly expanded the surface area of the API, it would have made use of the API much more difficult. Instead, hParse provides Lego-like building blocks that can be snapped together to solve just about any parsing problem you're likely to run into.
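To show how little machinery this style needs, here is a minimal combinator sketch in Java. It mirrors the idea behind hParse, not its actual API: the Parser interface, its methods (token, empty, then, orElse, repeated, matches), and ParserDemo are all hypothetical, and the parsers only recognize input rather than build a parse tree.

```java
import java.util.Optional;

// Hypothetical mini combinator library (illustrative only, not hParse's API).
interface Parser {
    // Returns the index just past the match that starts at pos,
    // or an empty Optional when there is no match.
    Optional<Integer> parse(String input, int pos);

    // Matches a fixed piece of text.
    static Parser token(String text) {
        return (input, pos) -> input.startsWith(text, pos)
                ? Optional.of(pos + text.length())
                : Optional.empty();
    }

    // Matches nothing and always succeeds.
    static Parser empty() {
        return (input, pos) -> Optional.of(pos);
    }

    // Sequence: this parser, followed by next.
    default Parser then(Parser next) {
        return (input, pos) -> parse(input, pos).flatMap(end -> next.parse(input, end));
    }

    // Choice: this parser, or other if this one fails.
    default Parser orElse(Parser other) {
        return (input, pos) -> {
            Optional<Integer> first = parse(input, pos);
            return first.isPresent() ? first : other.parse(input, pos);
        };
    }

    // Zero or more repetitions; stops when a repetition fails or consumes nothing.
    default Parser repeated() {
        return (input, pos) -> {
            int end = pos;
            Optional<Integer> next = parse(input, end);
            while (next.isPresent() && next.get() > end) {
                end = next.get();
                next = parse(input, end);
            }
            return Optional.of(end);
        };
    }

    // True when the parser consumes the entire input.
    default boolean matches(String input) {
        return parse(input, 0).map(end -> end == input.length()).orElse(false);
    }
}

final class ParserDemo {
    // Recognizes a bracketed, comma-separated list of booleans.
    static Parser boolList() {
        Parser bool = Parser.token("true").orElse(Parser.token("false"));
        Parser items = bool.then(Parser.token(",").then(bool).repeated());
        return Parser.token("[").then(items.orElse(Parser.empty())).then(Parser.token("]"));
    }
}
```

ParserDemo.boolList().matches("[true,false,true]") succeeds, while a trailing comma or a missing bracket makes it fail. Because boolList() returns an ordinary Parser, it can be dropped into a larger grammar with the same then and orElse calls, which is the property the JSON grammar above relies on.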
See if you can spot the coverage of common use cases in hParse.
Summary
In this post, I hope I've convinced you that you can make a powerful API that's simple to use by embracing composability and covering common use cases. This simple recipe is behind many of today's more successful APIs (although keep in mind, not all APIs lend themselves to composability).
As an exercise, I suggest taking an API of your choice, and trying to recast part of it in a composable way, supplying core building blocks that clients can use to solve their own problems.
In the next part in this series, I'll cover how some simple conventions can help make your API more memorable and easier to use. Until then...