Monday
Sep282009

Good API Design: Part 4

In Part 3 of this series, I outlined how API designers can use the type system to make APIs hard to misuse. In this fourth installment, I'll talk about ways to make APIs comprehensible and digestible, while still retaining power.

To motivate this discussion, I'll draw your attention to a real-life library whose design made it an instant hit with developers, and brought the original developer worldwide fame.

jQuery

jQuery is a JavaScript library for performing DOM manipulation and analysis. Born from dissatisfaction with the APIs of 2005, jQuery was designed to get a lot done with very little.

The API has a small surface area — small enough to fit onto a single cheat sheet. Yet it's powerful enough to construct everything from simple one-page personal websites to complex Web 2.0 applications.

jQuery doesn't have users. It has fans, or should I say, zealots. It's a cranberry-orange scone with light sugar glazing, baked to a golden brown and served with a piping hot cup of dark roasted coffee. That's how good it is.

jQuery versus W3 DOM

Let's say we want to insert a 'div' element before every first paragraph beneath all h1 tags. In jQuery, this is trivial:

    $('h1 p:first').before('<div class="sep"></div>')

Using the W3 interface for the DOM, the code is bulky and intrusive, obscuring its purpose with its own unwieldiness. In fact, you're going to have to take my word for this, because I'm in no mood to write the parallel W3 DOM code for the above right now (readers: this is your cue!).
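
For readers who would rather see it than take my word for it, here is a rough sketch of the W3 DOM version. It uses the org.w3c.dom bindings that ship with Java, which expose the same getElementsByTagName and insertBefore calls a browser script would use. The class name W3DomSeparators and the sample markup are made up for illustration, and the markup is parsed as XML rather than real-world HTML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; only the org.w3c.dom calls are standard API.
class W3DomSeparators {
    /** Inserts <div class="sep"> before the first <p> found beneath each <h1>. */
    static void insertSeparators(Document doc) {
        NodeList headings = doc.getElementsByTagName("h1");
        for (int i = 0; i < headings.getLength(); i++) {
            Element heading = (Element) headings.item(i);
            NodeList paragraphs = heading.getElementsByTagName("p");
            if (paragraphs.getLength() == 0) {
                continue; // this heading has no paragraph beneath it
            }
            Node firstParagraph = paragraphs.item(0);
            Element separator = doc.createElement("div");
            separator.setAttribute("class", "sep");
            firstParagraph.getParentNode().insertBefore(separator, firstParagraph);
        }
    }

    /** Parses well-formed markup; checked exceptions are wrapped to keep the demo short. */
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<body><h1><p>a</p><p>b</p></h1><h1><p>c</p></h1></body>");
        insertSeparators(doc);
        System.out.println(doc.getElementsByTagName("div").getLength() + " separators inserted");
    }
}
```

Even with the parsing helper set aside, the loop, the empty-list check, and the parent lookup are all bookkeeping that the jQuery one-liner hides.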

Here's a more complicated example courtesy of Brian Reindel that counts down the number of characters the user is allowed to type (like Twitter):

var countdown = {
    // Recompute the remaining count and trim any input that runs past the limit.
    init: function() {
        countdown.remaining = countdown.max - $(countdown.obj).val().length;
        if (countdown.remaining < 0) {
            $(countdown.obj).val($(countdown.obj).val().substring(0, countdown.max));
            countdown.remaining = 0;
        }
        $(countdown.obj).siblings(".remaining").html(countdown.remaining + " characters remaining.");
    },
    max: null,
    remaining: null,
    obj: null
};
var iCount; // interval handle shared by the focus and blur handlers
$(".countdown").each(function() {
    $(this).focus(function() {
        // The limit is encoded in the class name, e.g. "limit_140_".
        var c = $(this).attr("class");
        countdown.max = parseInt(c.match(/limit_[0-9]{1,}_/)[0].match(/[0-9]{1,}/)[0], 10);
        countdown.obj = this;
        iCount = setInterval(countdown.init, 1000);
    }).blur(function() {
        countdown.init();
        clearInterval(iCount);
    });
});

Coding the above with bare metal JavaScript would take many times the amount of code.

jQuery is clean and simple, and there's a reason it's #1 when it comes to lightweight JavaScript libraries. But what is this reason?

Power/Conciseness

Good APIs are very powerful, but not overwhelmingly complicated. They have a small surface area — or at least, small enough so that a single developer can acquaint herself with the whole API without much effort.

Yet, there is a tension between simplicity and power. If your API exposes exactly 1 function, then the API can't do very much (assuming the function doesn't take an enormous number of parameters). On the other hand, if your API exposes 10 million functions, it can probably do a lot, but no one will ever use it.

As API designers, we want both: simplicity and power. How do we get it? In a word, composability.

Composability

Many good APIs achieve both power and conciseness by being composable. Sometimes, this property is called by another name, like "functional-style" or "domain-specific language". But the heart is always the same: the API provides the client with Lego-like building blocks that can be rearranged and snapped together to solve new problems.

Good APIs go beyond just being composable to offering high usability and covering common use cases — topics I'll return to at the end of this post. But composability is how they achieve richness while preserving comprehensibility.

In many poorly-designed APIs, each intention of the client is mapped to a separate method invocation. To illustrate, consider an API for comparing strings.

Certainly, the client will want a method for seeing if two strings are exactly equal, so we can start with a method equalsExactly to satisfy that need. The client will also want a method to see if two strings are equal ignoring case, so we can add another method equalsIgnoreCase. The user may also want to see if one string is "approximately equal" to another, for purposes of correcting spelling mistakes, so we can add another method equalsApproximately. And so forth, until we have compiled a large collection of methods.

The problem with this approach is that it greatly increases the surface area of our API. To use our API effectively, the client needs to know about all 20 of our functions for equality. Many of these methods are not used frequently, so looking at them will slow the client down ("brain clutter").

The problem gets worse as the API gets larger. Before long, you have the monolithic framework so common in Java, which consists of hundreds of classes, each with dozens of methods. Bulky, awkward, joy-killing, and brain-draining, such behemoths have inspired fear and dread in many a developer, rather than the delight that jQuery inspires.
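
For contrast, here is one way the string comparison example might look when factored into building blocks: a handful of normalization steps plus a single comparison that accepts any combination of them. This is only a sketch. The names StringMatchers, IGNORE_CASE, TRIMMED, COLLAPSE_SPACES, and equalAfter are invented for illustration and don't come from any real library.

```java
import java.util.Locale;
import java.util.function.BiPredicate;
import java.util.function.UnaryOperator;

// Hypothetical API: every name below is made up for this sketch.
class StringMatchers {
    // Building blocks: small normalization steps that can be combined freely.
    static final UnaryOperator<String> IGNORE_CASE = s -> s.toLowerCase(Locale.ROOT);
    static final UnaryOperator<String> TRIMMED = String::trim;
    static final UnaryOperator<String> COLLAPSE_SPACES = s -> s.replaceAll("\\s+", " ");

    /** The one comparison method: applies the steps in order to both sides, then compares. */
    @SafeVarargs
    static BiPredicate<String, String> equalAfter(UnaryOperator<String>... steps) {
        return (left, right) -> {
            String a = left;
            String b = right;
            for (UnaryOperator<String> step : steps) {
                a = step.apply(a);
                b = step.apply(b);
            }
            return a.equals(b);
        };
    }

    public static void main(String[] args) {
        BiPredicate<String, String> exact = equalAfter();
        BiPredicate<String, String> loose = equalAfter(TRIMMED, COLLAPSE_SPACES, IGNORE_CASE);
        System.out.println(exact.test("Good API", "good api"));
        System.out.println(loose.test("  Good   API ", "good api"));
    }
}
```

Three steps and one method cover eight distinct combinations, and a client who needs a new rule writes one more step instead of waiting for a ninth method.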

To help us inspire delight in our clients, let's take a look at the ingredients that make up jQuery's composability.

Ingredients of Composability

Concisely, composability refers to a factoring of an API such that clients can construct solutions to their specific problems by assembling solutions to smaller problems.

jQuery exposes two kinds of building blocks: CSS selectors, and the jQuery abstraction itself.

Look closely at the following expression:

$('#foo input[type="text"]').hide();

This code hides all text input fields located beneath the element with id 'foo'.

At first glance, you might not see the building blocks of which I speak. Look closer, however, and you'll see them staring right back at you. Where? Inside the string passed to the $ function! That's right, the first place we see building blocks in jQuery is in the selector string passed to jQuery's selector engine.

By combining different selectors, clients of the jQuery API can select anything in the DOM, with minimal effort. In the W3 DOM (prior to the recently introduced Selectors API, which would not exist if it weren't for jQuery and similar libraries), if you wanted to find all text input fields located under the element with id 'foo', you'd be looking at quite a few lines of code. In jQuery, you're looking at a few characters in a string.

The W3 API provides a getElementsByTagName() function, a highly-specific function for retrieving all elements by tag name. This function can't be combined with any others or used for any other purpose. Meanwhile, jQuery allows you to get elements by tag name ($('foo')), but it allows you to combine the tag name condition with an arbitrary number of other conditions, so you can find the exact elements you are interested in ($('foo > bar .page a:first')).

See the difference? One-shot methods that solve a single problem for you, versus building blocks that allow you to solve arbitrarily complex problems.

The second level of composability in jQuery is the jQuery object itself. jQuery allows a fluid, method chaining style of programming, where sets of elements can be refined or transformed to other sets of elements:

    $('.foo').parent().parent().find('.priority').click(
        function() { $(this).addClass('high'); }
    );

The above snippet of functionality could have been implemented in a single method — one of perhaps a thousand or more. Instead, jQuery allows us to assemble our own solutions to problems by snapping together the blocks of functionality that it provides.

Composable AND Usable

Composable APIs have the potential to be easy to use. In particular, because they have smaller surface area, they are easier to master. Instead of memorizing lots of classes and methods, you memorize a few building blocks, which you use to solve all the problems you encounter.

That doesn't mean composable APIs are automatically user-friendly. Unless you watch out, your composable API may end up suffering from one of these two common issues:

  1. The glue for wiring building blocks together is bulky, requiring repetitious boilerplate.
  2. Common use cases must be wired together manually.

Let's look at one composable API from Java that falls prey to both pitfalls.

Java File IO API

PHP has dead simple functions for reading and writing the contents of a file: file_get_contents($file) and file_put_contents($file, $contents). These functions are a newbie's paradise: there's no confusion over what they do or how you use them. However, that simplicity comes with a price: the functions can't read or write data incrementally, they can't be used for reading from or writing to a socket, they don't offer conditional buffering, etc. PHP has other functions for those use cases.

Java takes a different approach. It provides core interfaces for input and output called InputStream and OutputStream, respectively. There are implementations for sockets and others for files. Then there are decorators (some of which ship with Java, others of which are third-party), which add chunks of functionality like buffering, auto-flushing, text reading/writing, and so forth.

The Java API is composable, and as a result, very flexible — you can use the same interfaces and building blocks to solve just about any IO problem. However, because the glue between building blocks is unwieldy, and the common use cases have no shortcut, the API is frequently a source of dread and confusion for developers new to the language.

Let's look at some Java code that efficiently reads a file into a buffer:

ByteArrayOutputStream out = new ByteArrayOutputStream();
InputStream in = new BufferedInputStream(new FileInputStream(file));
byte[] buffer = new byte[1024];
int count = 0;
// read() returns -1 once the end of the stream is reached
while ((count = in.read(buffer)) >= 0) {
    out.write(buffer, 0, count);
}
in.close();
byte[] bytes = out.toByteArray();

Compare this to file_get_contents(). Which do you prefer?

The Java API is composable but it's a pain to use, because it's awkward to wire together building blocks (new BufferedInputStream(new FileInputStream(file)) — and separate imports!), and shortcuts for common use cases are not provided for in the API.

We can do better. We can get both composability and user-friendliness. The next section shows the way.

A Better Java File IO API

We need to trim the glue and provide for common use cases. Here's one way we can accomplish these goals:

  1. Build common stream decorators into the base classes, along with a simple mechanism for adding more decorators (this will simplify wiring and cut down on boilerplate);
  2. Add some convenience methods that make common tasks easy.

For (1), I'd suggest extensions along the following lines:

public abstract class InputStream {
    ...
    public <T extends InputStream> T with(Class<T> decoratorClass) {
        // Use reflection to create decorator class and return it:
        ...
    }

    public BufferedInputStream buffered() {
        return with(BufferedInputStream.class);
    }

    public AutoFlushingInputStream autoFlushing() {
        return with(AutoFlushingInputStream.class);
    }

    public AutoClosingInputStream autoClosing() {
        return with(AutoClosingInputStream.class);
    }
    ...
}

This change lets us string together common combinations without much effort:

    InputStream logStream = new FileInputStream(logFile)
        .buffered()
        .autoClosing();

Or even add our own decorators:

    InputStream logStream = new FileInputStream(logFile)
        .buffered()
        .autoClosing()
        .with(NewlineConvertingInputStream.class);
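
Since we can't actually edit java.io.InputStream, here is a small sketch of how the reflective with() might work, placed on a stand-alone wrapper instead. FluentInput and its methods are hypothetical names of my own; the sketch assumes every decorator has a public constructor taking a single InputStream, which holds for BufferedInputStream and PushbackInputStream but not for every decorator out there.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical wrapper standing in for the proposed InputStream extensions.
final class FluentInput {
    private final InputStream stream;

    FluentInput(InputStream stream) {
        this.stream = stream;
    }

    /** Wraps the current stream in the decorator, using its (InputStream) constructor. */
    FluentInput with(Class<? extends InputStream> decoratorClass) {
        try {
            return new FluentInput(
                    decoratorClass.getConstructor(InputStream.class).newInstance(stream));
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    decoratorClass.getName() + " needs a public (InputStream) constructor", e);
        }
    }

    /** Shortcut for the most common decorator. */
    FluentInput buffered() {
        return with(BufferedInputStream.class);
    }

    /** Hands back the fully decorated stream. */
    InputStream stream() {
        return stream;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new FluentInput(new ByteArrayInputStream(new byte[] {42}))
                .buffered()
                .with(PushbackInputStream.class)
                .stream();
        System.out.println(in.getClass().getSimpleName() + " read " + in.read());
    }
}
```

The price of the reflective approach is that a decorator with the wrong constructor fails at runtime rather than at compile time, which is why the sketch turns that failure into an explicit IllegalArgumentException.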

Similarly for OutputStream. To deal with (2), I propose adding helper methods to File, InputStream, and OutputStream. Something along these lines:

  • File.getContents: Returns the contents of the file as a byte array.
  • File.getContentsString: Returns the contents of the file as a string.
  • File.putContents: Replaces the contents of the file with a byte array.
  • File.putContentsString: Replaces the contents of the file with a string.
  • File.newInput: Returns a new input stream.
  • File.newOutput: Returns a new output stream.
  • File.newIO: Returns a new random access stream.
  • InputStream.readLine: Reads data until the first newline or EOF, returns null if no more data is available.
  • InputStream.readAll: Returns a byte array containing the contents of the file, and automatically closes the file.
  • InputStream.readAllString: Returns the contents of the file as a string, and automatically closes the file.
  • OutputStream.writeLine: Writes a line of data to the file.

Note that I do not advise mixing file abstractions with file system abstractions, but Java already does this, so we may as well continue the pattern.

With these changes, reading a file into memory is now trivial:

byte[] contents = file.newInput().buffered().readAll();
// OR
byte[] contents = file.getContents();

In fact, most operations are trivial and can be expressed in a few lines of code.
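
To make the convenience methods concrete, here is a minimal sketch of readAll and getContents written as static helpers, since File and InputStream themselves can't be changed. The class name IoShortcuts is my own invention, and wrapping IOException in UncheckedIOException is just one possible design choice, not something the proposal above requires.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers mirroring the proposed InputStream.readAll and File.getContents.
final class IoShortcuts {
    /** Drains the stream into a byte array and closes it, even if reading fails. */
    static byte[] readAll(InputStream in) {
        try (InputStream source = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int count;
            while ((count = source.read(buffer)) >= 0) {
                out.write(buffer, 0, count);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a whole file into memory in one call. */
    static byte[] getContents(File file) {
        try {
            return readAll(new FileInputStream(file));
        } catch (IOException e) { // FileNotFoundException is an IOException
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = readAll(new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)));
        System.out.println(new String(data, StandardCharsets.UTF_8));
    }
}
```

Later versions of Java did add shortcuts in this spirit, such as Files.readAllBytes and InputStream.readAllBytes.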

In some sense, our API is less "pure", as it combines into a single API functions that are not applicable to all use cases (e.g. InputStream.readLine doesn't make sense for binary files), and even violates a common design principle by making a superclass depend on concrete subclasses (however, it does so in a harmless way, because the superclass has no knowledge of, and does not depend on, the implementation details of the subclasses).

Sacrificing some purity for higher usability is a common pattern in composable, highly-usable APIs. For example, jQuery always exposes all methods, even when they don't make sense on the underlying jQuery set (e.g. most DOM elements don't have a value, but jQuery still exposes the val() method), and most jQuery getter methods operate on the first element of the set (which is impure, but turns out to greatly simplify client code).

hParse

I want to leave you with a real-life example of a composable, user-friendly API. This is a library called hParse written for the Haxe programming language. The library enables developers to quickly write parsers for text files.

Rather than documenting the library itself, I'll just show you all the code necessary to parse JSON objects using the hParse API:

package hParse.grammars;

import hParse.HParse;

class JSON extends Grammar {
 public static var JSON_LITERAL:String = "";
 public static var JSON_STRING :String = "";
 public static var JSON_NUMBER :String = "";
 public static var JSON_OBJECT :String = "";
 public static var JSON_ARRAY  :String = "";
 public static var JSON_VALUE  :String = "";

 public function new(name:String = "") {
  super(name);

  var lcurly   = token("{");
  var rcurly   = token("}");
  var lbracket = token("[");
  var rbracket = token("]");
  var colon    = token(":");
  var trueT    = token("true");
  var falseT   = token("false");
  var nullT    = token("null");
  var comma    = token(",");

  var literalP = symbol(
                   JSON_LITERAL, 
                   trueT.orElse(falseT).orElse(nullT)
                 );     
  var string   = symbol(
                   JSON_STRING,
                   stringLiteralD()
                 );
  var number   = symbol(
                   JSON_NUMBER,
                   float()
                 );
  var object   = symbol(JSON_OBJECT);
  var array    = symbol(JSON_ARRAY);
  var value    = symbol(
                   JSON_VALUE,  
                   string
                   .orElse(number)
                   .orElse(literalP)
                   .orElse(object)
                   .orElse(array)
                 );     

  var pair     = string.then(colon).then(value);

  var memberList = (pair.then (repeatP(comma.then(pair),  0))).orElse(empty());
  var valueList  = (value.then(repeatP(comma.then(value), 0))).orElse(empty());

  object.bindTo(lcurly.then(memberList).then(rcurly));
  array.bindTo(lbracket.then(valueList).then(rbracket));

  start = value;
 }
}

If you know anything about parsers and the JSON format, then you'll be able to quickly understand what the above does.

The building blocks the library provides are simple parsers that can be easily combined to form more complicated parsers. And the API is so composable, the JSON grammar class above is actually a parser (a grammar extends a parser), which means grammars can themselves be embedded in other grammars.

The hParse library could have been designed to provide vast collections of methods that parse strings, numbers, bracketed regions, and so forth. But not only would this have greatly expanded the surface area of the API, it would have made use of the API much more difficult. Instead, hParse provides Lego-like building blocks that can be snapped together to solve just about any parsing problem you're likely to run into.
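
If you've never seen this style before, the core idea fits in a few dozen lines. The following is a toy sketch in Java, not hParse's actual API or implementation: MiniParser and its methods are names I've made up, it only recognizes input (it builds no parse tree), and it does no backtracking across a sequence.

```java
import java.util.Optional;

/** A parser reads input from an offset and, on success, reports the offset where it stopped. */
interface MiniParser {
    Optional<Integer> parse(String input, int offset);

    /** Smallest building block: match one fixed piece of text. */
    static MiniParser token(String text) {
        return (input, offset) -> input.startsWith(text, offset)
                ? Optional.of(offset + text.length())
                : Optional.empty();
    }

    /** Sequence: run this parser, then the next one from where this one stopped. */
    default MiniParser then(MiniParser next) {
        return (input, offset) -> parse(input, offset).flatMap(end -> next.parse(input, end));
    }

    /** Choice: try this parser, and fall back to the alternative if it fails. */
    default MiniParser orElse(MiniParser alternative) {
        return (input, offset) -> {
            Optional<Integer> result = parse(input, offset);
            return result.isPresent() ? result : alternative.parse(input, offset);
        };
    }

    /** Repetition: apply this parser zero or more times, stopping when it fails or stalls. */
    default MiniParser repeated() {
        return (input, offset) -> {
            int position = offset;
            Optional<Integer> next = parse(input, position);
            while (next.isPresent() && next.get() > position) {
                position = next.get();
                next = parse(input, position);
            }
            return Optional.of(position);
        };
    }

    /** True when the parser consumes the entire input. */
    default boolean matches(String input) {
        return parse(input, 0).filter(end -> end == input.length()).isPresent();
    }
}

class MiniParserDemo {
    public static void main(String[] args) {
        MiniParser literal = MiniParser.token("true")
                .orElse(MiniParser.token("false"))
                .orElse(MiniParser.token("null"));
        MiniParser list = MiniParser.token("[")
                .then(literal)
                .then(MiniParser.token(",").then(literal).repeated())
                .then(MiniParser.token("]"));
        System.out.println(list.matches("[true,false,null]"));
    }
}
```

Four small operations are enough to describe a list of literals, and because every combination is itself a MiniParser, the list parser can be dropped into a larger grammar unchanged.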

See if you can spot the coverage of common use cases in hParse.

Summary

In this post, I hope I've convinced you that you can make a powerful API that's simple to use by embracing composability and covering common use cases. This simple recipe is behind many of today's more successful APIs (although keep in mind, not all APIs lend themselves to composability).

As an exercise, I suggest taking an API of your choice, and trying to recast part of it in a composable way, supplying core building blocks that clients can use to solve their own problems.

In the next part of this series, I'll cover how some simple conventions can help make your API more memorable and easier to use. Until then...

Monday
Jun222009

Google Gets Real-Time Collaborative Editing

With the recent announcement of Google Wave, the search engine giant has brought real-time collaborative editing into the limelight.

Companies like mine are meanwhile rejoicing, because it means that the mainstream has finally acknowledged what we've been saying all along: that the days are numbered for the crude, bulky, invasive forms of collaboration that exist today.

In the near future, people will not send files to each other across e-mail for commentary and editing. They won't take turns editing Wiki documents online. They won't have to do esoteric operations like "commit" and "update" in order to see what others are up to.

Rather, sooner than you think, the world will make the transition to live documents. Edit these documents as you like. If someone else is editing them, you'll see what they're doing in real-time. Comment, chat, collaborate, whatever — without rules and without boundaries. The technology will disappear into the background, and let people focus on creating, editing, and analyzing content together, in a way never before possible.

N-Brain is ideally positioned for this brave new world, because of the technical strength of our product offering.

In their recent presentation, Google dismissed peer-to-peer real-time collaboration as a subject for academic inquiry, and lauded their own client-server implementation for scalability.

Well, it just so happens that peer-to-peer, real-time collaborative editing of any structured media (including rich-text documents) exists today. That technology is called Una Merge, and it scales better than any server implementation possibly could — precisely because no server is required (the burden of merging is pushed onto the client, where resources are abundant and free).

Exciting times ahead for companies like ours. And scary times for some companies, because wikis and online documents as we know them today are nearing the end of their lives. Their rebirth as live documents will usher in a new era of collaboration for knowledge workers of all kinds.

Sunday
Jun212009

The Life & Times of Tech Companies

Fresh and starving for cash, a start-up is obsessed with finding a market and pleasing whoever throws them a bone.

You do a lot to please your audience. You give them free product. You implement every change they ask for. You experiment wildly, desperately trying to find something, anything, that people are willing to pay for.

If you're one of the lucky few, you see some growth. No matter how much you grow, however, you can't get ahead. You notice that your costs are linear (or worse) with your output. That's because you spend a lot of money on labor, and when you think about it, you're really a service company. Service companies don't make it big. They just don't scale.

And for that reason, you push yourself to enter the next phase: the growth phase.

You can't grow unless you can scale. And you can't scale if you're a service company. So you start categorizing your services. You look for patterns harder than you ever have before. You start defining products (if you already had products, you tighten their definitions). You put a lot of resources behind cranking out the handful of products that will cover 80% of your market. You work hard to squeeze out every ounce of efficiency you can from the processes that create those products.

You have to give up a lot. You can't do customizations anymore. Sorry. Throw away those bells and whistles, too — the process doesn't allow for bells and whistles. No more crazy experimentation trying to find an audience. In fact, you have to ignore some customers, and say goodbye to others. And with all of these changes, you make the transition from being a service company, to being a product company.

Now you sell products. Your processes are efficient, maybe even lean efficient. You can reduce the cost of your products to levels you didn't dream of back when you were a service company. Your market expands because others are attracted by how much they can get for their dollar. You're on track to become a major player in your industry.

With a growing customer base comes growing responsibility. With more responsibility comes less flexibility. Your customers depend on you for what you provide them with. When you were small, you could afford to experiment and change things at whim for every customer. You can't do that anymore. The processes for change become bulky and time consuming, both because you are so large, and because the customers need protection from wild changes. Innovation slows, as you continue pursuing the path that led you to where you are today.

Innovation, that missing ingredient in your company, is abundant at smaller companies. They don't have efficient processes, they don't produce Model Ts, but they can do in hours what would take months at your company.

The threat is very real. These companies evolve quickly. They explore the landscape of consumer preferences much faster than you can.

If you're blinded by your own success, you head down a path that leads to your extinction (see the numerous mainframe companies that no longer exist today). If you're aware of your vulnerability to so-called "disruptive innovations", and you have the cash, then you just start buying the smaller guys whenever you see promise (see Microsoft and Google). If you're really at the top of your game and not afraid to experiment (read: take risks and waste money, because that's what experimentation requires), you do your own evolution in house, but using the very best talent you can get your hands on (see Apple).

And so it comes full circle. The little company grows to a big company that is either undone by a little company, or tries to become more like a little company (while still retaining the size of market and production capacity of a large company).

What stage is your company in?

Monday
May112009

Good API Design: Part 3

In the last post, I gave an example of a good API and a bad one, showing how knowledge of the principles of good API design, combined with a deep understanding of the problem that clients are trying to solve, can be used to build stellar APIs. Beginning with this third entry in the Good API Design series, I'm going to look at a number of different techniques you can use to improve the quality of your APIs.

Our first stop in this journey: using the type system to make APIs difficult to misuse.

This particular technique is most valuable in statically-typed languages, but with some effort, even dynamic programming languages can benefit (albeit to a much lesser extent).

Use and Abuse

Misuse of an API occurs when a developer does not follow the rules that are encoded into the API. For example, a developer may pass null to a method that does not expect such a value; a developer may invoke a method before invoking some prerequisite method; a developer might call a method expecting it to do something it doesn't.

You can reduce misuse by reducing rules, by simplifying them, by making them conform to the expectations of clients, and so forth — all of which are good topics for future posts. But another extremely powerful way to reduce misuse is to make the rules harder to break.

How? By bringing in the strong arm of the compiler or, more specifically, the strong arm of the type checker (be that a compile-time type checker or a runtime type checker).

A few examples will motivate the discussion.

Joel on Hungarian Notation

Joel Spolsky once blogged on the subject of Hungarian notation (not the more common kind put forward by Microsoft in days gone by, but a semantic variant). He advocated adopting naming conventions as a way of training your mind to spot defective code (if you have the time, read the article before proceeding).

As an API designer, using naming conventions to reduce misuse of your API should fall damn near the bottom of your list of tricks and techniques. Which doesn't mean names are irrelevant — quite the contrary. But they provide just a small benefit compared to their cost, so you'll want to exhaust more powerful techniques first.

Let's grab three examples from Joel's post:

  1. Data that comes from the user can't be trusted. One way to increase security is to tag variables containing tainted (user-supplied) string data with "us", for "unsafe" string, and tag variables containing safe string data with "s", for "safe" string.
  2. In an application that does tabular manipulation (such as a spreadsheet), any mathematical expression that combines rows and columns is likely to be defective. One way to help spot these defective expressions is to tag integer variables that represent rows with "rw", and tag integer variables that represent columns with "col".
  3. In a 2D scene graph, such as a windowing API, coordinates of a node are either relative to the container, or absolute (i.e. relative to the root container, such as the display, or an application's main window). Mathematical expressions that combine relative and absolute coordinates are likely to be wrong. Tagging relative and absolute coordinates can help developers spot such expressions.

These are real problems, and as API designers, we can address them without politely asking the users of our APIs to adopt certain naming conventions.

We just need a little help from the type system.

Better Security through Typing

Let's tackle the first example. Say we're designing an API for web apps, or maybe just a layer on top of such a library. We want to allow clients access to POST and GET data, but we want to mark it as tainted, and prevent certain operations with it until it's been converted into a form that's safe from injection attacks (for example, by HTML escaping the data, so it can be embedded in a web page).

If we model an HTTP request with a RequestHTTP object, then we might have a method called data() that returns a Map of key/value pairs. A naive first approach might define this method as follows:

    public Map<String, String> data() { ... }

The problem with this approach is that we don't offer any safety to the client. They can obtain access to the raw data and do anything they like with it: including, for example, inserting it directly into a database, which can lead to security breaches (exactly how is not important for this post — Google if you're interested in the details).

Moreover, we force them to manage all aspects of security. Consequently, our API is easy to misuse, not to mention a real pain.

Rather than rename this data method "usData", for "unsafe string data", we'll instead define a new abstraction: UnsafeString. Our data method will then be defined as follows:

    public Map<String, UnsafeString> data() { ... }

(There is no need to make the key an UnsafeString, because clients must make a conscious decision to pull a value out of the map based on a predefined key.)

With this change, we still allow the user to obtain access to each key's value, but because we're using our own abstraction for values, we get to decide how that data may be used.

In particular, we will not provide a way to extract the string value of an UnsafeString without forcing the user to specify the intended semantic. Our knowledge of the semantic will allow us to take appropriate safety precautions.

Data might be formatted for HTML, or escaped for insertion into a database via SQL. Thus, our interface for UnsafeString might look something like this:

    public final class UnsafeString {
        public String toHtml() { ... }
        public String toSql() { ... }
        public String toJson() { ... }
        ...
    }

with other string conversion methods for different semantics.

If toHtml() is invoked, then we'll HTML encode the string, and return the encoded string. If toSql() is invoked, then we'll do appropriate escaping for embedding the data inside a statement. And so forth. This means clients never have to worry about the safety of user-supplied data. Rather, clients specify the intended use (we force them to, because there's no other way to access the string data), and using this information, we take the appropriate security precautions.
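
Here is a minimal sketch of what the HTML side of such a class might look like. The escaping table is a common one but is my own choice rather than anything mandated above, and toSql() and toJson() are left out because their escaping rules depend on the target database or encoder. The toString() override is an extra assumption of mine, added so the raw value can't leak out through logging or string concatenation.

```java
// Sketch only: covers toHtml(); the other conversions are intentionally omitted.
final class UnsafeString {
    private final String raw; // never handed out directly

    UnsafeString(String raw) {
        this.raw = raw;
    }

    /** Returns the value escaped for embedding in HTML text or quoted attribute values. */
    String toHtml() {
        StringBuilder escaped = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': escaped.append("&amp;"); break;
                case '<': escaped.append("&lt;"); break;
                case '>': escaped.append("&gt;"); break;
                case '"': escaped.append("&quot;"); break;
                case '\'': escaped.append("&#39;"); break;
                default: escaped.append(c);
            }
        }
        return escaped.toString();
    }

    /** Deliberately hides the raw value so it cannot slip out by accident. */
    @Override
    public String toString() {
        return "UnsafeString[" + raw.length() + " chars]";
    }

    public static void main(String[] args) {
        UnsafeString comment = new UnsafeString("<script>alert(\"x\")</script>");
        System.out.println(comment.toHtml());
    }
}
```

Because there is no getter for the raw field, client code has nothing to call except a conversion that names its destination.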

The above is a good first start, but it's not foolproof. Let's say the user stores the string in the database. We prevented SQL-injection, but what happens when the user reads the data? It will be read via the database API, which will read it as an ordinary string. As an ordinary string, the user might do lots of unsafe things with it.

Such problems often occur at the junction of our APIs with the rest of the world. Unless we're designing the whole web app ecosystem, including the persistence layer, we have to throw away some abstraction so clients can interface our API with the outside world.

In this example, the toSql() semantic leaks information because it allows persistence of the data in the original format. We could accept this limitation, but we don't have to. There are a number of choices available to us:

  1. We could remove the toSql semantic and any other ones that preserve user-supplied data. This would prevent storage of user data in the original format. However, as data is never intended solely for persistence, but always for publication in some other medium (HTML, XML, user-interface), this would not be as limiting as one might think. The real difficulty is that it would no longer be easy to obtain multiple representations of raw user data (e.g. plain text for sending in a text e-mail, HTML for sending in an HTML e-mail, and so on).
  2. We could encode the user-supplied data for semantics like toSql (for example, using some form of encryption or text scrambling), and provide a constructor or static method to reconstruct an UnsafeString from such encoded data. This is the hard-line approach to the problem, which completely prevents clients from doing anything useful (or harmful!) with user-supplied data, unless they go through the UnsafeString interface. Depending on our encoding method, side effects of this approach could include negative effects on SQL search queries, increases in the size of the database, and a loss of database transparency.

None of these solutions is ideal. And each solution is probably the "best" solution for some target audience. Clients extremely concerned with security, or who don't want to have to deal with security issues but still want a secure application, would probably be better off with the hard-line approach. But middle-of-the-road clients who are just looking for a slight boost to security without changing the way they do things now would probably prefer the soft approach.

Since an optimal solution is not possible, you just need to find the best tradeoffs for your target audience, which is a common theme in API design.

In any case, the use of the type system to prevent misuse of raw data from the user allows clients to focus more on the logic of their application, and less on the mundane details that should be abstracted away from them.

Better Tables through Typing

In developing a spreadsheet application, or any application that allows editing and manipulation of tabular data, it's often necessary to refer to particular rows or columns, and to use them in basic mathematical expressions:

    table.insertRowAt(4);
    ...
    int copyRowCount = bottomRow - topRow + 1;
    int copyColCount = rightCol - leftCol + 1;

    Table subtable = table.copy(topRow, leftCol, copyRowCount, copyColCount);

Using integers to represent rows and columns poses a number of problems:

  1. Clients may mix expressions involving rows and columns, which will likely produce nonsensical results.
  2. Clients may pass a row parameter when a column parameter is expected, or vice versa.

So if we use integers to represent rows and columns, then our API is wide open to obvious (and common) forms of misuse. This misuse increases the defect rate in client code, and makes it costly to develop and maintain software that uses our API.

You're probably already imagining the solution: abandon integers, and use separate abstractions for rows and columns.

Indeed, we can create an immutable Col class to represent a column, and an immutable Row class to represent a row.

This approach is not particularly pleasant in Java, but it can be done:

public abstract class TabularPosition<T extends TabularPosition<T>> {
    public abstract T plus(T in);
    public abstract T minus(T in);
    public abstract T next();
    public abstract T previous();
    ...
}

public final class Row extends TabularPosition<Row> {
    ...
}

public final class Col extends TabularPosition<Col> {
    ...
}

Then we could define a copy() method on Table like this:

    Table copy(Row topRow, Col leftCol, Row bottomRow, Col rightCol) { ... }

Clients would obtain instances of these classes by, for example, calling Table.getStartCol(), Table.getEndRow(), etc.

As long as we provide facilities for doing basic math with these classes, and do not allow clients to convert from a number into either row or column, then we can make the following guarantees:

  1. Clients cannot mix rows and columns in any expression, preventing this class of errors.
  2. Clients cannot accidentally pass a row where a column is expected, or a column where a row is expected.

These are strong safety benefits, but the ugliness of the solution in Java has a detrimental effect on the readability and usability of our API.
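
For the curious, here is one way the elided bodies might be filled in. This is a sketch under my own assumptions: the index field, the at() helper, and the package-private constructors are hypothetical, and I've moved the arithmetic into the base class rather than leaving it abstract.

```java
// Hypothetical sketch: field names and constructors are assumptions.
abstract class TabularPosition<T extends TabularPosition<T>> {
    final int index; // package-private, so clients outside the package never see it

    TabularPosition(int index) { this.index = index; }

    // Each subclass says how to build another position of its own kind.
    abstract T at(int index);

    public T plus(T in)  { return at(index + in.index); }
    public T minus(T in) { return at(index - in.index); }
    public T next()      { return at(index + 1); }
    public T previous()  { return at(index - 1); }
}

final class Row extends TabularPosition<Row> {
    Row(int index) { super(index); } // only the table's package can create rows
    @Override Row at(int index) { return new Row(index); }
}

final class Col extends TabularPosition<Col> {
    Col(int index) { super(index); } // only the table's package can create columns
    @Override Col at(int index) { return new Col(index); }
}
```

With this in place, topRow.plus(bottomRow) compiles, while topRow.plus(leftCol) is rejected by the compiler because Col is not a Row.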

The same approach in other languages is often much cleaner. For example, in Haskell, we can define the row and column types as follows:

    {-# LANGUAGE GeneralizedNewtypeDeriving #-}

    newtype Row = Row Int deriving (Eq,Show,Num)
    newtype Col = Col Int deriving (Eq,Show,Num)

Using these types is no different than using ordinary integers, and there is no runtime performance difference. So we get all the safety with none of the negative effects on readability and usability.

In Haskell, it's a clear win. In Java, the best approach is less clear.

Readers are encouraged to submit solutions in other languages.

Better Scene Graphs through Typing

In a scene graph API, nodes have positions in the scene. They have a position relative to their immediate container (the parent), and a position relative to the topmost container (the root). For example, in a window toolkit, a scroll bar has a position relative to its pane, but it also has a position relative to the application window itself.

Points relative to the immediate container are called relative positions, while points relative to the topmost container are called absolute positions.

Nearly all scene graph APIs use a single abstraction for Point, and they rely on clients to keep track of whether a point is relative or absolute. This leads to many usage errors, as clients end up specifying relative points where absolute points should be specified, or combine relative points with absolute points.

By now, the solution to this problem should be clear: introduce separate types for absolute and relative points.

For example, we could have PointAbs and PointRel classes, and Node could expose methods like these:

    public PointAbs getAbsolutePosition() { ... }
    public PointRel getRelativePosition() { ... }

Then any method that called for a point would use one or the other class. In this way, it would be completely impossible to pass a relative point to a method expecting an absolute point (or vice versa).

Add methods for manipulating points to produce other points, and you give clients the ability to stay completely within the point abstractions, which adds an additional measure of safety.
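
As a rough sketch of what those manipulation methods could look like: only the class names PointAbs and PointRel come from the discussion above, while plus(), offsetBy(), and minus() are names I've made up for illustration.

```java
// Hypothetical sketch: all method names here are assumptions.
final class PointRel {
    final double x, y;

    PointRel(double x, double y) { this.x = x; this.y = y; }

    // Adding two relative offsets yields another relative offset.
    PointRel plus(PointRel other) { return new PointRel(x + other.x, y + other.y); }
}

final class PointAbs {
    final double x, y;

    PointAbs(double x, double y) { this.x = x; this.y = y; }

    // Resolve a child's relative position against this (the parent's) absolute position.
    PointAbs offsetBy(PointRel rel) { return new PointAbs(x + rel.x, y + rel.y); }

    // The difference between two absolute points is a relative displacement.
    PointRel minus(PointAbs origin) { return new PointRel(x - origin.x, y - origin.y); }
}
```

Notice there is deliberately no PointAbs.plus(PointAbs): adding two absolute positions is meaningless, so the types simply don't offer it.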

Reinventing the Socket

All of the preceding examples were based on Joel Spolsky's article. For the final example, I'm going to quickly look at an existing API that ships with Java. This API, like so many standard libraries in so many different languages, has a poor design. Most relevantly to this blog post, this Java API is easy to misuse.

The API in question is java.net.Socket. Socket interfaces tend to be poorly designed by ancestry — Berkeley sockets established a pattern that was subsequently copied a thousand times over, even in languages with sophisticated type systems like Haskell. Java's incarnation of sockets is no different.

The problems with java.net.Socket are too numerous to list here, so I'll only mention a few of them we can easily solve by using the type system:

  1. Clients need to bind the socket to a local address, then connect the socket to another address, and finally begin transmitting and receiving information over the socket. Any attempt to perform these operations out of order will result in a runtime error. Clients need to study documentation to determine the order in which these methods should be invoked.
  2. The API allows clients to bind a socket to a remote IP address, even though this operation will always fail (a socket can only be bound to a local address).

Let's keep the basic order the same: first, we'll require that the client bind the socket, then connect to a remote address, and finally we'll allow the client to transmit and receive information. But rather than relying on clients' memory to ensure our rules are followed correctly, we'll recruit the type system.

First, let's solve (2) by introducing three abstractions: Address, which represents a local or remote address; LocalAddress, which represents a local address; and RemoteAddress, which represents a remote address.

We'll use Java generics to extract every ounce of typing we can:

    public class Address<T extends Address<T>> {
        public URI getURI() { ... }
        ...
    }

    public final class LocalAddress extends Address<LocalAddress> {
        ...
    }

    public final class RemoteAddress extends Address<RemoteAddress> {
        ...
    }

Now our API can use LocalAddress where it requires a local address, RemoteAddress where it requires a remote address, and Address where any address will do. Somewhere, we'll have a method that lists all the local addresses available to the application.

To solve (1), we'll define our Socket abstraction in this way:

    public interface Socket { 
        SocketBound bind(LocalAddress address, int port);

        SocketBound bind(int port);
    }

Notice this interface has only two methods — one of them a convenience method for the other method. Why call this thing a socket? Because "socket" is the class a client will look for when a client wants to do socket programming. So even if the above can hardly be called a socket, a socket is exactly what we will call it.

The bind() methods return a SocketBound, which is a socket that's been bound to a local address. This interface will be simple, too:

    public interface SocketBound {
        SocketConnected connect(Address<?> address, int port);
    }

Thus, the connect() method accepts any address (because you can connect to a local or remote address), and returns a SocketConnected instance, which represents a connected socket and includes methods for sending and receiving data (a real design would declare exceptions for bind() and connect()).

Look at what we've achieved with this solution:

  1. Clients cannot bind a remote address.
  2. Clients cannot connect a socket without binding a socket to a local address.
  3. Clients cannot transmit and receive without connecting a socket to an address.
  4. Client code knows if a socket is unbound, bound, or connected merely by the type passed to that code, eliminating duplicate binding and duplicate connection errors.

In short, we've made our API difficult to misuse. We've converted rules stored in documentation into rules enforced by the type system. The API is now "braindead" and a client can use it properly by using the intellisense feature of his or her IDE. And our API is still lightweight and pleasant to use for the common case (bind & connect immediately). It's no more difficult to write:

SocketConnected s = socket.bind(localAddr, 1000).connect(remoteAddr, 70);

than to write:

s.bind(localAddr, 1000);
s.connect(remoteAddr, 70);
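
To show the staged interfaces fitting together, here is a small sketch with no real networking in it. FakeSocket, the host field, send(), and describe() are all hypothetical stand-ins of mine, and I've left out the bind(int port) convenience method and the exceptions a real design would declare.

```java
import java.util.ArrayList;
import java.util.List;

class Address<T extends Address<T>> {
    final String host; // assumed field, for illustration only
    Address(String host) { this.host = host; }
}

final class LocalAddress extends Address<LocalAddress> {
    LocalAddress(String host) { super(host); }
}

final class RemoteAddress extends Address<RemoteAddress> {
    RemoteAddress(String host) { super(host); }
}

interface SocketConnected {
    void send(String data);
    String describe();
}

interface SocketBound {
    SocketConnected connect(Address<?> address, int port);
}

interface Socket {
    SocketBound bind(LocalAddress address, int port);
}

// Stand-in that records state instead of doing network I/O.
final class FakeSocket implements Socket {
    @Override
    public SocketBound bind(LocalAddress local, int localPort) {
        // Each stage hands back only the next stage's interface.
        return (remote, remotePort) -> new SocketConnected() {
            private final List<String> sent = new ArrayList<>();

            @Override public void send(String data) { sent.add(data); }

            @Override public String describe() {
                return local.host + ":" + localPort + " -> "
                     + remote.host + ":" + remotePort + " sent=" + sent.size();
            }
        };
    }
}
```

Calling bind() with a RemoteAddress, or calling send() on anything that hasn't been through connect(), fails at compile time rather than at runtime.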

Dynamic Languages

These techniques work best in languages with static type systems. However, with some effort, they can still benefit dynamic languages.

The major difference is that with dynamic languages, all errors will be detected at runtime instead of compile time. But at least they will be detected. For example, if a client tries to pass an absolute point to a method expecting a relative point, then the error will be spotted as soon as the code is tested. (However, it's also true that some of the examples shown in this post would already result in runtime errors, and in such cases, there is no additional benefit to these techniques for dynamic languages.)

To use these techniques in dynamic languages, you can either do runtime type checking (in dynamic languages that support it), or simply keep the internals of objects sufficiently different that errors will result if a client tries to use the wrong type.

Looking Forward

In this post, we've seen how the type system can be used to make misuse of your APIs a lot harder (and in some cases, impossible). APIs that are hard to misuse are easy to use, because they don't require thought.

Thought is the enemy of good API design.

In the next post, I'll talk about techniques for keeping an API manageable and digestible, while still retaining power and depth.

For readers interested in an exercise, I suggest taking a class in some standard library and showing how you can use the type system to make the API harder to misuse.

See you in Part 4!

Wednesday
May062009

Good API Design: Part 2

In Part 1 of Good API Design, I talked about the principles that inspire good APIs, and I gave a rule of thumb for estimating how much effort you should spend making life easier for the users of your APIs. I concluded with a simple exercise for readers, which was to submit either a good or bad API for a simple configuration manager for a desktop application.

In this second part, I'd like to use that exercise as a concrete example of what good API design is all about.

There are many techniques for good API design, the most important of which I intend to cover later in this series, but this particular post is more about the big picture.

You'll see some techniques, but more importantly, you'll contrast good and bad, and cultivate your intuition for good API design.

The Pain

Good APIs are formed from an intimate knowledge of the problem that clients are trying to solve (otherwise known as, "The Pain"). Note I did not say an intimate knowledge of what the clients want. No one wanted Ruby on Rails. Instead, Ruby on Rails was an answer to the problem of how to quickly develop database-driven websites. (Clients think they know what they want, but in reality, they have expert knowledge only in the problems they face — not in their potential solutions.)

In the case of the exercise given in Part 1, the problem is persistence of configuration data. There are many sophisticated ways to solve this problem, but the exercise limited the scope of the problem to storing and retrieving key/value pairs from a text file.

Let's think about the problem for a second. In most applications of moderate size, there will be dozens or hundreds of little pieces of data that need to be persisted. For example:

  • The last five opened files.
  • The position of the main window.
  • Whether or not the toolbar is docked.
  • The last directory that was used to import a photo from.

From these observations, we can conclude the following:

  • The client needs to persist different types of data. In the exercise, string and numeric data were mentioned.
  • Persistence is a cross-cutting concern — that is, the client may need to persist data from anywhere in the application.

If you're now thinking of annotations, aspect-oriented programming, and other sophisticated ways to solve this problem in a painless fashion for the client — well, you have a future in API design. But to keep the problem small, the exercise told us to stick with an API for manually persisting key/value pairs. So that's what we'll do.

What else can we say about persistence? For one, the sheer number of settings that must be persisted tells us something:

  • The persistence API needs to be featherweight. Any bulk, any boilerplate, will make usage tedious and error-prone.

This is good stuff, but we're not done yet. Consider the nature of configuration data. It's stuff that is nice to persist, but which isn't strictly necessary for functioning. That is, if for some reason the text file containing the configuration data were deleted, yes, that would be unfortunate, but it's not a showstopper. The application should continue to run and it should just use reasonable defaults.

I'll summarize these observations as follows:

  • Persistence is not mission-critical — we don't need the reliability of a database here.
  • Normal execution is essential, even in the face of a missing or corrupted configuration file.

There's more to say, but with the above 5 points, we've captured the heart of the problem. Armed with this knowledge, we can take a look at contenders for the title of Good API Design.

The Bad Config API

In my last post, I submitted an answer to the exercise, in the form of a bad API design. Several readers pointed out some of the reasons why this API was bad, but I'd like to go into depth here.

Recall the proposed design went like this:

public class Config {
    public void init(File file) { ... }

    public void flush() throws IOException { ... }

    public void close() throws IOException { ... }

    public void reopen() throws IOException { ... }

    public void put(String key, String value) { ... }

    public String get(String key) { ... }
}

This isn't the worst API design in the world, as pointed out by one reader. But it does suffer from a number of issues.

Let's first analyze the API by seeing how it stands up to the principles of good API design covered in the first post.

  1. Does the API utilize best practices in software development? Here, the API is saved by its tiny size.
  2. Is the API self-consistent? Again, the API is so small it's hard to be inconsistent, though I would point out the init() method is the only method whose name is truncated (a minor quibble).
  3. Does the API operate at a high level? Here, the answer is a resounding No. The API requires me, as a client, to be in charge of maintaining the persistence machinery — opening, flushing, closing, reopening, and so forth. If I need to retrieve a number, then I have to parse the string myself (the same for other data types). If I want default values for configuration settings not in the text file, then I have to check for null and supply the default value myself. Finally, I have to wrap every call in try/catch and deal with failures somehow.
  4. Does the API take advantage of common knowledge? The terminology is pretty standard, with a couple of exceptions: init() and reopen(). Map-like structures in Java often use get() and put().
  5. Does the API have one way, or at least few ways, of doing the same thing? Check on that one.
  6. Is the API hard to misuse? The API gets an F in this category. Not only is the API easy to misuse, but there is a 100% certainty of misuse. First, notice that init() does not throw an IOException. Does that mean we have to call reopen() after we call init()? Probably so (unless the IOException is silently caught in the init() method, making the API inconsistent). Second, do we have to call flush() before calling close()? It's not clear, so we would probably do it anyway just to be sure. Third, the whole process of (re)opening, flushing, and closing is so tedious it's bound to lead to errors. Likely, we'll call put() for some configuration data, and forget to call flush() or close(), so data will be lost. Or we'll call close() somewhere, and forget to call reopen(), leading to an IOException. More than one thread may create a Config object for the same file, leading to data corruption. The possibilities for misuse are endless, and the odds are so high that misuse will happen sooner or later.
  7. Does the API need documentation? Certainly, because the relationship between init() and reopen() is unclear, and the semantics of the class are complicated enough that clients need to be told exactly what's expected of them.
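
To see how much work point 3 pushes onto the client, here is a sketch of what reading and writing a single integer setting looks like. BadConfig is an in-memory stand-in I've invented with the same get/put/flush shape as the bad API, and the "recent.count" key and the default of 5 are made up for illustration.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the bad API; a real one would touch a file.
class BadConfig {
    private final Map<String, String> values = new HashMap<>();

    public void put(String key, String value) { values.put(key, value); }

    public String get(String key) { return values.get(key); } // null when missing

    public void flush() throws IOException { /* a real version would write the file */ }
}

final class BadConfigClient {
    // Everything in these two methods is work the API leaves to every caller.
    static int readRecentFileCount(BadConfig config) {
        String raw = config.get("recent.count");
        if (raw == null) {
            return 5;                            // caller supplies the default
        }
        try {
            return Integer.parseInt(raw.trim()); // caller parses the string
        } catch (NumberFormatException e) {
            return 5;                            // caller copes with corrupt data
        }
    }

    static void writeRecentFileCount(BadConfig config, int count) {
        config.put("recent.count", Integer.toString(count)); // caller converts
        try {
            config.flush();                      // caller must remember to flush
        } catch (IOException e) {
            // caller must decide what a failed save means
        }
    }
}
```

Multiply this by the dozens or hundreds of settings in a moderately sized application and the boilerplate adds up quickly.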

Overall, the picture's rather ugly. This is a bad API, of that there is no doubt. So let's use our knowledge of The Pain, and the mistakes made in this API, to construct something better.

The Good Config API

How can we create a good API here? Let's start by jotting down some goals, which are inspired by the preceding sections.

Our API should:

  1. Save the client from having to do any low-level file management. Thrusting file management on the client adds complexity and makes the API easy to misuse.
  2. Provide direct support for different types of data, so the client doesn't have to worry about converting strings.
  3. Provide direct support for defaults, in case some data isn't available, is corrupted, or has the wrong type.
  4. Reduce storage of a setting down to one line of code, and retrieval of a setting down to one line of code.

With these goals in mind, an interface comes to my mind immediately:

public class Config {
   String getString(String key, String defaultValue) { ... }
   void   setString(String key, String value) { ... }

   int  getInt(String key, int defaultValue) { ... }
   void setInt(String key, int value) { ... }

   float getFloat(String key, float defaultValue) { ... }
   void  setFloat(String key, float value) { ... }
}

I didn't write the signatures for any constructors yet — that's quite intentional. I'll return to that topic shortly.

Notice a few things about this API:

  1. The setX() methods are mirrored by getX() methods. To some extent, I think, the Java world gave birth to the getter/setter paradigm, and Java developers are so acclimated to it, they'll look for a setter method before they'll look for a "putter" method, whether or not the underlying structure is map-like. (This is a small change, though, and making these "putter" methods would likely not cause much delay for clients.)
  2. The naming scheme is strongly consistent. For each getX method, there is a corresponding setX method. Strings don't receive special treatment. Although it doesn't make a lot of difference for this size API, the larger the API, the more important consistency is.
  3. Every getter method forces the client to specify a default value. The default value is used in many situations: if the file doesn't exist, if the file is corrupt, or if the value has the wrong type. In making this decision, we save the client from having to check for the existence and type of settings (if a client really wants to check for existence, she can always pass some magic value as the default). Yes, the client does need to specify a default, but that little bit of work buys them a lot of functionality, and guarantees their application will be well-behaved even if the configuration file doesn't exist or is corrupted.

Equally important, notice what's missing from this API:

  1. No method throws an exception. Since we force the client to provide a default value, we know exactly what to do if there's a problem reading data or converting it into the appropriate type: we just return the default value. The API thus becomes safe, and we bear the burden of error checking and data validation, so the client doesn't have to.
  2. There are no file management methods. Internally, data must be read from and written to a file, but we don't need to expose that implementation detail to the client. Again, we bear the burden so the client doesn't have to.
  3. There is no support for a hierarchy of configuration data. We've kept the API deliberately simple. If clients want a hierarchy, they can still get it using hierarchical keys (e.g. "file.recents").
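
Here is a minimal sketch of how the getters might honor the "always fall back to the default" rule. It keeps values in an in-memory map, which is my own simplification: persistence and thread safety are left out entirely, and storing everything as strings is just one possible choice.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: in-memory storage, no file handling, not thread-safe.
final class Config {
    private final Map<String, String> values = new HashMap<>();

    public String getString(String key, String defaultValue) {
        String raw = values.get(key);
        return raw != null ? raw : defaultValue;
    }

    public void setString(String key, String value) { values.put(key, value); }

    public int getInt(String key, int defaultValue) {
        String raw = values.get(key);
        if (raw == null) return defaultValue;    // missing setting
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue;                 // wrong type or corrupt value
        }
    }

    public void setInt(String key, int value) { values.put(key, Integer.toString(value)); }

    public float getFloat(String key, float defaultValue) {
        String raw = values.get(key);
        if (raw == null) return defaultValue;
        try {
            return Float.parseFloat(raw);
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public void setFloat(String key, float value) { values.put(key, Float.toString(value)); }
}
```

A client call then really is one line, for example config.getInt("window.x", 40), with no null check and no try/catch around it.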

Now I'm sure some readers are thinking, "We must expose file management operations to the client in order to achieve high-performance!" Alas, many evils are committed in the name of "high-performance", and while it's true that we can't afford to persist all data every time a single setting is changed (that would become a performance bottleneck!), it's also true that we don't have to. There are at least three ways to hide file management from the client: by journaling, by saving the file on shutdown, or by saving the file on timeout from the last modification (or some combination of all three).

None of these options are easy for us, the API developers. But that's our job: we do the hard stuff to make things easy for our clients. If we have to write 100 lines of code so clients can save a single line, then that's exactly what we'll do (as long as we have at least 100 clients).
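
To give a flavor of that hard stuff, here is a sketch of the "save on timeout from the last modification" option. DelayedSaver and markDirty() are hypothetical names of mine, and the Runnable passed in stands for whatever code would actually write the file.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: coalesces a burst of changes into a single save.
final class DelayedSaver {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "config-saver");
            t.setDaemon(true); // the saver thread never keeps the application alive
            return t;
        });
    private final Runnable save;
    private final long quietMillis;
    private ScheduledFuture<?> pending;

    DelayedSaver(Runnable save, long quietMillis) {
        this.save = save;
        this.quietMillis = quietMillis;
    }

    // Called after every setter: cancel the pending save and restart the countdown.
    synchronized void markDirty() {
        if (pending != null) {
            pending.cancel(false);
        }
        pending = timer.schedule(this::runSave, quietMillis, TimeUnit.MILLISECONDS);
    }

    private void runSave() {
        try {
            save.run();
        } catch (RuntimeException e) {
            // a failed save must never surface in client code
        }
    }
}
```

Because the thread is a daemon, a change made just before exit could be lost, which is why a real implementation would likely combine this with saving on shutdown.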

The final detail we need to address is how a client obtains a Config instance. This is the last remaining part of the API puzzle. And here we have a choice: we're close to eliminating the file system from the API. We can take the plunge, and move to a higher level of abstraction, or we can accept the fact that the configuration data is stored in a file and require clients to acknowledge that fact when they create the Config object.

To help us decide, I'll introduce you to the Sweet Spot Saturation Rule.

The Sweet Spot Saturation Rule

The Sweet Spot Saturation Rule tells us that you can satisfy 90% of your users with 10% of the features that would be required to satisfy 100% of your users. That's the sweet spot of saturation. Fewer features means less complexity, which creates the potential for higher usability.

Apple Computer knows this rule well. Applications like Safari and Mail don't have many features. They have a few. But they satisfy the needs of most people. By letting go of the power users, they achieve high usability (this is not always an inevitable tradeoff, but at the very least, simpler applications are easier to make usable than more complex applications).

When developing APIs, you're going to need to make the same call yourself. You can aim for maximum flexibility, but it's going to make your API larger, harder to use and harder to understand. Or you can aim for that 10% of functionality which gives you 90% of the market. (Of course, you could try for something in the middle, as well.)

In the case of the configuration API, I'm going for the sweet spot. We've already eliminated file management operations from the API, so a natural next step is to remove the concept of the file system from the API. This means the user will have no choice where to store the file.

This decision buys us a lot:

  1. We make it harder to misuse the API. If the client doesn't tell us where to store the file, then the client cannot accidentally specify a file that exists in a read-only directory; the client cannot specify a file located in a non-existent directory; etc.
  2. We can accept the burden of determining a good location to store preferences. In general, the best location to store configuration data will vary by operating system. We can save clients from needing to know about such details, while still respecting the conventions of the host operating system.
  3. We can choose to provide thread-safety with no extra work for the client. Since only we know the location of the configuration file, we can make sure access to it is synchronized, or always goes through a single gatekeeper.
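
As an illustration of point 2, here is a sketch of how an application name might be turned into a per-OS file location. ConfigLocation, the "config.properties" file name, and the character-sanitizing rule are assumptions of mine; the directories follow common conventions (APPDATA on Windows, Library/Application Support on macOS, XDG_CONFIG_HOME or .config elsewhere).

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; names and the file name are illustrative assumptions.
final class ConfigLocation {
    // Pure function: the caller passes in os.name, user.home, and the environment.
    static Path resolve(String appName, String osName, String userHome, Map<String, String> env) {
        // Replace anything outside a conservative character set with an underscore.
        String safe = appName.trim().replaceAll("[^A-Za-z0-9._-]+", "_");
        String os = osName.toLowerCase(Locale.ROOT);
        Path base;
        if (os.contains("mac")) {
            base = Paths.get(userHome, "Library", "Application Support");
        } else if (os.contains("win")) {
            String appData = env.get("APPDATA");
            base = appData != null ? Paths.get(appData)
                                   : Paths.get(userHome, "AppData", "Roaming");
        } else {
            String xdg = env.get("XDG_CONFIG_HOME");
            base = (xdg != null && !xdg.isEmpty()) ? Paths.get(xdg)
                                                   : Paths.get(userHome, ".config");
        }
        return base.resolve(safe).resolve("config.properties");
    }

    // Convenience wrapper that reads the real system properties and environment.
    static Path forCurrentSystem(String appName) {
        return resolve(appName, System.getProperty("os.name"),
                       System.getProperty("user.home"), System.getenv());
    }
}
```

The client never sees any of this. It only ever passes a name like "My Application".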

Make no mistake about it: this is a choice that costs us some things but buys us others. It's the choice I would make, but I'd also listen closely for feedback to make sure it was the best one for my audience (API designers are not prophets).

With this approach, our completed API might look something like this:

public class Config {
   private Config(...) { ... }

   public static Config forApp(String appName) { ... }

   String getString(String key, String defaultValue) { ... }
   void   setString(String key, String value) { ... }

   int  getInt(String key, int defaultValue) { ... }
   void setInt(String key, int value) { ... }

   float getFloat(String key, float defaultValue) { ... }
   void  setFloat(String key, float value) { ... }
}

With this API, there's a single way to create a Config instance: by using the static factory method forApp(), which accepts the name of the application. The application name is an arbitrary string, and we'll take care to convert it into a file name and location that are appropriate for the host operating system.

The client can invoke the static factory method one time or a hundred. We'll ensure that everything "just works", as if the client had created a single instance. Moreover, we'll allow the client to access Config instances from any number of threads.
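
One way to deliver on the "just works" promise is to hand back the same instance for the same application name. The sketch below assumes a registry keyed by name, which is my own choice of mechanism. It shows only the string accessors, keeps values in memory, and does not handle null names, keys, or values.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: in-memory values, string accessors only, nulls not handled.
final class Config {
    // One shared instance per application name.
    private static final Map<String, Config> INSTANCES = new ConcurrentHashMap<>();

    private final Map<String, String> values = new ConcurrentHashMap<>();
    private final String appName;

    private Config(String appName) { this.appName = appName; }

    // Same name, same instance, however many times and from whichever thread it is called.
    public static Config forApp(String appName) {
        return INSTANCES.computeIfAbsent(appName, Config::new);
    }

    public String getString(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }

    public void setString(String key, String value) { values.put(key, value); }
}
```

Since every caller shares one object per name, a value set in one corner of the application is immediately visible in another, which is exactly what the beginner who calls forApp() everywhere expects.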

Now that our API is finished, let's take a look at all that we've done, and how our API can be used by developers of differing skill levels.

Evaluating the Good Config API

The API we've created is damn simple to use. You just set and get stuff. All the messy details of persistence are handled for you.

Moreover, the API is very nearly impossible to misuse. As a result, the need for documentation is low, limited primarily to the act of creating a Config instance. The static factory method deviates from the usual constructor pattern sufficiently to cause some confusion, and clients will be wondering things like, "What is an application name? What kinds of strings can I pass to the method without it blowing up?"

The API can be used either by advanced or beginning developers.

An advanced developer would likely use our API in the following fashion:

public class MyConfig {
   private final Config config = Config.forApp("My Application");

   private MyConfig() { }
   public static final MyConfig INSTANCE = new MyConfig();

   // The second argument is the required default; 5 is only an illustrative value.
   public int getNumRecentFiles() { return config.getInt("NUM_RECENT_FILES", 5); }
   ...
}

In this way, the entire application has access to a singleton object that localizes and encapsulates all configuration data (singletons aren't always bad).

A beginning developer, on the other hand, would likely call Config.forApp() many times, scattered throughout the application. Our API supports both usage models.

How does our API stack up on other grounds? With a little reflection, you should be able to see that our API follows best practices, is internally consistent, is high-level, follows conventions familiar to Java developers, has few ways of accomplishing the same task, is very hard to misuse, and doesn't need much documentation. The API also meets all of our design goals.

In short, we've designed a good API.

Onward Ho

By now, you should be developing a feel for the differences between good APIs and bad APIs. That gut feeling, which comes from experience comparing many different APIs, will serve you well as you design your next API. Alone, however, it is not enough.

Beginning with Part 3 of this series, I'm going to dive headfirst into concrete techniques you can use to develop good APIs.

The first technique I'll look at helps you write APIs that are hard to misuse, by taking advantage of static typing (developers for dynamic languages may want to skip Part 3).

In the meantime, I'd like to ask readers to pick one of the following exercises:

  1. Create a working implementation of the good config API, in Java or any other language of your choice, and submit it to this blog post.
  2. I mentioned that hiding the file system from clients cost us something. What are some of the drawbacks of hiding this information, and what kinds of clients would this affect?
  3. Extend the above API for dealing with other primitives, arrays, and serializable objects.

Reader Feedback

I was surprised by the number of responses to my last blog post in this series, so I'd like to address the highlights here:

  1. Mason brought up a Windows API. There are some similarities between this API and the good one developed in this post. However, the Windows API is much lower-level and forces clients to do a lot of grunt work.
  2. Timur was the first to recognize the need for typing. His API provides getter methods for different types of data, an essential feature of an easy-to-use, high-level API for configuration data management. Timur also recognized the hassle with exceptions.
  3. Rogerio brought up the Java Preferences API, an API with which I was not familiar. As Rogerio mentioned, this is a well-thought-out API that resembles the good API we developed. However, in my opinion, the API is too heavyweight, tries for too much functionality, and still places the burden of persistence on the user. Still, it's probably a good fit for the 10% of the market whose needs are not met by our API.
  4. mystilleef suggested dramatic simplification, and that's exactly the approach we took. The main differences between mystilleef's API and our own are that mystilleef chose to expose the file that contains the configuration data, and he sketched the API in a dynamic language, so he was able to cut down on the number of methods.
  5. FreakWithBrain correctly identified some issues in the bad API that I submitted. The file management operations, as he noted, are just asking for problems.
  6. Dirk did not submit an API, but noted that clients don't need or care about file management operations. He suggested they be removed entirely, and that's exactly what we did!

Thanks to all readers for submitting their own perspectives.

See you in the next post!

Continue to Part 3