Software Design Philosophy

This page contains John Ousterhout's recommendations for how to design clean, simple, and obvious software. As of August, 2012 it is still a work in progress.

Design classes around information, not algorithms or time order. Each class should encapsulate and own particular design decisions or information, so that code outside the class does not need to be aware of them. The biggest mistake that people make is to design classes around time: collect all the things that happen at one point in time into a single class. Read David Parnas' seminal paper On the Criteria to be Used in Decomposing Systems into Modules, which explains why this is a bad idea, and why "information hiding" is a better approach. Unfortunately, most people who read Parnas' paper say "d'oh, that's totally obvious and I learned that in my introductory programming class!", then go on to write code in exactly the way that Parnas argues against.

Avoid "thin" classes. Ideally, a class should have a very simple interface that hides a lot of functionality and internal complexity; said another way, its ratio of volume/(surface area) should be large. I call such classes "sweet"; in general I use the term "sweet" to refer to software systems that have a high ratio of power/complexity. A thin class is the opposite of this: in the worst case, the interface is just about as complex as the implementation (i.e. I could have rewritten the class in the same time it took me to learn how to use it). The classic example of a thin class is one that implements a linked list. There's no interesting functionality being hidden here, so there's almost no advantage in having such a class; it just adds to the system's complexity.

Every method should do something useful. This sounds obvious, but I often see people design abstractions from the bottom up in lots of thin layers: write a few methods that do some stuff, then write a few more methods that use those methods to do something higher level, and so on. People sometimes argue that this is good because "I've created a lot of building blocks that can be reused in lots of different ways". The result is that a lot of the methods aren't directly useful: a caller has to invoke several of these thin methods in order to get something useful done. The Java I/O library is a great example of this design flaw: you have to instantiate 3 or 4 different classes just to open a buffered stream on a file. Instead, make sure that every method does something that is useful on its own; try to make your methods "thick", so that they do big useful things. Try to design classes top-down and bottom-up at the same time, so that you are thinking about the ultimate functionality you want to provide, rather than just creating a bunch of building blocks. First optimize to make the common case really easy to use; then expose additional methods if you need to provide building blocks for the uncommon cases.

Comments should describe the things that aren't obvious from the code. This implies two things:
- Don't waste time writing comments for code that is already obvious, such as the following silly example:
      // Increment i.
      i++;
  A common mistake people make when writing comments is to use the same words in the comment that are already used in the code, and this reduces the value of the comment. For example, I recently saw the following comment in some RAMCloud code:
      // The epoch when this tablet was deleted.
      uint64_t epoch;
  The problem is, "the epoch" is already obvious from the code. What I need to know is "what is an epoch, and why is it needed?".
- The comments should contain information that is not obvious from the code, such as why a particular variable is needed, or who is likely to call a particular method, or what are the units for variable. Comments are crucial for abstraction: abstractions will not be obvious from the code, so it's essential to describe them with comments (what is the common theme, or overall task for this class? what is the invariant for this variable? etc.).

Define errors out of existence. Error cases are one of the primary sources of complexity in software; they create complexity both for the code that generates the error and for the code that must handle it, and they often result in subtle bugs where programmers forget to check for errors, resulting in runtime crashes when the errors occur. Instead, whenever possible, design code so that there is no error condition. For example, in the Tcl scripting language there is a command unset, which removes a variable. Unfortunately I made the mistake of implementing this command to throw an error if the variable doesn't exist ("why would anyone delete a variable that doesn't exist?"). However, it's fairly common for people to want to get rid of a variable that may or may not already exist; as a result, Tcl applications are littered with code that invokes unset inside a catch clause that ignores errors. In retrospect I should have defined unset to ignore nonexistent variables without complaining. This results in the command having the behavior "make sure this variable no longer exists," which is simple and reasonable.

Programmers often think that it's better to define as many error conditions as possible ("my code is really careful"), but this just creates a lot of complexity. I've even seen cases where methods require additional arguments that serve no purpose in the method except to allow for additional error checking; once the error checking was completed, the arguments are ignored! I would argue the opposite: design code with as few error conditions as possible. Wherever possible, design abstractions so that every possible combination of inputs is meaningful and reasonable: make your abstractions just "do the right thing". Or, said another way, handle as many errors as possible locally, but export as few errors as possible (this reminds me of the classic license plate sticker "think globally, act locally").

Think of classes as martyrs. The best classes are those that bring suffering on themselves to make the world a better place for everyone else. It's okay if a class is complicated internally, as long as it presents a really simple and powerful interface to the rest of the world. Of course, it's even better if you can figure out how to decompose the class internally so it's not so complex. But if complexity is inevitable, better to encapsulate it in one class rather than spread the complexity across a large number of callers. Or, said another way, push complexity down, not up.

When multi-threading, use monitor-style locking whenever possible. The reason for this is that monitor-style locks are pretty simple and predictable: every monitor method acquires a lock at its beginning and releases it at its end. The alternative is to acquire and release locks on a much more granular basis inside methods. This approach may be slightly more efficient, but its irregularity makes it much more error prone. Threading is really hard to get right, so it's best to handle locking using a very simple and consistent paradigm.