Safeco Field, home of the Seattle Mariners. Safeco Field, home of the Seattle Mariners.

Seattle, WA
July 3, 2005
Apple | Cool | Disney | Entertainment | Fitness | Geek | Microsoft | Politics | Seattle Storm | Transit | Travel | UW MBA

« Houston Comets, RIP | Main | California Adventure: The Interactive Map »

December 10, 2008

Some Simple Suggestions For Saner Spreadsheets

This is cross-posted from the Mac Mojo blog.

If you're like me, and spend a lot of time building spreadsheets to model or budget or crunch a set of numbers, you have probably had the following experience.

One day, you have a bright idea. You fire up Excel and begin working to express it. After hours and hours of constructing and tweaking and changing and modifying your spreadsheet, you find yourself sitting at your desk, brow sweaty, with a goofy look of happiness on your face - "Aha!" you think. "That's it! I've done it!"

You then save your work, quit Excel, and head off for a much-deserved snack.

Three months later, you remember your genius spreadsheet and want to update it. You locate the file on the drive, double-click, and ... find yourself staring at gobbledygook. The spreadsheet is, well, unreadable - you can't tell which cells do which, what math is being used where, and quickly find yourself starring in an episode of CSI: Excel, relying on clues, fingerprints and guesswork to determine what the heck is actually going on in your workbook.

This "unintelligible workbook" situation is not uncommon - and not even that surprising, if you think about it. Unlike the art of writing, where there are defined conventions (sentence structure, paragraph structure, document structure, story structure), spreadsheets are as varied as the minds that create them. On one hand, this is great - spreadsheets are fantastically flexible tools - but on the other it can lead to confusion, error, and misunderstanding. Without standards for what a spreadsheet "should" look like, the reader of a spreadsheet is often left scratching their head, wondering just what it is they're staring at.

(This by the way, is doubly true in large organizations, where some long-departed employee once built some hairball of a spreadsheet back at the dawn of the last Ice Age, and now your boss wants you to "update it a little bit". I've been there, and, well, good luck with that.)

Enter the concept of spreadsheet "best practices".

As spreadsheets become more integral to our work (and, consequently, more complex in their design and function), a clear need has emerged to have a set of principles around which spreadsheets are designed and developed. There are a number of distinct methodologies and approaches, such as FAST ("Flexible, Accurate, Structured, Transparent"), but in my view the important thing is to find a few guidelines that make sense for you (and the people you work with), and then ... just follow 'em.

My first personal experience with best practices came when I was in business school. My Quantitative Methods coursework was done entirely in Excel, and my professor, Dr. Hillier, insisted that people follow some guidelines when laying out their spreadsheets so he could understand what the students were doing. His guidelines were simple, straightforward, and made a lot of sense; years later, I find myself still using them and saving my sanity. (Dr. Hillier, incidentally, credits Duke University professor Dr. Robert Nau for inspiring him.)

Of the tips I've seen for sane spreadsheet design, the most valuable have been:

  • Each cell does one thing, and one thing only. Any given cell on the spreadsheet should be responsible for doing exactly one thing, whether it holds data, calculates something, or merely provides visual space between other cells.
  • Use intermediate steps. If a cell is doing one thing and one thing only, then you'll need to use intermediate steps to complete more elaborate calculations. That's OK - subtotals are your friend.
  • Separate data from formulas. Formulas should always reference other cells on the worksheet for their input; they should never have hard numbers embedded in the formula itself.
  • Use labels. Never let a number sit on a spreadsheet without a text label next to it that explains what it is or what it does. These labels will boost the "readability" of your spreadsheet, and help you spot errors.
  • Use color. Use splashes of color to color-code certain cells on the spreadsheet. There's no hard-and-fast rule, here; I personally use blue for cells that contain facts that affect my model (think: tax rates, discounts or fees), yellow for cells that are considered inputs to my model, and orange for just one cell - the ultimate conclusion, or "answer" cell.
  • Remember the "rule of thumb". As a rule, no formula should exceed about a thumb's worth of horizontal space on the spreadsheet. If it takes more, break it up. This will act as a forcing function to ensure you're using intermediate steps properly.
  • Flow from top to bottom. Spreadsheets should read like the page of a book - start at the top and let the eye trace down the page to get more detail. The conclusion - the ending to the book - should be at the very bottom of the page. Your "reader" will know where to find it, and, if you've observed the other rules correctly, the build-up to your conclusion should be obvious and logical.

Now that we're armed with all these rules, I'll give a simple example of how we might use them.

Say your boss wants to throw a pizza party for the office and asks you to figure out what it will cost beforehand. There are 30 people in the building, so you fire up Excel and quickly build a model. It might look something like this:

Figure 1
Figure 1 - Pizza For 30 ... But How?
The numbers look good, but there's one glaring problem with this spreadsheet: nobody knows how you got your total. Yes, there's 30 people listed ... but why does that come to $215.10?

Let's go ahead and expand these cells to reveal their internal formulas (Excel has a nifty keyboard shortcut for doing this: hold down the CONTROL key and then tap the "tilde" key (~)).

Doing so will yield something like this:

Figure 2
Figure 2 - Pizza For 30 ... The Mystery Revealed

Wow, that's some crazy formula there in B4, huh? Let's see, I'm seeing the CEILING function, some addition and division and multiplication, a few numbers I don't know the origin of ... spaghetti, basically. There's a nearly 0% chance that anyone who didn't create this spreadsheet - create it recently - would be able to figure out what any of this means.

Let's try this same example, but following some of the tips from above. We might wind up with something that looks more like Figure 3 (Figure 4 shows the underlying math):

Figure 3
Figure 3 - Pizza for 30, Take 2
Figure 3
Figure 4 - Pizza For 30, Expanded View

Going back to our tips, a few things jump out:

  • Each cell is doing one thing only. Some cells, like "Number of Attendees" (C4), contain just the number of attendees. Others, like "Sales tax" (C10) contain just one math function. Everything is nice and simple.
  • We are using intermediate steps. We have a "pizza subtotal" at C9, and a full-blown total at C12. Both help us to see what steps we're taking as we proceed through the model.
  • Data is separated from formulas. Every variable we're using - sales tax, the amount to tip, the number of people attending, the number of slices of pizza we think they'll eat, and so on - has its own cell. The formula cells crunch those numbers, but they rely on the visible information on the spreadsheet to do their work.
  • Labels are present. Virtually every number on the spreadsheet has a label next to it that explains what it is. This dramatically improves the readability of the spreadsheet.
  • Color is being used. As expected, we are using blue, yellow and orange. Blue cells are things that are assumed to be true about the model - the pizza place charges $15 per pizza, for example, or we get 8 slices per pizza (we are ordering mediums). Yellow affects those things that are under our control - like how much we tip, or how many people are coming. And orange is used for the grand total.
  • We're following the "rule of thumb". The longest formula on the sheet (on cell C7) is no wider than my thumb, which means it's much easier for someone who's never heard of the CEILING function to understand what that cell is doing. (Incidentally, CEILING rounds up for us; on its own, the math says we'd need 11.25 pizzas. Since we can't order a quarter-pizza, we instead round up to the next whole pie.)
  • We flow from top to bottom. We start with our assumptions up top, and then flow down through the spreadsheet until we hit our conclusion.

So that's it - some simple suggestions for saner spreadsheets. This might seem like a lot of work at first - it did to me, for sure - but, as a loyal and regular practitioner of these techniques for the past few years, I can only attest to my own happiness and productivity after having adopted them. Your mileage may vary, but the next time you open a spreadsheet and go, "What was I thinking?" you might do well to adopt a few of these habits.

If you're interested, my spreadsheet can be downloaded here.

Posted by Gavin Shearer at December 10, 2008 10:11 AM. Posted to Apple | MSFT.

« Houston Comets, RIP | Main | California Adventure: The Interactive Map »