|IBM home | Products & services | Support & downloads | My account|
|The art of writing Linux utilities|
Linux is famous for coming with a large toolbox and good ways to integrate tools. Peter Seebach discusses how new tools are developed and how to make a one-off program into a utility you'll be using for years to come.
Linux and other UNIX-like systems have always come with a broad variety of tools that perform functions ranging from the obvious to the arcane. The success of UNIX-like programming environments comes largely from the quality and selection of tools, and the ease with which they can be joined together.
As a developer, you may have found that existing utilities don't always solve your problem. While you can solve many problems easily by stringing together existing utilities, solving other problems requires at least some amount of real programming. These latter tasks are often candidates for creating a new utility that, when combined with existing utilities, will solve the problem with a minimum of effort. This article looks at the qualities that make for a good utility and the design process that goes into it.
What makes a good utility?
Utilities are supposed to let you build one-off applications cheaply and easily from the materials at hand. A lot of people think of them as being like tools in a toolbox. The goal is not to have a single widget that does everything, but to have a handful of tools, each of which does one thing as well as possible.
Some utilities are reasonably useful on their own, whereas others imply cooperation in pipelines of utilities. Examples of the former include
Designing a utility
Here are some design goals of utilities; each gets its own section, below.
Do one thing well
Imagine how frustrating it would be if most programs sorted data, but some supported only lexographic sorts, while others supported only numeric sorts, and a few even supported selection of keys rather than sorting by whole lines. It would be annoying at best.
When you find a problem to solve, try to break the problem up into parts, and don't duplicate the parts for which utilities already exist. The more you can focus on a tool that lets you work with existing tools, the better the chances that your utility will stay useful.
You may need to write more than one program. The best way to solve a specialized task is often to write one or two utilities and a bit of glue to tie them together, rather than writing a single program to solve the whole thing. It's fine to use a 20-line shell script to tie your new utility together with existing tools. If you try to solve the whole problem at once, the first change that comes along might require you to rethink everything.
I have occasionally needed to produce two-column or three-column output from a database. It is generally more efficient to write a program to build the output in a single column and then glue it to a program that puts things in columns. The shell script that combines these two utilities is itself a throwaway; the separate utilities have outlived it.
Some utilities serve very specialized needs. If the output of
The script in Listing 1 does exactly one thing. It takes no options, because it needs no options; it only cares about the length of lines. Thanks to Perl's convenient
Be a filter
Remember that a utility needs to work on the command line and in scripts. Sometimes, the ideal behavior varies a little. For instance, most versions of
Utilities like to live in pipelines. A pipeline lets a utility focus on doing its job, and nothing else. To live in a pipeline, a utility needs to read data from standard input and write data to standard output. If you want to deal with records, it's best if you can make each line be a "record." Existing programs such as
One utility I occasionally use is a program that calls other programs iteratively over a tree of files. This makes very good use of the standard UNIX utility filter model, but it only works with utilities that read input and write output; you can't use it with utilities that operate in place, or take input and output file names.
Most programs that can run from standard input can also reasonably be run on a single file, or possibly on a group of files. Note that this arguably violates the rule against duplicating effort; obviously, this could be managed by feeding
Some programs may legitimately read records in one format but produce something entirely different. An example would be a utility to put material into columnar form. Such a utility might equate lines to records on input, but produce multiple records per line on output.
Not every utility fits entirely into this model. For instance,
Generalizing functionality sometimes leads to the discovery that what seemed like a single utility is really two utilities used in concert. That's fine. Two well-defined utilities can be easier to write than one ugly or complicated one.
Doing one thing well doesn't mean doing exactly one thing. It means handling a consistent but useful problem space. Lots of people use
This rule, and the rule to do one thing, are both corollaries of an underlying principle: avoid duplication of code whenever possible. If you write a half-dozen programs, each of which sorts lines, you can end up having to fix similar bugs half a dozen times instead of having one better-maintained
This is the part of writing a utility that adds the most work to the process of getting it completed. You may not have time to generalize something fully at first, but it pays off when you get to keep using the utility.
Sometimes, it's very useful to add related functionality to a program, even when it's not quite the same task. For instance, a program to pretty-print raw binary data might be more useful if, when run on a terminal device, it threw the terminal into raw mode. This makes it a lot easier to test questions involving keymaps, new keyboards, and the like. Not sure why you're getting tildes when you hit the delete key? This is an easy way to find out what's really getting sent. It's not exactly the same task, but it's similar enough to be a likely addition.
Try to make sure you've figured out what data your utility can possibly run on. Don't just ignore the possibility of data you can't handle. Check for it and diagnose it. The more specific your error messages, the more helpful you are being to your users. Try to give the user enough information to know what happened and how to fix it. When processing data files, try to identify exactly what the malformed data was. When trying to parse a number, don't just give up; tell the user what you got, and if possible, what line of the input stream the data was on.
As a good example, consider the difference between two implementations of
Security holes are often rooted in a program that isn't robust in the face of unexpected data. Keep in mind that a good utility might find its way into a shell script run as root. A buffer overflow in a program such as
The better a program deals with unexpected data, the more likely it is to adapt well to varied circumstances. Often, trying to make a program more robust leads to a better understanding of its role, and better generalizations of it.
If you're about to start writing a utility, take a bit of time to browse around a few systems to see if there might be one already. Don't be afraid to steal Linux utilities for use on BSD, or BSD utilities for use on Linux; one of the joys of utility code is that almost all utilities are quite portable.
Don't forget to look at the possibility of combining existing applications to make a utility. It is possible, in theory, that you'll find stringing existing programs together is not fast enough, but it's very rare that writing a new utility is faster than waiting for a slightly slow pipeline.
An example utility
This program does one thing only. It prints out errno lines from /usr/include/sys/errno.h in a slightly pretty-printed format. For instance:
Listing 2. Errno finder
Does it generalize? Yes, nicely. It supports both numeric and symbolic names. On the other hand, it doesn't know about other files, such as /usr/include/sys/signal.h, that are likely in the same format. It could easily be extended to do that, but for a convenience utility like this, it's easier to just make a copy called "signal" that reads signal.h, and uses "SIG*" as the pattern to match a name.
This is just a tad more convenient than using
Another example might be a program to unsort input (see Resources for a link to this utility). This is simple enough; read in input files, store them in some way, then generate a random order in which to print out the lines. This is a utility of nearly infinite applications. It's also a lot easier to write than a sorting program; for instance, you don't need to specify which keys you're not sorting on, or whether you want things in a random order alphabetically, lexicographically, or numerically. The tricky part comes in reading in potentially very long lines. In fact, the provided version cheats; it assumes there will be no null bytes in the lines it reads. It's a lot harder to get that right, and I was lazy when I wrote it.
Don't design the utility the first time you need it. Wait until you have some experience. Feel free to write a prototype or two; a good utility is sufficiently better than a bad utility to justify a bit of time and effort on researching it. Don't feel bad if what you thought would be a great utility ends up gathering dust after you wrote it. If you find yourself frustrated by your new program's shortcomings, you just had another prototyping phase. If it turns out to be useless, well, that happens sometimes.
The thing you're looking for is a program that finds general application outside your initial usage patterns. I wrote
One good utility can pay back the time you spent on all the near misses. The next thing to do is make it available for others, so they can experiment. Make your failed attempts available, too; other people may have a use for a utility you didn't need. More importantly, your failed utility may be someone else's prototype, and lead to a wonderful utility program for everyone.