Tuesday, February 15, 2011

Command Line Processing

When running programs from the command line, they are able to take a list of string arguments (the String[] args in Java's main method, for example) to control how they work.  The only way to tell these arguments apart, however, is to look at their position in the list.  This works for some programs, but breaks down quickly for complex tasks.  Easy examples can be found in any standard command line programs.  How would you use a command like grep to search through files and then display some context around the matches.  Putting "context" as an argument wouldn't work, how would it know if "context" is a special parameter or a filename?

So the notion of options were invented.  The idea being, a special flag (traditionally - on Unix, / on Windows) is placed at the begining of an argument, and the remainder of the argument has special meaning.  -C in grep is used specially to indicate context.  Importantly, whatever argument follows -C is also not considered an argument, but the parameter to the context option (in this case, a number of lines to display is expected).

This fairly simple syntax is all that the vast majority of command line programs need to function.  It's all Mercurial needs, and it's presumably all Abundant will need also.  But there is neither any one standard way to actually structure this interface (some allow + or other symbols as flags, some allow options that optionally take arguments, some use -FLAG=ARG rather than -FLAG ARG, etc.) nor any straight forward way to implement any of these designs.  Python alone has three different libraries that all do the same thing, and they don't even provide the full range of features, so some projects still wouldn't be able to use them.

And so far we haven't even addressed how these arguments and options actually hook into executing code.  The available libraries usually work fairly well for "light" programs which do exactly one task (admittedly, this is the model Unix is built on, so this isn't completely unreasonable) such as cp, grep, or ping, however they break down very quickly with larger programs like Mercurial or Abundant, where depending on which operation you wish to do different options should or should not be allowed.  This seemingly reasonable functionality is all but completely missing from most argument processing libraries, and as such, needs to be built manually.

Mercurial built it's processor on top of optget, I am using the more powerful optparse for Abundant.  Besides the underlying library, the code design is fairly similar.  A function is defined for each atomic operation that can be run in Abundant.  Then a table (dictionary, hash map, whatever you want to call it) is constructed consisting of the name of the task, like init, new or edit, which maps to a data structure containing the correlating function, a list of options the command will take, and a string describing the expected structure of the command.  The program parses the input arguments, looks up the first parameter against the table, and if it finds a match, passes the remaining arguments to a parser library given the option structure described in the table, the result of parsing the input into options and arguments is then finally passed to the function to do its work.

It's a tedious and error prone process.  Error catching occurs at three levels, before the parser, in the parser, and in the function.  Many of these error checks need to be redundant between levels, and the end result is a very unpleasant piece of spaghetti code.  Shockingly, however, it's better than most of the alternatives out there that I have seen.  It scales easily (adding new commands and editing old ones has little to no effect on other commands), allows for unique options per task, and the end result will be quite nice, as Mercurial's UI demonstrates.  Getting there is a fairly painful process sadly.

No comments:

Post a Comment