13.07.2007 Switch On Strings In C And C++

For Rapicorn i need a simple and nicely readable way to special case code on various strings, which isn’t a problem in most scripting languages but is not provided by C/C++. Switching on strings turns out to be a widely investigated topic: SOS (agay), TheScript, JavaBug (CNFW), CodeGuru, CLC-FAQ and google has lots more.

Now here goes my approach to switching on strings for C/C++, caveats upfront:
- The strings may only consist of identifier chars: [A-Za-z_0-9];
- Convenient application is based on a GNU Preprocessor feature: -imacros;
- Macro implementations use a GCC-ism: Statement Expressions;
- The convenient command line variant uses a bash-ism: Process Substitution;
- One compile-time script is needed, currently implemented in python (though it could be written in any language).

With this out of the way, let’s dive into the code. This is the syntax:

	switch (SOSID_LOOKUP (sample_string))
          {
          case SOSID (hello): printf ("Hello ");   break;
          case SOSID (world): printf ("World! ");  break;
          case 0: default:    printf ("unknown "); break;
          }

This is how the code needs to be compiled: gcc -Wall -O2 -imacros <(sosid.py <input.c) input.c

The actual implementation is fairly simple, sosid.py extracts all identifiers from SOSID() statements and generates a handful of macro definitions: SOSID(), SOSID_LOOKUP(), SOSID_LOOKUP_ICASE(), ... Plus input specific macros: SOSID__hello, SOSID__world, ... GCC will pick up those definitions via the -imacros option. That is enough to implement SOSID_LOOKUP() so that it can yield the integer IDs that the SOSID(*)/SOSID__* statements expand to for any string that corresponds to one of the SOSID(identifiers) occurring in the source file.

Example:
call_switch ("hello"); call_switch ("foo0815"); call_switch ("world");
Yields:
Hello unkown World!

The lookups are implemented via a binary search and the strings are sorted in a way so that binary searches for case sensitive and case insensitive lookups are both possible. That means for large string lists, the lookups which are of complexity O(ld N) actually have a performance advantage over lengthy if..else if..else constructs with O(N) complexity.
The code is available as a single script at the moment: sosid.py.

Note that the bash-ism, imacros-use and GCC-isms are not mandatory, ordinary temporary files can be generated by the script and be included instead and it could be adapted to generate portable inline functions. The identifier-chars restriction is a hard one though. It comes from the fact that SOSID(ident) must yield valid C/C++ integers, and that can only be accomplished by token pasting. Depending on GCC is good enough for Rapicorn, so people will have to express active interest in this, in order for me to make the script more portable.

4 Comments

  1. Posted 14.07.2007 at 12:58 | Permalink

    Sounds like you just reinvented symbols. I guess Greenspun’s Tenth Rule is alive and well.

  2. Florin
    Posted 14.07.2007 at 17:05 | Permalink

    What about gperf (http://www.gnu.org/software/gperf/) ?

  3. Posted 14.07.2007 at 17:23 | Permalink

    > What about gperf (http://www.gnu.org/software/gperf/) ?

    for gperf and why it’s not suitable, see:
    http://shum.huji.ac.il/~agay/sos/theory.html#gperf
    in short, it has unwanted collisions with arbitrary strings and it’d involve a hell of a lot more code then the simple binary lookup SOSID has.

  4. muppet
    Posted 14.07.2007 at 21:46 | Permalink

    I’m very interested here, because the “inefficiency” of lookups based on strings has driven many of my peers to managing number spaces and the resulting dependency and coupling issues that brings.

    Why the statement expression instead of an inline function?

    Why not use a hashing algorithm, instead? Your source generator could scan the strings of interest and make symbols whose values are their hashes. Then, assuming that compute_hash() has exactly the same algorithm as your code generator:

    switch (compute_hash(”foo”)) {
    case STRING_FOO_HASH: … ;
    case STRING_BAR_HASH: … ;
    }

    Now you walk the string exactly once at runtime, to compute the hash, instead of doing logN comparisons in a bsearch.

    A disadvantage is that your syntax of case SOMETHING(string_i_want_to_match) is more obvious, but it would remove the limitation of word-characters (which is a reason i can’t apply your idea to my own problem ;-).

    For that matter, you might just have a nice, intrinsically extensible hash table of string->handler function, but that’s nowhere near as syntactically friendly as a switch statement…

Bad Behavior has blocked 406 access attempts in the last 7 days.