13.07.2007 Switch On Strings In C And C++

See: http://blogs.testbit.eu/timj/2007/07/14/13072007-switch-on-strings-in-c-and-c/ (page moved)

For Rapicorn i need a simple and nicely readable way to special case code on various strings, which isn’t a problem in most scripting languages but is not provided by C/C++. Switching on strings turns out to be a widely investigated topic: SOS (agay), TheScript, JavaBug (CNFW), CodeGuru, CLC-FAQ and google has lots more.

Now here goes my approach to switching on strings for C/C++, caveats upfront:
– The strings may only consist of identifier chars: [A-Za-z_0-9];
– Convenient application is based on a GNU Preprocessor feature: -imacros;
– Macro implementations use a GCC-ism: Statement Expressions;
– The convenient command line variant uses a bash-ism: Process Substitution;
– One compile-time script is needed, currently implemented in python (though it could be written in any language).

With this out of the way, let’s dive into the code. This is the syntax:

	switch (SOSID_LOOKUP (sample_string))
          case SOSID (hello): printf ("Hello ");   break;
          case SOSID (world): printf ("World! ");  break;
          case 0: default:    printf ("unknown "); break;


This is how the code needs to be compiled: gcc -Wall -O2 -imacros <(sosid.py <input.c) input.c

The actual implementation is fairly simple, sosid.py extracts all identifiers from SOSID() statements and generates a handful of macro definitions: SOSID(), SOSID_LOOKUP(), SOSID_LOOKUP_ICASE(), ... Plus input specific macros: SOSID__hello, SOSID__world, ... GCC will pick up those definitions via the -imacros option. That is enough to implement SOSID_LOOKUP() so that it can yield the integer IDs that the SOSID(*)/SOSID__* statements expand to for any string that corresponds to one of the SOSID(identifiers) occurring in the source file.

call_switch ("hello"); call_switch ("foo0815"); call_switch ("world");
Hello unkown World!

The lookups are implemented via a binary search and the strings are sorted in a way so that binary searches for case sensitive and case insensitive lookups are both possible. That means for large string lists, the lookups which are of complexity O(ld N) actually have a performance advantage over lengthy if..else if..else constructs with O(N) complexity.
The code is available as a single script at the moment: sosid.py.

Note that the bash-ism, imacros-use and GCC-isms are not mandatory, ordinary temporary files can be generated by the script and be included instead and it could be adapted to generate portable inline functions. The identifier-chars restriction is a hard one though. It comes from the fact that SOSID(ident) must yield valid C/C++ integers, and that can only be accomplished by token pasting. Depending on GCC is good enough for Rapicorn, so people will have to express active interest in this, in order for me to make the script more portable.