12:51 am General

Sven, this is because in some locales the alphabet is ordered “AaBbCc…Zz”, while in others it follows the ASCII ordering “ABC…Z…abc…z”. Which one you get depends on the LC_COLLATE environment variable (usually set indirectly through LANG). I remember having a discussion about this a while back on the ILUG mailing list, but I don’t remember when or with whom.
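To see the two orderings side by side, here is a small shell sketch that sorts the same four letters twice. It sets LC_ALL rather than LC_COLLATE because LC_ALL overrides both LC_COLLATE and LANG, which keeps the demo independent of your own settings. The C locale always exists and always gives the ASCII order; the en_US locale name is an assumption about your system, and whether it interleaves the cases depends on the installed locale data.

```shell
#!/bin/sh
# LC_ALL overrides both LANG and LC_COLLATE. The C locale always exists
# and collates by byte value, so every capital sorts before every small letter.
c_order=$(printf '%s\n' b A a B | LC_ALL=C sort | paste -s -d ' ' -)
echo "C: $c_order"

# For comparison, an English UTF-8 locale if one is installed
# (assumption: it is named en_US.UTF-8 or en_US.utf8 on this system).
en=$(locale -a 2>/dev/null | grep -Ei '^en_US\.utf-?8$' | head -n 1)
if [ -n "$en" ]; then
  echo "$en: $(printf '%s\n' b A a B | LC_ALL="$en" sort | paste -s -d ' ' -)"
fi
```

The first line should always read “C: A B a b”; the second, if it appears at all, is whatever your locale data says.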

For this reason, using A-Z for caps is usually ill-advised. You are better off using the character classes [:lower:] and [:upper:] (written inside a bracket expression, as in ‘[[:upper:]]’) if they are supported; they’re a POSIX feature, and most old seds won’t have them. The alternative is to spell out ‘[ABCDEFGHIJKLMNOPQRSTUVWXYZ]’, which isn’t that long for a regex…
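A quick sketch of both safe options, using grep and sed on a handful of ASCII characters. Neither pattern involves a range, so the counts should be the same whatever your locale is. The sample helper is just for illustration, and the sed line assumes a sed that understands POSIX classes (GNU sed does).

```shell
#!/bin/sh
# Hypothetical helper: sample input, one ASCII character per line.
sample() { printf '%s\n' a B z Z; }

# [[:upper:]] and [[:lower:]] use the locale's character classification,
# not its collation order, so no range is involved.
upper=$(sample | grep -c '[[:upper:]]')
lower=$(sample | grep -c '[[:lower:]]')

# Spelled-out list: exactly these 26 characters, no range at all.
explicit=$(sample | grep -c '[ABCDEFGHIJKLMNOPQRSTUVWXYZ]')

# The same class in sed: replace every capital with an underscore.
masked=$(sample | sed 's/[[:upper:]]/_/g' | paste -s -d ' ' -)

echo "upper=$upper lower=$lower explicit=$explicit masked=$masked"
```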

Mayo got thrashed. And I didn’t even get to see the match: the only Irish pub in Lyon with an Irish satellite subscription (illegal) has been having problems with it, and the guy at the other end of the mobile number they have for “service” didn’t fix it in time. Double bummer.

Update: Interesting experiment – I just set LANG to fr_FR and put a bunch of Latin-1 characters (like â, ä, ñ) into a file, and they are all matched by the [:lower:] character class (try it: put one character per line, then run grep ‘[[:lower:]]’ testfile). With a similar file, try grepping for ‘[A-z]’, ‘[a-z]’ and ‘[A-Z]’. It seems that grep here collates properly for a-z, but it doesn’t pick up anything for A-z. Funny, that.
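Here is a rough script version of that experiment, assuming a modern UTF-8 system rather than the Latin-1 fr_FR setup used above. The sample and count helpers are hypothetical stand-ins for “testfile”. Only the C-locale results are predictable; the French branch runs only if a fr_FR UTF-8 locale is installed, and what it prints will depend on your grep and glibc versions, so it may not reproduce the behaviour described here.

```shell
#!/bin/sh
# Hypothetical helper: prints the test "file", one character per line.
# Assumes this script is saved as UTF-8 (the post used Latin-1 text).
sample() { printf '%s\n' a Z â ä é ç; }

# Hypothetical helper: count lines matching pattern $2 under locale $1.
# grep -c prints 0 and exits 1 on no match; "|| true" keeps going.
count() { sample | LC_ALL="$1" grep -c "$2" || true; }

# C locale: classes and ranges work on raw bytes, so the accented
# letters (two bytes each in UTF-8) match none of these patterns.
c_lower=$(count C '[[:lower:]]')
c_az=$(count C '[a-z]')
c_AZ=$(count C '[A-Z]')
c_Az=$(count C '[A-z]')
echo "C: [[:lower:]]=$c_lower [a-z]=$c_az [A-Z]=$c_AZ [A-z]=$c_Az"

# Repeat the experiment in French if a UTF-8 French locale is installed
# (its exact name varies between systems: fr_FR.UTF-8, fr_FR.utf8, ...).
fr=$(locale -a 2>/dev/null | grep -Ei '^fr_FR\.utf-?8$' | head -n 1)
if [ -n "$fr" ]; then
  for p in '[[:lower:]]' '[a-z]' '[A-Z]' '[A-z]'; do
    echo "$fr: $p=$(count "$fr" "$p")"
  done
fi
```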

Update 2: Not funny at all, actually – it seems that fr_FR collates lower-case letters before upper-case letters, with the result that ‘[a-Z]’ picks up all the letters.
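The C locale shows the flip side of this: there ‘a’ (0x61) sorts after ‘Z’ (0x5A), so ‘[a-Z]’ is a backwards range. This sketch checks that grep refuses it under LC_ALL=C (GNU grep reports “Invalid range end”). The fr_FR locale name is an assumption, and whether that branch accepts the range depends on the installed grep and locale data.

```shell
#!/bin/sh
# In the C locale 'a' (0x61) sorts after 'Z' (0x5A), so [a-Z] runs
# backwards and grep rejects the pattern with a non-zero exit status.
status=0
printf '%s\n' a Z | LC_ALL=C grep -c '[a-Z]' 2>/dev/null || status=$?
echo "C locale: grep exit status $status"

# In a French locale (if installed; the name is an assumption) the same
# range may be accepted, depending on the locale data. Errors are shown.
fr=$(locale -a 2>/dev/null | grep -Ei '^fr_FR\.utf-?8$' | head -n 1)
if [ -n "$fr" ]; then
  echo "$fr: $(printf '%s\n' a Z | LC_ALL="$fr" grep -c '[a-Z]' 2>&1)"
fi
```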

Update 3 (the last one, I promise): I should have read PlanetGNOME more closely – looks like Mr. Love got there first.
