<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"	>
<channel>
	<title>Comments on: Should UI strings in source code have non-ASCII characters?</title>
	<atom:link href="http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/</link>
	<description>Yet another GNOME Blogs weblog</description>
	<lastBuildDate>Mon, 18 May 2009 19:00:47 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Alexander Jones</title>
		<link>http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/comment-page-1/#comment-82</link>
		<dc:creator>Alexander Jones</dc:creator>
		<pubDate>Mon, 19 May 2008 23:43:09 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/#comment-82</guid>
		<description>@simos:

UTF-8 is designed so that subsequences are unambiguous. You won&#039;t get a byte less than 0x80 in any part of a multi-byte sequence. Bytes 0x00-0x7F map directly to 7-bit ASCII.

Some people are worried about string functions breaking. I really don&#039;t see how this is the case, seeing as we&#039;re doing g_some_function (_(&quot;Some ASCII string&quot;)) which is replaced with a UTF-8 string at runtime anyway.

Does anyone have any actual proof of UTF-8 in our translatable strings breaking C?</description>
		<content:encoded><![CDATA[<p>@simos:</p>
<p>UTF-8 is designed so that subsequences are unambiguous. You won&#8217;t get a byte less than 0x80 in any part of a multi-byte sequence. Bytes 0x00-0x7F map directly to 7-bit ASCII.</p>
<p>Some people are worried about string functions breaking. I really don&#8217;t see how this is the case, seeing as we&#8217;re doing g_some_function (_("Some ASCII string")) which is replaced with a UTF-8 string at runtime anyway.</p>
<p>Does anyone have any actual proof of UTF-8 in our translatable strings breaking C?</p>
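<p>A minimal C sketch of the property described above, assuming well-formed UTF-8 input: it walks a byte buffer, uses the lead byte (110xxxxx, 1110xxxx or 11110xxx) to know how many continuation bytes follow, and counts any byte in the 0x00-0x7F range that turns up inside a multi-byte sequence. The function name <code>ascii_bytes_inside_sequences</code> is hypothetical, chosen for illustration.</p>

```c
#include <stddef.h>

/* Counts bytes in s[0..len) that have the high bit clear (0x00-0x7F)
 * but sit inside a multi-byte UTF-8 sequence. UTF-8 lead bytes are
 * 11xxxxxx and continuation bytes are 10xxxxxx, so for well-formed
 * input the result is always 0. */
size_t ascii_bytes_inside_sequences(const unsigned char *s, size_t len)
{
    size_t count = 0;
    size_t remaining = 0;   /* continuation bytes still expected */

    for (size_t i = 0; i < len; i++) {
        unsigned char b = s[i];
        if (remaining > 0) {
            if (b < 0x80)
                count++;    /* an ASCII byte where a continuation belongs */
            remaining--;
        } else if (b >= 0xF0) {
            remaining = 3;  /* 11110xxx: 4-byte sequence */
        } else if (b >= 0xE0) {
            remaining = 2;  /* 1110xxxx: 3-byte sequence */
        } else if (b >= 0xC0) {
            remaining = 1;  /* 110xxxxx: 2-byte sequence */
        }
        /* bytes below 0xC0 outside a sequence are plain ASCII here */
    }
    return count;
}
```

<p>For valid UTF-8 such as "\xCE\x99" the count is 0, which is why a byte-oriented function like <code>strchr</code> searching for an ASCII character cannot match in the middle of a multi-byte character. A malformed pair such as "\xC3\x41" is counted once.</p>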
]]></content:encoded>
	</item>
	<item>
		<title>By: Yevgen Muntyan</title>
		<link>http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/comment-page-1/#comment-81</link>
		<dc:creator>Yevgen Muntyan</dc:creator>
		<pubDate>Sat, 17 May 2008 00:20:38 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/#comment-81</guid>
		<description>UTF-8 strings like &quot;\xCE\x80&quot; *are* portable. They are not &quot;portable to different encodings&quot;, but nobody needs that (whatever that means). We need UTF-8 in C strings, and that&#039;s the way to have them. If you want nice in po files, make xgettext convert C escape sequences to nice UTF-8 symbols.

As to universal character names, it is implementation-defined what actually will be contained in the character array. I.e. if you have char *s = &quot;\u...&quot; then you have no idea how to display text pointed to by that variable in a gtk label. Also, once you have that line of code, it won&#039;t by magic change &quot;to different encodings&quot;, it will be whatever byte sequences the compiler will put in there and that&#039;s it. It&#039;s pretty much the same as &quot;abc&quot; - it won&#039;t by magic be valid UTF-16, no matter how you compile the file.

And by the way, MS does not implement C99 (surprised?).</description>
		<content:encoded><![CDATA[<p>UTF-8 strings like &#8220;\xCE\x80&#8243; *are* portable. They are not &#8220;portable to different encodings&#8221;, but nobody needs that (whatever that means). We need UTF-8 in C strings, and that&#8217;s the way to have them. If you want nice in po files, make xgettext convert C escape sequences to nice UTF-8 symbols.</p>
<p>As to universal character names, it is implementation-defined what actually will be contained in the character array. I.e. if you have char *s = &#8220;\u&#8230;&#8221; then you have no idea how to display text pointed to by that variable in a gtk label. Also, once you have that line of code, it won&#8217;t by magic change &#8220;to different encodings&#8221;, it will be whatever byte sequences the compiler will put in there and that&#8217;s it. It&#8217;s pretty much the same as &#8220;abc&#8221; &#8211; it won&#8217;t by magic be valid UTF-16, no matter how you compile the file.</p>
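<p>As a sketch of where hand-encoded escapes come from, the following C function encodes a single Unicode code point as UTF-8; printing each resulting byte as a <code>\x</code> escape gives a string literal that needs no non-ASCII bytes in the source file. The name <code>utf8_encode</code> is hypothetical, and this version rejects UTF-16 surrogates and values above U+10FFFF.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes code point cp as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp is not a valid
 * Unicode scalar value (a surrogate or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                         /* surrogates are not encodable */
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

<p>For example, <code>utf8_encode(0x0399, buf)</code> writes the two bytes 0xCE 0x99, so the UCN "\u0399" corresponds to the hand-encoded literal "\xCE\x99".</p>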
<p>And by the way, MS does not implement C99 (surprised?).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: simos</title>
		<link>http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/comment-page-1/#comment-80</link>
		<dc:creator>simos</dc:creator>
		<pubDate>Sat, 17 May 2008 00:02:33 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/#comment-80</guid>
		<description>@nona: For the narrow scope of GNOME, it appears that all POT/PO files follow the UTF-8 encoding. Indeed, if some translation teams were to use another encoding such as SHIFT-JIS, it would make a bit of a mess.

SHIFT-JIS is almost backward-compatible with ASCII (two characters differ).

@behdad: C99 defines a super-portable way to encode non-ASCII strings (using UCNs, as described in the added section in the post above). This is what gcc says about UCNs:

$ cat t.c
int main(void)
{
	char* str = &quot;\u0399&quot;;

	return 0;
}
$ gcc t.c -o t
t.c:3:14: warning: universal character names are only valid in C++ and C99
$ _

This means that UCNs work in gcc, but they produce a warning by default.

Using hand-encoded UTF-8 strings (such as &quot;\xCE\x80&quot;) makes the code less portable to different encodings.</description>
		<content:encoded><![CDATA[<p>@<a href="http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/#comment-79">nona</a>: For the narrow scope of GNOME, it appears that all POT/PO files follow the UTF-8 encoding. Indeed, if some translation teams were to use another encoding such as SHIFT-JIS, it would make a bit of a mess.</p>
<p>SHIFT-JIS is almost backward-compatible with ASCII (two characters differ).</p>
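<p>One caveat that matters for C source in particular: in Shift_JIS double-byte characters the second byte can fall in 0x40-0x7E, which includes 0x5C, the ASCII backslash. For example, 表 is encoded as 0x95 0x5C and ソ as 0x83 0x5C, so a compiler that is not aware of Shift_JIS sees an escape sequence inside such a string literal. A minimal C sketch that counts these trail bytes, assuming the common CP932 lead-byte ranges (0x81-0x9F and 0xE0-0xFC); the function name <code>sjis_ascii_trail_bytes</code> is hypothetical.</p>

```c
#include <stddef.h>

/* Counts Shift_JIS double-byte characters in s[0..len) whose trail
 * byte falls in the 7-bit ASCII range (below 0x80). The lead-byte
 * ranges below follow the CP932 layout; this is an assumption, other
 * Shift_JIS variants differ slightly. */
size_t sjis_ascii_trail_bytes(const unsigned char *s, size_t len)
{
    size_t count = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char b = s[i];
        int is_lead = (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
        if (is_lead && i + 1 < len) {
            if (s[i + 1] < 0x80)
                count++;   /* trail byte looks like an ASCII character */
            i++;           /* skip over the trail byte */
        }
    }
    return count;
}
```

<p>With "\x95\x5C" the result is 1, while the UTF-8 encoding of the same character (0xE8 0xA1 0xA8) contains no byte below 0x80 at all.</p>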
<p>@behdad: C99 defines a super-portable way to encode non-ASCII strings (using UCNs, as described in the added section in the post above). This is what gcc says about UCNs:</p>
<pre>$ cat t.c
int main(void)
{
	char* str = "\u0399";

	return 0;
}
$ gcc t.c -o t
t.c:3:14: warning: universal character names are only valid in C++ and C99
$ _</pre>
<p>This means that UCNs work in gcc, but they produce a warning by default.</p>
<p>Using hand-encoded UTF-8 strings (such as "\xCE\x80") makes the code less portable to different encodings.</p>
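<p>A small C check of how a compiler actually stores a UCN, assuming gcc or clang with their default UTF-8 execution character set. The C standard leaves this implementation-defined, so another compiler or a different <code>-fexec-charset</code> setting may produce other bytes. The helper <code>ucn_matches_utf8</code> is hypothetical; compiling with <code>-std=c99</code> or later avoids the warning shown in the gcc output.</p>

```c
#include <string.h>

/* GREEK CAPITAL LETTER IOTA as a C99 universal character name.
 * The stored bytes are implementation-defined; with a UTF-8
 * execution character set they are 0xCE 0x99. */
static const char iota_ucn[] = "\u0399";

/* The same character written as hand-encoded UTF-8 hex escapes. */
static const char iota_hex[] = "\xCE\x99";

/* Returns 1 if the compiler stored the UCN as exactly the same
 * bytes as the hand-encoded UTF-8 literal, 0 otherwise. */
int ucn_matches_utf8(void)
{
    return sizeof iota_ucn == sizeof iota_hex
        && memcmp(iota_ucn, iota_hex, sizeof iota_hex) == 0;
}
```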
]]></content:encoded>
	</item>
	<item>
		<title>By: nona</title>
		<link>http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/comment-page-1/#comment-79</link>
		<dc:creator>nona</dc:creator>
		<pubDate>Fri, 16 May 2008 22:10:04 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/#comment-79</guid>
		<description>What about non-unicode multibyte character sets that might still be popular in some countries? What happens when there&#039;s UTF-8 and, let&#039;s say, SHIFT-JIS in the same PO file?</description>
		<content:encoded><![CDATA[<p>What about non-unicode multibyte character sets that might still be popular in some countries? What happens when there&#8217;s UTF-8 and, let&#8217;s say, SHIFT-JIS in the same PO file?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: behdad</title>
		<link>http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/comment-page-1/#comment-78</link>
		<dc:creator>behdad</dc:creator>
		<pubDate>Fri, 16 May 2008 03:07:47 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/#comment-78</guid>
		<description>I think the reason some think C source code should be 7-bit is that your *compiler* can screw up if run under a non-UTF-8 locale.  And that may actually be required by the C standard.  Not motivated enough to test it.</description>
		<content:encoded><![CDATA[<p>I think the reason some think C source code should be 7-bit is that your *compiler* can screw up if run under a non-UTF-8 locale.  And that may actually be required by the C standard.  Not motivated enough to test it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: simos</title>
		<link>http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/comment-page-1/#comment-77</link>
		<dc:creator>simos</dc:creator>
		<pubDate>Thu, 15 May 2008 11:16:54 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/#comment-77</guid>
		<description>@Alexander: Some compilers complain when the source code has non-ASCII characters. 

That is, bytes with the 8th bit set. Both iso-8859-x and utf-8 can have bytes that the value is &gt;127.

Or, bytes with value &lt;32 (control characters). That could be the case with UTF-8 when a character has codepoint value &gt;127.</description>
		<content:encoded><![CDATA[<p>@Alexander: Some compilers complain when the source code has non-ASCII characters. </p>
<p>That is, bytes with the 8th bit set. Both iso-8859-x and utf-8 can have bytes that the value is >127.</p>
<p>Or, bytes with value &lt;32 (control characters). That could be the case with UTF-8 when a character has codepoint value >127.</p>
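<p>The kind of check such a strict compiler, or a pre-commit script, might perform can be sketched in C as follows, using the thresholds given in this comment (bytes &gt;127 and control bytes &lt;32, with the usual whitespace controls allowed). The function <code>first_non_7bit_byte</code> is hypothetical; it returns the offset of the first offending byte, or the buffer length if the buffer is clean.</p>

```c
#include <stddef.h>

/* Returns the offset of the first byte in buf[0..len) that a strict
 * "7-bit clean" check would reject: any byte >= 0x80, or a control
 * byte < 0x20 other than tab, newline, carriage return, vertical tab
 * and form feed. Returns len if every byte passes. */
size_t first_non_7bit_byte(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char b = buf[i];
        if (b >= 0x80)
            return i;   /* 8th bit set: ISO-8859-x or UTF-8 text */
        if (b < 0x20 && b != '\t' && b != '\n' && b != '\r'
            && b != '\v' && b != '\f')
            return i;   /* unexpected control character */
    }
    return len;
}
```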
]]></content:encoded>
	</item>
	<item>
		<title>By: Alexander Jones</title>
		<link>http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/comment-page-1/#comment-76</link>
		<dc:creator>Alexander Jones</dc:creator>
		<pubDate>Thu, 15 May 2008 10:42:14 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/#comment-76</guid>
		<description>UTF-8 translates to ISO-8859 fine, insofar that it remains valid, even if it is garbage. I don&#039;t see why a compiler would screw up on parsing UTF-8 characters, as they just appear like a series of ISO-8859-x characters.

Maybe I&#039;m missing something?</description>
		<content:encoded><![CDATA[<p>UTF-8 translates to ISO-8859 fine, insofar that it remains valid, even if it is garbage. I don&#8217;t see why a compiler would screw up on parsing UTF-8 characters, as they just appear like a series of ISO-8859-x characters.</p>
<p>Maybe I&#8217;m missing something?</p>
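<p>The "valid but garbage" behaviour can be shown with a short C sketch that reads each byte as an ISO-8859-1 character (byte value equals code point, U+0000-U+00FF) and re-encodes it as UTF-8. Every byte maps to some character, so nothing fails, but UTF-8 "é" (0xC3 0xA9) comes out as "Ã©". The function <code>latin1_to_utf8</code> is hypothetical.</p>

```c
#include <stddef.h>

/* Interprets in[0..len) as ISO-8859-1 and writes the equivalent UTF-8
 * to out, which must have room for 2 * len bytes. Every byte value is
 * a valid ISO-8859-1 code point, so this never fails; feeding it UTF-8
 * simply produces mojibake. Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[o++] = b;                                   /* ASCII unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (b >> 6));    /* U+0080-U+00FF */
            out[o++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    return o;
}
```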
]]></content:encoded>
	</item>
	<item>
		<title>By: Yevgen Muntyan</title>
		<link>http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/comment-page-1/#comment-75</link>
		<dc:creator>Yevgen Muntyan</dc:creator>
		<pubDate>Wed, 14 May 2008 19:58:54 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/#comment-75</guid>
		<description>First fix all text editors, so they don&#039;t screw up your unicode (on the way to there remove iso8559-15 markers from all source files). Next, fix the C and C++ standards so all compilers understand UTF-8 source by default. Then use UTF-8 in C code ;)
Note that gcc is not the only C compiler for desktops, MS makes some too. UTF-8 in source code is GNU-C-ism which makes code less portable.</description>
		<content:encoded><![CDATA[<p>First fix all text editors, so they don&#8217;t screw up your unicode (on the way to there remove iso8559-15 markers from all source files). Next, fix the C and C++ standards so all compilers understand UTF-8 source by default. Then use UTF-8 in C code <img src='http://blogs.gnome.org/simos/wp-content/mu-plugins/tango-smilies/tango/face-wink.png' alt=';)' class='wp-smiley' /><br />
Note that gcc is not the only C compiler for desktops, MS makes some too. UTF-8 in source code is GNU-C-ism which makes code less portable.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sebastian Benitez</title>
		<link>http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/comment-page-1/#comment-74</link>
		<dc:creator>Sebastian Benitez</dc:creator>
		<pubDate>Wed, 14 May 2008 15:55:52 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/#comment-74</guid>
		<description>Like Phil says, languages like spanish, german and french use different quotes than english. For spanish it would be «these quotes».</description>
		<content:encoded><![CDATA[<p>Like Phil says, languages like spanish, german and french use different quotes than english. For spanish it would be «these quotes».</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Phil</title>
		<link>http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/comment-page-1/#comment-73</link>
		<dc:creator>Phil</dc:creator>
		<pubDate>Wed, 14 May 2008 12:05:02 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.gnome.org/simos/2008/05/14/should-ui-strings-in-source-code-have-non-ascii-characters/#comment-73</guid>
		<description>It isn&#039;t always a valid assumption that apps are written with american english strings, so why not just go the whole hog and use semi-symbolic strings by default?

printf(_(&quot;file not found: %s\n&quot;));

might not be very friendly, but it&#039;s direct and equally easy to translate, regardless of what quotes your region uses.  There exist english translations of a lot of english software already (en_UK etc) so adding en_US isn&#039;t creating a major new translation job.

Obviously I wouldn&#039;t bother changing old strings, as long as compilers aren&#039;t erroring anyway, but a gradual shift doesn&#039;t seem to be a lot of work.</description>
		<content:encoded><![CDATA[<p>It isn&#8217;t always a valid assumption that apps are written with american english strings, so why not just go the whole hog and use semi-symbolic strings by default?</p>
<p>printf(_(&#8221;file not found: %s\n&#8221;));</p>
<p>might not be very friendly, but it&#8217;s direct and equally easy to translate, regardless of what quotes your region uses.  There exist english translations of a lot of english software already (en_UK etc) so adding en_US isn&#8217;t creating a major new translation job.</p>
<p>Obviously I wouldn&#8217;t bother changing old strings, as long as compilers aren&#8217;t erroring anyway, but a gradual shift doesn&#8217;t seem to be a lot of work.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
