Python Unicode Weirdness


While discussing unicode on IRC with owen, we ran into a peculiarity in Python’s unicode handling. It can be tested with the following code:

>>> s = u'\U00010001\U00010002'
>>> len(s)
>>> s[0]

Python can be compiled to use either 16-bit or 32-bit code units for its unicode strings (16-bit being the default). When compiled in 32-bit mode, the results of the last two statements are 2 and u'\U00010001' respectively. When compiled in 16-bit mode, the results are 4 and u'\ud800', because each character outside the 16-bit range is stored as a surrogate pair, and each half of the pair is counted and indexed separately.

So rather than just being an implementation detail, the unicode string width chosen at compile time can alter the result of Python programs that manipulate characters outside of the Basic Multilingual Plane. It would be nice if Python programs didn’t have to care about this sort of detail …
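For code that does have to care, here is a minimal sketch of one way to get the same answer on either kind of build. The helper names (count_code_points, utf16_code_units) are made up for this example, and it assumes that sys.maxunicode is 0xFFFF on a 16-bit build and 0x10FFFF otherwise.

```python
import sys


def utf16_code_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def count_code_points(text):
    """Count characters, treating a surrogate pair as one character.

    On a 32-bit build there are no surrogate pairs in s below, so this
    is the same as len(text). On a 16-bit build it skips over the low
    half of each pair.
    """
    count = 0
    i = 0
    while i < len(text):
        unit = ord(text[i])
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(text) and 0xDC00 <= ord(text[i + 1]) <= 0xDFFF:
            i += 2  # high surrogate followed by low surrogate
        else:
            i += 1
        count += 1
    return count


s = u'\U00010001\U00010002'

# Assumption: a 16-bit ("narrow") build reports 0xFFFF here.
narrow_build = sys.maxunicode == 0xFFFF

# len(s) depends on the build; the two helpers should not.
print(narrow_build, len(s), utf16_code_units(s), count_code_points(s))
```

The idea is that count_code_points(s) is 2 and utf16_code_units(s) is 4 no matter how the interpreter was compiled, while len(s) is whichever of the two the build happens to use.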

Oxford

I’ve been in Oxford for the past week at the Canonical conference. There are lots of great people here, working on a lot of cool projects. Jordi’s blog has a lot more info about it.

7 August 2004

Bushisms

I found this one quite good.

Compulsory Voting

tberman: I agree that voters should have the right not to vote for anyone, but don’t feel that simply not turning up to vote is a good way to do so. In a non-compulsory election, the non-voter count is going to be made up of those who are abstaining from voting, and those who are simply too lazy to turn up. With compulsory voting, those who don’t wish to vote for any particular candidate can simply leave their ballot blank, which is known as an informal vote.

Given the difference in turnout between U.S. and Australian elections, I’d guess that a fair number of the people who don’t vote in the U.S. would vote if they had to turn up to a polling place on the day.

I also think it is important for as many people as possible to vote. The people who get elected are supposed to represent the electorate. When there is a clear majority it doesn’t matter much, but in a marginal seat, those missing votes could easily swing the result. In this case, people can claim that the winner does not have the support of the majority of the electorate.

As far as the U.S. gravitating towards a two party system, I’d suggest that this is caused more by the vote counting procedure than the culture. When you have a system where voting for someone who won’t get a high first preference count is equivalent to not voting, people are going to gravitate towards the parties where their votes actually make a difference (or not vote at all).

With a preferential system, you can vote for a minor party as your first preference, then number off the major parties with your other preferences. This also fixes the problem where two similar candidates might split the vote, causing both to lose: one will get knocked out, and their votes will be transferred to the next preference (which would likely be the other similar candidate). This generally leads to the least unpopular candidate winning, rather than the most popular one.
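To make the vote splitting point concrete, here is a rough sketch that counts the same ballots two ways. The candidates (A1 and A2 are the two similar ones, B is their opponent) and the vote numbers are invented for illustration, and the count assumes every ballot ranks every candidate. It is a simplified instant-runoff count, not the exact procedure used in any real election.

```python
from collections import Counter


def first_past_the_post(ballots):
    """Winner when only first preferences are counted."""
    tally = Counter(ballot[0] for ballot in ballots)
    return tally.most_common(1)[0][0]


def instant_runoff(ballots):
    """Winner when the weakest candidate is repeatedly knocked out and
    their ballots move to the next surviving preference."""
    remaining = {candidate for ballot in ballots for candidate in ballot}
    while True:
        tally = Counter()
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()) or len(remaining) == 1:
            return leader
        # Knock out the candidate with the fewest votes
        # (ties broken by name, purely to keep the sketch deterministic).
        loser = min(remaining, key=lambda c: (tally[c], c))
        remaining.discard(loser)


# Hypothetical electorate: 60 voters prefer the A candidates, 40 prefer B.
ballots = (
    [("A1", "A2", "B")] * 35
    + [("A2", "A1", "B")] * 25
    + [("B", "A1", "A2")] * 40
)

print(first_past_the_post(ballots))  # B wins with 40 of 100 first preferences
print(instant_runoff(ballots))       # A1 wins 60 to 40 once A2 is knocked out
```

With first preferences only, the 60 voters who prefer either A candidate lose to the 40 who prefer B. With preferences, A2 is eliminated first and those 25 ballots flow to A1.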

3 August 2004

Fahrenheit 9/11

Went to see Fahrenheit 9/11 on Monday night. It was an interesting movie, but it was clearly aimed at a US audience. It did have a fair bit of information I hadn’t heard before, but in some areas Michael Moore was obviously choosing which bits of information to include to increase the effect (e.g. when listing the countries in the “coalition of the willing” he didn’t list Britain). Other bits seemed particularly relevant given what has happened so far this week, like the part about the Bush administration playing with the terror alert, apparently for political reasons.

Overall, I thought it was a good movie.

Firefox

Firefox is quite a nice browser, but there seems to be too much padding around the toolbar buttons. It looks like this is due to double padding around the back and forward buttons.

It looks a bit better after creating a chrome/userChrome.css file in the profile directory containing the following:

@namespace url("http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul");
.toolbarbutton-1, .toolbarbutton-menubutton-button {
    padding: 5px !important;
}
.toolbarbutton-menubutton-button {
    margin: -1px 0px -1px -1px !important;
}
.toolbarbutton-1[type="menu-button"] {
    padding: 0px !important;
}

You might need to adjust the negative margins to match the xthickness/ythickness of your GTK theme in order to make it look okay.

The other cool thing is that some people are working on adding GTK stock icon support to the Mozilla code base. While the initial focus of this is to add stock icons to buttons in the dialogs, it sounds like it could be extended to toolbar buttons and other places in the future. This would make it fit in on the Gnome desktop a lot better.

Subversion

Have been looking at the Subversion 1.1 release candidate, and it looks pretty good. This could be the point where more people start to seriously look at using Subversion as a CVS replacement.

This would be largely due to the new fsfs repository backend. This new backend doesn’t use Berkeley DB, and shouldn’t ever wedge like the BDB backend does occasionally. Furthermore, you don’t need write access to the repository to perform read-only operations. This should make it a lot easier to set up systems where you have multiple ways of accessing the repository (e.g. svnserve/ssh for write access, DAV and viewcvs for read access).

The fsfs backend stores each revision of the repository as two files in the repository (one for changes to the files/properties, and one to store revision properties), and doesn’t modify the files associated with previous revisions when performing a commit. This means that the existing backup and mirror infrastructure that projects have set up for CVS repositories should work equally well for Subversion.
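As a rough illustration of why this layout suits simple file-based mirroring, here is a sketch of a copy that only transfers files the mirror hasn't seen yet. The function name, the "revs" and "mirror" directory names, and the numbered files are stand-ins made up for the demo. It only shows the idea and is not a tested way to back up a real repository.

```python
import os
import shutil
import tempfile


def mirror_new_files(src_dir, dst_dir):
    """Copy files that exist in src_dir but not yet in dst_dir.

    Because old revision files are never rewritten, anything already
    in the mirror can be skipped. Returns the names that were copied.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if os.path.isfile(src) and not os.path.exists(dst):
            shutil.copy2(src, dst)
            copied.append(name)
    return copied


def write_file(directory, name, data):
    with open(os.path.join(directory, name), "w") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical stand-ins for a directory of per-revision files.
    revs = os.path.join(root, "revs")
    mirror = os.path.join(root, "mirror")
    os.makedirs(revs)

    write_file(revs, "1", "first revision")
    write_file(revs, "2", "second revision")
    first_pass = mirror_new_files(revs, mirror)

    # A later "commit" adds one new file and touches nothing else.
    write_file(revs, "3", "third revision")
    second_pass = mirror_new_files(revs, mirror)

print(first_pass, second_pass)
```

The second pass only has to move the one new file, which is the property that tools like rsync can take advantage of.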

The “new file for each revision” policy also has some nice features. In the case of svn+ssh access where each committer can directly access the repository files, it means that the existing revisions in the repository can be made read-only without preventing people from committing new revisions (something that can’t really be done with CVS).
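Here is a small sketch of that idea, assuming a Unix-like system and a plain directory of per-revision files. The helper name and the file names are invented for the demo: existing files lose their write bits, while the directory itself stays writable so new files can still be added.

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_files_read_only(directory):
    """Clear the write bits on every regular file in directory."""
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            mode = stat.S_IMODE(os.stat(path).st_mode)
            os.chmod(path, mode & ~WRITE_BITS)


with tempfile.TemporaryDirectory() as revs:
    # Hypothetical existing revision file.
    old_rev = os.path.join(revs, "1")
    with open(old_rev, "w") as f:
        f.write("first revision")

    make_files_read_only(revs)
    old_mode = stat.S_IMODE(os.stat(old_rev).st_mode)

    # A new "revision" can still be created alongside the locked one.
    new_rev = os.path.join(revs, "2")
    with open(new_rev, "w") as f:
        f.write("second revision")
    new_rev_created = os.path.isfile(new_rev)

print(oct(old_mode), new_rev_created)
```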

These administrative improvements should make it a lot easier to deploy Subversion, which in turn should let more developers take advantage of its features.