<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: urlparse considered harmful</title>
	<link>http://blogs.gnome.org/jamesh/2007/12/10/urlparse-considered-harmful/</link>
	<description>Random stuff</description>
	<pubDate>Sat, 05 Jul 2008 11:57:36 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.3</generator>
		<item>
		<title>By: James Henstridge</title>
		<link>http://blogs.gnome.org/jamesh/2007/12/10/urlparse-considered-harmful/#comment-578</link>
		<dc:creator>James Henstridge</dc:creator>
		<pubDate>Mon, 17 Dec 2007 06:38:21 +0000</pubDate>
		<guid>http://blogs.gnome.org/jamesh/2007/12/10/urlparse-considered-harmful/#comment-578</guid>
		<description>Gavin: yeah, I saw that.  I was going to post a followup on that bug report but could not recover the password on my account :(  An account was created for me as part of the migration from SourceForge, but it seems that the password reset emails are getting eaten somewhere.

As for the reasons for the cache, perhaps it'd be worth checking the history of the module to see when it was implemented.</description>
		<content:encoded><![CDATA[<p>Gavin: yeah, I saw that.  I was going to post a followup on that bug report but could not recover the password on my account <img src='http://blogs.gnome.org/jamesh/wp-content/mu-plugins/tango-smilies/face-sad.png' alt=':(' class='wp-smiley' width='16' height='16' />  An account was created for me as part of the migration from SourceForge, but it seems that the password reset emails are getting eaten somewhere.</p>
<p>As for the reasons for the cache, perhaps it&#8217;d be worth checking the history of the module to see when it was implemented.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gavin Panella</title>
		<link>http://blogs.gnome.org/jamesh/2007/12/10/urlparse-considered-harmful/#comment-576</link>
		<dc:creator>Gavin Panella</dc:creator>
		<pubDate>Sat, 15 Dec 2007 09:43:36 +0000</pubDate>
		<guid>http://blogs.gnome.org/jamesh/2007/12/10/urlparse-considered-harmful/#comment-576</guid>
		<description>James, do you know of an instance where the cache is actually worth it? Timings for me were ~3.5usec to urlparse the url for this page with the cache, and 9usec with the caching code removed. It's still quick. I can't think of any situation that would need those extra few usecs, but people who do need to shave them can do caching themselves (and thus be more efficient with a cache that's closer to their code and needs).

Side-effects like this can be bad news, especially when they're undocumented. I would argue that this cache probably shouldn't be in the standard library.

Someone else (in a thread far far away that you've probably already read) said that this is fixed in Python now: http://bugs.python.org/issue1313119. The caching code is still there, but now it's keyed with a 5-tuple, having added the types of url and scheme to the 3-tuple from before.</description>
		<content:encoded><![CDATA[<p>James, do you know of an instance where the cache is actually worth it? Timings for me were ~3.5usec to urlparse the url for this page with the cache, and 9usec with the caching code removed. It&#8217;s still quick. I can&#8217;t think of any situation that would need those extra few usecs, but people who do need to shave them can do caching themselves (and thus be more efficient with a cache that&#8217;s closer to their code and needs).</p>
<p>Side-effects like this can be bad news, especially when they&#8217;re undocumented. I would argue that this cache probably shouldn&#8217;t be in the standard library.</p>
<p>Someone else (in a thread far far away that you&#8217;ve probably already read) said that this is fixed in Python now: <a href="http://bugs.python.org/issue1313119" rel="nofollow">http://bugs.python.org/issue1313119</a>. The caching code is still there, but now it&#8217;s keyed with a 5-tuple, having added the types of url and scheme to the 3-tuple from before.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Stoffe</title>
		<link>http://blogs.gnome.org/jamesh/2007/12/10/urlparse-considered-harmful/#comment-572</link>
		<dc:creator>Stoffe</dc:creator>
		<pubDate>Tue, 11 Dec 2007 13:43:24 +0000</pubDate>
		<guid>http://blogs.gnome.org/jamesh/2007/12/10/urlparse-considered-harmful/#comment-572</guid>
		<description>No Armin, that's looking at it the wrong way (or making an unnecessary apology for the library). It shouldn't be a problem to use unicode to just parse a URL and get it back in unicode strings, as that may be what you want in HTML/XML and, well, just about any normal text. Encoding changes with purpose and URLs are of course always encoded in the same encoding as the surrounding document...

The bug is that there is a difference and the library does not take that into account, returning the wrong cache result. Think hash collision.

If it was a PEBKAC error (it's not) it would still be a pretty poor library that accepted faulty input and silently produced unexpected, incorrect results. Now it is not a poor library, just one with an unfortunate bug that should be fixed.</description>
		<content:encoded><![CDATA[<p>No Armin, that&#8217;s looking at it the wrong way (or making an unnecessary apology for the library). It shouldn&#8217;t be a problem to use unicode to just parse a URL and get it back in unicode strings, as that may be what you want in HTML/XML and, well, just about any normal text. Encoding changes with purpose and URLs are of course always encoded in the same encoding as the surrounding document&#8230;</p>
<p>The bug is that there is a difference and the library does not take that into account, returning the wrong cache result. Think hash collision.</p>
<p>If it was a PEBKAC error (it&#8217;s not) it would still be a pretty poor library that accepted faulty input and silently produced unexpected, incorrect results. Now it is not a poor library, just one with an unfortunate bug that should be fixed.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Armin Ronacher</title>
		<link>http://blogs.gnome.org/jamesh/2007/12/10/urlparse-considered-harmful/#comment-565</link>
		<dc:creator>Armin Ronacher</dc:creator>
		<pubDate>Mon, 10 Dec 2007 14:13:04 +0000</pubDate>
		<guid>http://blogs.gnome.org/jamesh/2007/12/10/urlparse-considered-harmful/#comment-565</guid>
		<description>Hoi.  As URLs are encodingless you cannot use unicode objects on those.  So *always* encode into a charset before using it.  If you are passing unicode objects to it, you're doing something wrong.

(Except for unicode strings containing only ASCII data, which are coerced to bytestrings automatically)

The solution is called IRI btw, and it is not that widely supported so far.

Regards,
Armin</description>
		<content:encoded><![CDATA[<p>Hoi.  As URLs are encodingless you cannot use unicode objects on those.  So *always* encode into a charset before using it.  If you are passing unicode objects to it, you&#8217;re doing something wrong.</p>
<p>(Except for unicode strings containing only ASCII data, which are coerced to bytestrings automatically)</p>
<p>The solution is called IRI btw, and it is not that widely supported so far.</p>
<p>Regards,<br />
Armin</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: James Henstridge</title>
		<link>http://blogs.gnome.org/jamesh/2007/12/10/urlparse-considered-harmful/#comment-564</link>
		<dc:creator>James Henstridge</dc:creator>
		<pubDate>Mon, 10 Dec 2007 11:48:00 +0000</pubDate>
		<guid>http://blogs.gnome.org/jamesh/2007/12/10/urlparse-considered-harmful/#comment-564</guid>
		<description>ignacio: Not everything is defined in terms of unicode.  The example I gave above is the HTTP protocol, which is defined in terms of octets (bytes) and uses URLs.

If I am processing a redirect in an HTTP client library, I probably want to stay at the bytes level when constructing the new destination URL.

When processing the data from the HTTP response, I probably will convert it to unicode while parsing HTML or XML.  I'll probably need to do some URL processing there as well.

So I have URL processing at two levels of code.  If they happen to process the same URL, I may end up with my HTTP requests getting automatically promoted to unicode some of the time.</description>
		<content:encoded><![CDATA[<p>ignacio: Not everything is defined in terms of unicode.  The example I gave above is the HTTP protocol, which is defined in terms of octets (bytes) and uses URLs.</p>
<p>If I am processing a redirect in an HTTP client library, I probably want to stay at the bytes level when constructing the new destination URL.</p>
<p>When processing the data from the HTTP response, I probably will convert it to unicode while parsing HTML or XML.  I&#8217;ll probably need to do some URL processing there as well.</p>
<p>So I have URL processing at two levels of code.  If they happen to process the same URL, I may end up with my HTTP requests getting automatically promoted to unicode some of the time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ignacio</title>
		<link>http://blogs.gnome.org/jamesh/2007/12/10/urlparse-considered-harmful/#comment-563</link>
		<dc:creator>ignacio</dc:creator>
		<pubDate>Mon, 10 Dec 2007 10:50:25 +0000</pubDate>
		<guid>http://blogs.gnome.org/jamesh/2007/12/10/urlparse-considered-harmful/#comment-563</guid>
		<description>Or you could consistently use unicode internally.</description>
		<content:encoded><![CDATA[<p>Or you could consistently use unicode internally.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: fraggle</title>
		<link>http://blogs.gnome.org/jamesh/2007/12/10/urlparse-considered-harmful/#comment-562</link>
		<dc:creator>fraggle</dc:creator>
		<pubDate>Mon, 10 Dec 2007 09:29:11 +0000</pubDate>
		<guid>http://blogs.gnome.org/jamesh/2007/12/10/urlparse-considered-harmful/#comment-562</guid>
		<description>Looks like a classic case of "optimisation is the root of all evil".</description>
		<content:encoded><![CDATA[<p>Looks like a classic case of &#8220;optimisation is the root of all evil&#8221;.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
