gaming the system

I’m sure I’m not the only one who has noticed that Google isn’t as useful as it used to be. First there were the empty ‘wrapper’ sites that got onto the AdWords box – you know, the ones that seemed to have ‘all about foo’ for every ‘foo’ search, but when you clicked on them just had the output of a search engine in them. AdWords are easy to ignore, but sometimes you do actually want to find companies selling stuff. The wrapper sites were occasionally in the main result area too. They seem to come up a little less often now – or maybe I’m just searching for different things.

Then we have cloaking, i.e. the web site serves different content to a search engine than it does to users. So when you do a search you get a nice summary of what looks like what you want, but you click on it and all you get is a payment gateway. It is particularly prominent when looking for technical articles. See Summary of Academic Publishers Cloaking Discussion for some more information on this. It sucks big time.
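
To make the idea a bit more concrete, here is a rough Python sketch of how you might flag cloaking: compare the text a site hands to a crawler with the text it hands to a browser and see how much they overlap. This is only an illustration, not how any search engine actually does it. The function names, the 0.5 threshold and the sample strings are all made up for the example, and a real check would have to fetch the page twice with different User-Agent headers, which is left out here.

```python
from difflib import SequenceMatcher


def similarity(crawler_text: str, browser_text: str) -> float:
    """Return a 0..1 ratio of how alike the two page texts are, word by word."""
    return SequenceMatcher(None, crawler_text.split(), browser_text.split()).ratio()


def looks_cloaked(crawler_text: str, browser_text: str, threshold: float = 0.5) -> bool:
    """Guess that a page is cloaked if the two views share little text.

    The threshold is an arbitrary assumption for this sketch.
    """
    return similarity(crawler_text, browser_text) < threshold


# Hypothetical sample data: what the crawler saw versus what a visitor saw.
crawler_view = "A method for improving the efficiency of external sorting"
browser_view = "Please log in or purchase access to view this article"

print(looks_cloaked(crawler_view, browser_view))  # the two views share no words
print(looks_cloaked(crawler_view, crawler_view))  # identical views
```

Comparing whole words rather than characters keeps boilerplate like shared punctuation from inflating the score, but it is still a crude measure; pages with lots of common navigation text would need something smarter.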

Just as an example, let’s try something simple, oh I dunno, ‘efficient algorithm for sorting numbers external’ – a typical type of search for a software engineer.

8 links down we have this (I’m deliberately not putting the link in HTML):

  A method for improving the efficiency of external sorting ...
    more efficient external sorting algorithms,based on a variety of
    distribution ... number of nodes), and an identical number of
    branches go from each node, ...
    www.springerlink.com/index/V3L0179J1801278L.pdf -

Ok, this isn’t really that useful looking, but this is just an example, so let’s just take it as being what you’re after. A PDF and everything, let’s go look … oh no, it’s just a payment gateway. US$32 for a paper … Hmm, that seems a little steep. Particularly if you look at the publishing date (go on, have a look, it might surprise you). I wonder how much of that the author gets, if he’s still alive.

Sometimes Google Scholar helps (but not in this particular case). Given the title and author(s) you can often find free or draft versions of papers, but this is still a pain in the arse – why are these sites showing up at all in the main index when they are cloaking their information and intentionally gaming the system? I’m finding that searching for good quality coding and technical information is getting harder and harder, and Google being complicit in this cloaking (see the linked article above, or search for ‘springerlink sucks’) just makes me angry at them (and frankly, who cares about the other search engines – they’re irrelevant).

And finally – take those away and searching for many types of information is just a lot harder than it used to be. I guess ‘the web’ has grown, and it’s mostly grown full of rubbish. I had yet another problem with Ubuntu yesterday – now I find 8.04 has major issues with USB mass storage devices on my laptop. Devices will drop out, causing corruption, or refuse to work at all, which makes them unusable either way. It took a lot of searching to find the right terms to use before I found anything about the problem – and that was a lonely post on a forum. I guess the two of us are just unlucky together. Certain very popular terms like ubuntu, debian, fedora, linux are now so common that they significantly lower the signal-to-noise ratio for any searches containing them. And so many sites cross-link with each other that using linkage to weight results is becoming less useful (not that it was always super-great – I remember how Advogato used to figure on the front page of just about any search for people who had an account on it).

I’m not sure about Google News either. Today there were at least 4 stories on the iPhone on the Australian front page – 3 in tech (i.e. all of them) and 1 in business. In the tech section itself, the top 4 stories were all iPhone, with Roadrunner (the fastest supercomputer in the world) pushed down to 5 or 6 (personally I think that is more tech-worthy; the iPhone belongs on the fashion or business pages if you ask me). Ok, the iPhone is full of buzz, but one grouped story should surely suffice (Google’s news selection is a bit strange sometimes, but normally it is at least a little better at grouping the same press release).

4 thoughts on “gaming the system”

  1. “I wonder how much of that the author gets, if he’s still alive.”

    None. The authors of papers in academic journals never receive any money; frequently the author (or their lab) is paying a per-page charge for the privilege of publishing in the journal. The reviewers likewise work for free, receiving nothing for their work. Journals also frequently use “guest editors” for a themed issue; that’s again a researcher, doing it for free (and for a bullet point in their CV). Academic journals are making money hand over fist.

    Fortunately, there are ways around it if you’re looking for a paper. Search for keywords in the title and the first author’s last name, plus “pdf”, and chances are you’ll find a publicly accessible version. Or email one of the authors asking nicely for a copy; they’re most likely happy to oblige (we’re usually just happy anybody at all is interested in our work). Or ask a friend or acquaintance with academic online library access to download the paper if it’s important enough.

  2. I totally agree with you about Google losing some of its fu and especially about the situation with academic journals – I ran into this problem last night. I don’t think the two are related, however.

    It seems like it’s impossible to get unpaid access unless you join the right societies (ACM, IEEE, etc.) or are lucky enough to be a student who can access them for free via various library portals.

    Happily, I am both a student and in the IEEE so I managed to get copies of the papers, but for everyone else it sucks. I don’t see what Google can do about that, however.

    /Mike

  3. Another thing that bugs me with Google these days is sites that index Usenet and mailing lists. Not that those sites are bad by themselves, it’s that so many are doing it… sometimes I’ll google for something and find that 80% of hits are just the same Usenet or mailing list thread on dozens of different indexing sites.

  4. Hi. Hope you’re doing well. Found your monkey a while back, thought I’d Google you and say hello, and what do you know… it’s at least still that useful. Take care,

    Amanda
