January 2015 – muellis blog

A paper that I have authored has ~~recently~~ been published a while ago, but I’ve put this post off for a long time now. Before talking about the paper itself, I want to talk about Academia as I have the feeling that I need to defend myself for playing their game™. The following may sounds overly pessimistic and a while a few bright spots are going to be mentioned, many have been left out for ranting reasons. Keep that in mind when reading that somewhat unstructured rant…

Published papers are the currency in Academia. The more you have, the more respected you are. The quantity is the main metric. No wonder, given that quality control measures are not very well deployed. Pretty much the only mechanism to ensure quality is peer review. The holy grail.

Although the more papers at “better” conferences or journals you have, the better you are, the quality of the conference or journal and the quality of the paper are rarely questioned after the publication. Again, I don’t have proper proof for the statements I make as this is supposed to be a more general rant on current practises in Academia. I can only tell from experience. From me listening to people talking about fellow academics, from observing key metrics in various web portals, or seeing people applying for academic positions. Those people usually have an enumeration of their publications. Maybe it’s a “selection”. But I’ve never seen that people put a “ranking” of the quality of the publisher nor the publication itself. And it wouldn’t make sense, because we don’t have metrics for that, anyway. Sure, there are some people or companies trying to come up with something meaningful. But metrics such as “rejection rate”, “number of citations”, or “h-index” are inherently flawed. For many reasons. Mainly because the data is proprietary. You have to rely on the conference or the journal providing you with correct data. You cannot know whether it is correct as there is no right for you to know. Secondarily, the metric might suffer from chilling effects, such that people think the quality of their publication in spe is too weak to be able to be published on a “high ranked” conference. So they don’t even bother to submit. Other metrics like the average citation count after five years resembles much more a stochastic experiment rather than reflecting the quality of the publications (Ike Antkare anyone?). Again, you have the effect of people wanting to cite some paper of a “high ranked” conference, because that is what people will cite in the future. And in order to be found more easily in the future via backwards citation searches, you’d rather cite publications you think will be cited more often in the future (cf.).

Talking about quality…

You have to trust the peer review of the conference or journal but you actually cannot because you don’t even know who the peers were. It’s good to have an informed opinion and it’s a good thing to be able to rely on an informed judgement. But it’s not good having to rely on that. If, for whatever reason, a peer fails to provide appropriate reviews, one should be able to make a decision oneself. Some studies have indeed shown that the peer review process is no better than flipping a coin. So there seems to be some need to review the peer review.

Once again to be clear: I don’t mind peer review. I think it’s good. Blindly publishing without ensuring that there is indeed an advancement of world’s knowledge wouldn’t be good. And peer review could be a tool to control that. But it doesn’t do it right now. I don’t have any concrete proposal. But I think if the reviews themselves and the reviewers were known, then we could make better decisions as to whether to “trust” a publication or not.

Another proposal is to not have “journals” as physical hard copies anymore. It is ~~2014~~2015, we have the Web, we have some cool technologies. But we don’t make use of any of that. Instead, we are maintaining the status from 20, or rather 200, years ago. We still subscribe to one-off bundles of printed and stapled paper. And we pay loads for that. And not only do we pay loads for receiving that, if you wanted to publish in one of those journals (or conferences), you have to pay, too. In fairness, it’s not only the printing and stapling that costs money, but the services around that. Things like proof reading (has anyone ever gotten a lectorate?), the peer review (has any peer ever gotten any reimbursement?), or the maintenance of an online database (why is it so damn hard to use any of these web databases?) are things we pay money for. I doubt that we need Journals in their current form. We probably do need entities (call them “publishers”), who in turn will need to earn some money, to make sure everything is going smoothly. But we don’t need print-and-forget style publishing. If we could add things like comments, annotations, links, reviews, supplementary material, a varying level of detail, to a paper, even after a few years or even decades, we could move to a “permanently peer reviewed” model. A publication is being reviewed all the time. Ideally by the general public. We could model our current workflow by delegating some form of trust to a group of people, say “reviewers of Journal X”, and only see what these people have vouched for. We could then selectively exclude people from that group of trustees, much like the web of trust. We could, if a paper makes an assumption which is falsified in the future, render some warning when opening the publication. We could decentralise the data such that everyone could build their own index, search mechanism, or interface.

On interfaces

Right now, if you wanted to, say, re-conduct the experiments done in published papers and share your results, you would have to create a publication (which is expected, but right now you would likely have to pay for that) and cite the papers whose results you are trying to reproduce. That’s okay. But if I then wanted to see when and how successful people tried to redo the experiments, I’d have to rely on the database I’m using to provide a reverse citation search and have the correct data (which, for some databases, seems to be the ability to do OCR on the PDF…). That’s not how things should work nowdays, right? We’d expect something more interactive, with tags, open data, something wikiesque. While the ability to reverse-search citations, to highlight some key references, or to link to a key contribution that followed a paper at hand would be nice indeed, we probably have to step back and make existing functionality somewhat usable. I’m not talking about advanced stuff like exporting search results in a standardised format or about deep linking to a result set from a query. That would need treatment after we’ve solved actually searching for multiple keywords, excluding some conferences or journals, or joining or intersecting queries. All that only works to some extent and it’s depressing that we cannot do anything about it, because we don’t have the relevant access or data. Don’t believe me? Well, you shouldn’t. But I’ll provide a table, probably in another post, showing what works with which database and what does not.

On experiments

As I was referring to reproducing results: It is pretty much impossible to reproduce any result, at least in my field, computer science. You don’t get the raw data, let alone the programs to run to get the results. You could argue that it is too complicated to maintain a program that can be run on any platform. Fair enough. I don’t have a solution. But the situation right now is not a good status quo. Right now you don’t get anything. So even if you had the very same setup as the authors of some publication, you would not be able to redo the experiments. It’s likely to be similar in other disciplines. I imagine that rocket scientists do experiments with self made devices or with some utterly expensive appliance (think LHC). Nobody will be able to reproduce the results, simply because there is just that one LHC out there… But… fortunately we have many digital things which are easy to archive and distribute. We, computer scientists, should make use of that. Why not require authors to submit a virtual appliance in some openly specified format? Obviously, source code would be nice, but even in academia there doesn’t seem to be a culture of sharing code freely, so I’m not even suggesting that.

Phew. After having criticised Academia and having made some half baked proposals I forgot what I actually wanted to do: Being a good academic (not caring about the public perception of “good” in terms of quantity of publications), and discuss a few things around the paper that we paid a couple of hundred dollars for to get published. But I leave that for another ~~rant~~ post.

In what ways do you think is Academia broken?

Month: January 2015

On Academia…

Talking about quality…

On interfaces

On experiments