Why I cannot use turnitin.com

We were supposed to use a proprietary web service to hand in a paper:

You should also upload the essay to turnitin.com using the password key:

5vu0h5fw and id: 2998602

Late entries will suffer a penalty.

I cannot use this service. The simplest reason is that I cannot agree to their ToS.

Let me illustrate by picking a few points from their ToS:

By clicking the “I agree — create profile” button below You: (1) represent that You have read and understand

As I am a native speaker of neither English nor law-speak, I cannot agree that I fully understand those ToS.

With the exception of the limited license granted below, nothing contained herein shall be construed as granting You any right, […]

Whatever that means, it sounds scary to me.

You further represent that You are not barred from receiving the Services or using the Site under the laws of the United States or other applicable jurisdiction.

I am sorry but I do not know whether this holds for me.

You may not modify, copy, distribute, transmit, display, perform, reproduce, publish, license, create derivative works from, transfer, or sell any information, Licensed Programs or Services from the Site without the prior written consent of iParadigms,

Lucky me that I have not agreed to their ToS yet, so I can still copy them and quote them here…

You further agree not to cause or permit the disassembly, decompilation, recompilation, or reverse engineering of any Licensed Program or technology underlying the Site. In jurisdictions where a right to reverse engineer is provided by law unless information is available about products in order to achieve interoperability, functional compatibility, or similar objectives, You agree to submit a detailed written proposal to iParadigms concerning any information You need for such purposes before engaging in reverse engineering.

I seriously do not want to write a proposal to this company for every new website I build just because it uses a <form> or some AJAX.

You are entirely responsible for maintaining the confidentiality of Your password

I cannot do that because I do not even know how they store my password (we are talking about an ASP program after all…).

You agree to use reasonable efforts to retain the confidentiality of class identification numbers and passwords. In no circumstance shall You transmit or make Your password or class identification number or any other passwords for the Site or class identification numbers available in any public forum, including, but not limited to any web page, blog, advertisement or other posting on the Internet, any public bulletin board, and any file that is accessible in a peer-to-peer network.

Yeah, sure. Nobody will find it on the page itself anyway.

This User Agreement is governed by the laws of the State of California, U.S.A. You hereby consent to the exclusive jurisdiction and venue of state and federal courts in Alameda County, California, U.S.A., in all disputes arising out of or relating to the use of the Site or the Services.

It might sound weird, but I do not want to be taken to court in the USA.

You agree not to use the Site in any jurisdiction that does not give effect to all provisions of these terms and conditions, including without limitation this paragraph.

Of course, I do not know enough about this jurisdiction to agree to those ToS.

Needless to say, I do not want my data to fall under the US PATRIOT Act either.

Besides the above-mentioned legal issues, I also have ethical concerns about contributing to the profits of a dodgy company by providing them with my written essay so that they can use it to check other works against mine. If I believed in copyright, I could probably claim infringement as well.

Other topics, such as the violation of the presumption of innocence, are covered by resources on the web, and there are plenty of them. The most interesting ones include this and this.

Admittedly, I could care less than I do, but being an academic also means thinking critically.

I sent more or less this text as an email to the lecturer, and it turned out that using this dodgy service is not compulsory! *yay*

The future, however, is not safe yet, so more action is needed…

Wikify Pages

In one of our modules, “System Software”, we were asked to write a bash script which wikifies a page. That means identifying all proper nouns and replacing each of them with a link to the corresponding Wikipedia article.
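For illustration, here is the transformation the script is meant to perform (a hypothetical run; the script below is assumed to be saved as wikify.sh):

$ cat in.html
<html><head><title>Demo</title></head>
<body>
<p>all hail the King!</p>
</body>
</html>
$ ./wikify.sh in.html
<html><head><title>Demo</title></head>
<body>
<p>all hail the <a href="http://en.wikipedia.org/wiki/King">King</a>!</p>
</body>
</html>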

I managed to write that up in two hours or so and I think I have a not-so-ugly solution (*cough* it's still bash… *cough*). It has (major) drawbacks though. In valid X(HT)ML, a <body> hidden inside a CDATA section, i.e.

<![CDATA[ <body>

appearing before the actual body will be recognized as the beginning. But parsing XML properly with plain bash is not that easy.

Also, my script initially did not extract the payload correctly: it tailed all the way down to the end of the file instead of stopping when </body> was reached. The culprit was a second tail -n where a head -n was needed; the version below has that fixed.
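The root cause is easy to demonstrate: tail -n N prints the last N lines of its input, but after seeking to the start of the body with tail -n +N, the first N lines of the remaining stream are needed, i.e. head -n N:

$ seq 10 | tail -n +4 | head -n 3   # first 3 lines after the offset (what I wanted)
4
5
6
$ seq 10 | tail -n +4 | tail -n 3   # last 3 lines (runs down to the end of the input)
8
9
10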

Anyway, here’s the script:

#!/bin/bash
### A script which tries to Wikipediafy a given HTML page
### That means that every proper noun is linked against the Wikipedia
### but only if it's not already linked against something.
### Assumptions are that the HTML file has a "<body>" Tag on a separate
### line and that "<a>" Tags don't span multiple lines.
### Also, capitalised words in XML Tags are treated as payload words, just
### because parsing XML properly is a matter of 100k of shellscript (consider
### parsing <![CDATA[!). This assumption is not too far off, because
### capitalised words in XML Tags seldom happen anyway.
### As this is written in Bash, it is horribly slow. You'd rather want to do
### this in a language that actually understands {X,HT}ML, like Python.
 
# You might want to change this
BASEURL="http://en.wikipedia.org/wiki/"
 
set -e
 
### Better not change anything below, it might kill kittens
# To break only on newlines, set the IFS to that
OLD_IFS=$IFS
IFS='
'
HTML=$(cat "$1") # Read the file once for performance reasons
# Find the beginning and end of Document and try to be permissive with HTML
# Errors by only stopping after hitting one <body> Tag
START_BODY=$(grep --max-count=1 -ni '<body>'<<<"$HTML" | cut -d: -f1)
END_BODY=$(grep --max-count=1 -ni '</body>'<<<"$HTML" | cut -d: -f1)
 
HEAD=$(head -n $((START_BODY - 1)) <<<"$HTML") # Everything before <body> for later use
TAIL=$(tail -n +${END_BODY} <<<"$HTML") # </body> and everything after it
# $(()) is POSIX arithmetic expansion, so this should be portable enough
RANGE_BODY=$(($END_BODY-$START_BODY))
 
# And then extract the body: the lines from <body> up to (but excluding) </body>
PAYLOAD=$(tail -n +${START_BODY} <<<"$HTML" | head -n ${RANGE_BODY})
 
### This is the main part
### Basically search for all words beginning with a capital letter
### and match that. We can use that later with \1.
 
### Try to find already linked words, replace them by their MD5 hash,
### Run generic Word finding mechanism and replace back later
 
# We simply assume that a link doesn't span multiple lines
LINKMATCHES=$(grep -i -E --only-matching '<a .*>.*</a>' "$1" || true)
 
LINKMATCH_ARRAY=()
MD5_ARRAY=()
CLEANEDPAYLOAD=$PAYLOAD
if [[ -n $LINKMATCHES ]]; then
    # We have found already linked words, put them into an array
    LINKMATCH_ARRAY=( $LINKMATCHES )
    index=0 # iterate over array
    for MATCH in $LINKMATCHES; do
        # Uniquely hash the found link and replace it, saving its original
        MATCHMD5=$(md5sum <<<"$MATCH" | awk '{print $1}')
        MD5_ARRAY[$index]=$MATCHMD5
        # We simply assume that there's no "," in the match
        # Use Bash internals string replacement facilities
        CLEANEDPAYLOAD=${CLEANEDPAYLOAD//${MATCH}/${MATCHMD5}}
        let "index = $index + 1"
    done
fi
 
 
# Find the matches
WORDMATCHES=$(grep --only-matching '[A-Z][a-z][a-z]*' <<<"$CLEANEDPAYLOAD" | sort | uniq)
WORDMATCHES_ARRAY=( $WORDMATCHES )
index=0
WIKIFIED=$CLEANEDPAYLOAD
while [[ "$index" -lt ${#WORDMATCHES_ARRAY[@]} ]]; do
    # Yeah, iterating over an array with 300+ entries is fun *sigh*
    # You could now ask Wikipedia and only continue if the page exist
    # if wget -q "${BASEURL}${SEARCH}"; then ...; else ...; fi
    SEARCH="${WORDMATCHES_ARRAY[$index]}"
    REPLACE="<a href=\"${BASEURL}${SEARCH}\">\2</a>"
    # Note that we replace only the first occurrence on each line
    #WIKIFIED=${WIKIFIED/$SEARCH/$REPLACE} ## That's horribly slow, so use sed
    # Watch out for a problem: "<p>King" shall match as well as "King</p>"
    # or "King." but not eBook.
    # We thus match the needle plus the previous/following char,
    # iff it's not [A-Za-z]
    WIKIFIED=$(sed -e "s,\([^A-Za-z]\)\($SEARCH\)\([^A-Za-z]\),\1$REPLACE\3," <<<"$WIKIFIED")
    let "index += 1"
done
 
# Replace hashed links with their original, same as above, only reverse.
# One could apply this technique to other tags besides <a>, but you really
# want to write that in a proper language :P
index=0
NOLINKWIKIPEDIAFIED=$WIKIFIED
while [[ "$index" -lt ${#MD5_ARRAY[@]} ]]; do
    SEARCH=${MD5_ARRAY[$index]}
    REPLACE=${LINKMATCH_ARRAY[$index]}
    NOLINKWIKIPEDIAFIED=${NOLINKWIKIPEDIAFIED//$SEARCH/$REPLACE}
    let "index += 1"
done
 
### Since we have the head, the payload and the tail separate, echo them all.
### The quotes matter: IFS is still set to newline, so unquoted expansions
### would be split and re-joined with spaces, destroying the line structure.
echo "$HEAD"
echo "$NOLINKWIKIPEDIAFIED"
echo "$TAIL"
 
# Reset the IFS, e.g. for following scripts
IFS=$OLD_IFS
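As the comment in the main loop hints, one could also ask Wikipedia whether an article actually exists before linking to it. A minimal sketch, assuming wget's --spider mode (which only probes the URL without downloading anything) and that Wikipedia answers requests for missing articles with a 404:

# Returns success iff the Wikipedia article named $1 seems to exist.
# wget exits non-zero on a 404, which makes this usable in an if.
page_exists() {
    wget -q --spider "${BASEURL}$1"
}

# Hypothetical use inside the main loop, right before the sed call:
# if ! page_exists "$SEARCH"; then let "index += 1"; continue; fi

Of course, that adds one network round trip per candidate word, making the already slow script even slower.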

Bugsquad Talk @ FOSS.in

FOSS.in has finally finished and I really enjoyed being invited. It was a real pleasure having all these talented and energetic hackers around me. It’s definitely on my top-conferences list. You could feel a real hacking spirit and it’s really sad that it’s already over.

The closing ceremony featured TRDP, a really, really good Indian band playing fancy music. I was told that they are pretty famous in India and that FOSS.in was lucky to have them there. Since we were all nerds, a Twitter wall accompanied the band, showing recent tweets concerning the event…

Closing and Twitter Party

Besides the entertainment, the program itself was pretty good as well. I disliked the keynotes to some extent though. I felt that they were mostly not really relevant to FOSS because the content was obsolete (e.g. one guy basically showing how to write shell scripts) or otherwise out of scope (e.g. a free robot operating system).

I have to thank the organizers of FOSS.in for running that conference and inviting me. Also, I need to thank the GNOME Foundation for subsidizing my trip.

The Bugsquad Talk went pretty well, I'd say. Around 5 people were interested in joining the Bugsquad and I hope that they'll stay around 🙂 Unfortunately, the GNOME project day took place on the last day, making it unattractive to start something new because you can't ask anyone in person during the following days.

Sponsored by GNOME!

Also, compared to other organisations such as KDE or Fedora, GNOME was highly under-represented. KDE had sweaters to give away. Admittedly, they were not very well designed, but hey, sweaters after all! They also had very fancy leaflets briefly describing what KDE is, why they rule and how to contribute. Very well done.

(Broken) Fedora stickers

Srini brought GNOME T-Shirts, which was fine but somewhat boring. Seriously, I have gazillions of T-Shirts and I think other people do, too, as nearly every project or company gives away T-Shirts. So doing something new would be a smart move. I hope the GNOME marketing team will come up with something fresh and shiny (hoodies? shoes? underwear? “GNOME” keys for the keyboard instead of Windows keys?).

Srini giving away GNOMEy T-Shirts

FOSS.in – Impressions

The second day of FOSS.in, India's largest Free Software conference, taking place in Bangalore, has just finished, and the conference has been very awesome so far. The people are smart, the food rocks and you can feel the hacking spirit everywhere. While the venue itself has a high technical standard, the wifi network is damn slow: 6 kB/s on average, so I'm barely able to transfer any data.
foss.in Logo

Since Maemo Bangalore is giving away some N900s if you hack, port or package something awesome, I want to download the SDK. But with these bandwidth constraints, it's not really possible :-/

Dimitris' keynote on the first day was on how to build a revolutionary free software project. I enjoyed his talk although I did not really get the point. It felt like instructions for a generic FLOSS project and not a revolutionary one in particular.

Harald Welte's talk on opening closed hacker domains such as DECT or GSM was very exciting, and I really look forward to having some time to play around with that. He really enlightened the crowd and showed us why it is important to get FLOSS into those areas, which are highly dominated by the proprietary world.
harald@foss.in
The conference is mostly about getting stuff done as opposed to listening to fancy talks. It's not that the talks are not important, but actually doing stuff is just as important. Apparently, Indian conferences usually tend to be rather passive. Anyway, it has been really great so far. If you happen to be around, feel free to join us 🙂

My GNOME Bugsquad presentation on Saturday is well prepared, but I'm still waiting for feedback from the community.

This work by Muelli is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.