In one of our modules, “System Software”, we were asked to write a bash script which wikifies a page. That means identifying all nouns and replacing each of them with a link to Wikipedia.
I managed to write that up in two hours or so and I think I have a not-so-ugly solution (*cough* it’s still bash… *cough*). It has (major) drawbacks though. Valid X(HT)ML such as
<![CDATA[ <body>
appearing before the actual body will be recognized as the beginning of the body. But parsing XML with plain bash is not that easy.
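For example, take a page whose head happens to carry the tag inside a CDATA section (a made-up snippet):

<head>
  <script type="text/javascript">/* <![CDATA[ <body> ]]> */</script>
</head>
<body>

The grep for <body> in the script below stops at the first match, so here it would report the script line as the start of the body instead of the real opening tag.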
Also, my script somehow does not parse the payload correctly; that is, it tails all the way down to the end of the file instead of stopping when </body> is reached.
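Looking at the extraction again, the likely culprit is the second tail in the PAYLOAD line: tail -n ${RANGE_BODY} keeps the last RANGE_BODY lines of whatever it is fed, and those run down to the end of the file. An (untested) one-line change that should make it stop just before </body> would be:

PAYLOAD=$(tail -n +${START_BODY} <<<"$HTML" | head -n ${RANGE_BODY})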
Anyway, here’s the script:
#!/bin/bash
### A script which tries to Wikipediafy a given HTML page
### That means that every proper noun is linked against the Wikipedia
### but only if it's not already linked against something.
### Assumptions are, that the HTML file has a "<body>" Tag on a separate
### line and that "<a>" Tags don't span multiple lines.
### Also, Capitalised words in XML Tags are treated as payload words, just
### because parsing XML properly is a matter of 100k Shellscript (consider
### parsing <![CDATA[!). Also, this assumption is not too far off, because
### capitalised words in XML Tags happen seldom anyway.
### As this is written in Bash, it is horribly slow. You'd rather want to do
### this in a language that actually understands {X,HT}ML, like Python
# You might want to change this
BASEURL="http://en.wikipedia.org/wiki/"
set -e
### Better not change anything below, it might kill kittens
# To break only on newlines, set the IFS to that
OLD_IFS=$IFS
IFS='
'
HTML=$(cat "$1") # Read the file into a variable for performance reasons
# Find the beginning and end of Document and try to be permissive with HTML
# Errors by only stopping after hitting one <body> Tag
START_BODY=$(grep --max-count=1 -ni '<body>'<<<"$HTML" | cut -d: -f1)
END_BODY=$(grep --max-count=1 -ni '</body>'<<<"$HTML" | cut -d: -f1)
HEAD=$(head -n $START_BODY<<<"$HTML") # Extract the Head for later use
# $(( )) arithmetic expansion is actually POSIX sh, so this should be portable
RANGE_BODY=$(($END_BODY-$START_BODY))
# And then extract the body
PAYLOAD=$(tail -n +${START_BODY} <<<"$HTML" | tail -n ${RANGE_BODY})
### This is the main part
### Basically search for all words beginning with a capital letter
### and match that. We can use that later with \1.
### Try to find already linked words, replace them by their MD5 hash,
### Run generic Word finding mechanism and replace back later
# We simply assume that a link doesn't span multiple lines
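# As a rough illustration of the masking round trip:
#   <a href="/foo">King</a>   becomes   <some md5 hex digest>
# so that the generic word replacement below cannot touch "King" inside an
# already existing link; the hash is swapped back for the original at the end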
LINKMATCHES=$(grep -i -E --only-matching '<a .*>.*</a>' "$1" || true)
LINKMATCH_ARRAY=()
MD5_ARRAY=()
CLEANEDPAYLOAD=$PAYLOAD
if [[ -n $LINKMATCHES ]]; then
# We have found already linked words, put them into an array
LINKMATCH_ARRAY=( $LINKMATCHES )
index=0 # iterate over array
for MATCH in $LINKMATCHES; do
# Uniquely hash the found link and replace it, saving its origin
MATCHMD5=$(md5sum <<<$MATCH | awk '{print $1}')
MD5_ARRAY[$index]=$MATCHMD5
# We simply assume that there's no "," in the match
# Use Bash internals string replacement facilities
CLEANEDPAYLOAD=${CLEANEDPAYLOAD//${MATCH}/${MATCHMD5}}
let "index = $index + 1"
done
fi
# Find the matches
WORDMATCHES=$(grep --only-matching '[A-Z][a-z][a-z]*'<<<$CLEANEDPAYLOAD | sort | uniq)
WORDMATCHES_ARRAY=( $WORDMATCHES )
index=0
WIKIFIED=$CLEANEDPAYLOAD
while [[ "$index" -lt ${#WORDMATCHES_ARRAY[@]} ]]; do
# Yeah, iterating over an array with 300+ entries is fun *sigh*
# You could now ask Wikipedia and only continue if the page exists
# if wget -q "${BASEURL}${SEARCH}"; then ...; else ...; fi
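# An (untested) sketch of such a check, using --spider so the page is not
# actually downloaded:
#   if ! wget -q --spider "${BASEURL}${WORDMATCHES_ARRAY[$index]}"; then
#       let "index += 1"; continue  # no such article, skip this word
#   fi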
SEARCH="${WORDMATCHES_ARRAY[$index]}"
REPLACE="<a href=\"${BASEURL}${SEARCH}\">\2</a>"
# Note that we replace the first occurrence only
#WIKIFIED=${WIKIFIED/$SEARCH/$REPLACE} ## That's horribly slow, so use sed
# Watch out for a problem: "<p>King" shall match as well as "King</p>"
# or "King." but not eBook.
# We thus match the needle plus the previous/following char,
# iff it's not [A-Za-z]
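# e.g. with SEARCH=King, "<p>King." becomes
#   <p><a href="http://en.wikipedia.org/wiki/King">King</a>.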
WIKIFIED=$(sed -e "s,\([^A-Za-z]\)\($SEARCH\)\([^A-Za-z]\),\1$REPLACE\3,"<<<$WIKIFIED) # so use sed
let "index += 1"
done
# Replace hashed links with their original, same as above, only reverse.
# One could apply this technique to other tags besides <a>, but you really
# want to write that in a proper language :P
index=0
NOLINKWIKIPEDIAFIED=$WIKIFIED
while [[ "$index" -lt ${#MD5_ARRAY[@]} ]]; do
SEARCH=${MD5_ARRAY[$index]}
REPLACE=${LINKMATCH_ARRAY[$index]}
NOLINKWIKIPEDIAFIED=${NOLINKWIKIPEDIAFIED//$SEARCH/$REPLACE}
let "index += 1"
done
### Since we have the head and the payload separate, echo both
echo "$HEAD"
echo "$NOLINKWIKIPEDIAFIED"
# Reset the IFS, e.g. for following scripts
IFS=$OLD_IFS
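To try it out, assuming you saved the script as wikify.sh (the name is of course up to you):

chmod +x wikify.sh
./wikify.sh somepage.html > somepage.wikified.html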