How to waste an evening debugging the internals of glib’s qsort implementation

Working on a personal itch-scratching project, I found myself wanting a sortable treeview. So I stuffed my treemodel into a TreeModelSort and set up my sort functions, and everything was happy. Except, when running my application, I kept getting strange segfaults within glib’s qsort implementation. I had set the following function as my sort_func (yes, it’s C++, but it should be fairly self-explanatory):

static int
compare_foo (const Gtk::TreeModel::iterator& a, const Gtk::TreeModel::iterator& b)
{
    std::tr1::shared_ptr<Foo> l1, l2;
    l1 = a->get_value (COLUMN_FOO);
    l2 = b->get_value (COLUMN_FOO);

    if (!l1)
        return -1;

    if (!l2)
        return 1;

    return l1->get_id () - l2->get_id ();
}

Can you spot the error? Running it under valgrind indicated that I was reading data from *before* the start of the array that was being sorted. In fact, the following lines of code in g_qsort_with_data() were causing tmp_ptr to be decreased back past the start of the array:

        while ((*compare_func) ((void *) run_ptr, (void *) tmp_ptr, user_data) < 0)
          tmp_ptr -= size;

Hmm. Why is there no guard to prevent tmp_ptr from being decreased past the start? Then I noticed the following comment up a few lines:

    /* Find smallest element in first threshold and place it at the
       array's beginning.  This is the smallest array element,
       and the operation speeds up insertion sort's inner loop. */

So at this point, the code is assuming that the very first element of the array is the smallest, so compare_func should always return >= 0 when it reaches the start of the array. Aha! So the problem is my compare_func. It turns out I forgot to handle the single case of both values being NULL.

    l1 = a->get_value (COLUMN_FOO);
    l2 = b->get_value (COLUMN_FOO);
+
+   if (!l1 && !l2)
+       return 0;

    if (!l1)
        return -1;

So there’s not really a big lesson to be learned here, but if you ever hit strange segfaults deep inside glib’s qsort while sorting a treemodel, do yourself a favor and double-check that your sort function is sane first. Also, do yourself a favor and run it under valgrind as soon as you get a strange segfault. Knowing the exact point of the invalid read is infinitely more helpful than waiting until that invalid read causes a segfault.

7 thoughts on “How to waste an evening debugging the internals of glib’s qsort implementation”

  1. When you hit this issue once, you never forget again. In general you MUST ensure the following properties on your cmp function:

    1) cmp(a,b)==-cmp(b,a)
    2) cmp(a,a)==0
    3) cmp(a,b)==0 && cmp(b,c)==0 => cmp(a,c)==0
    4) cmp(a,b)<0 && cmp(b,c) cmp(a,c)<0

    behdad

Leave a Reply

Your email address will not be published. Required fields are marked *