This post is a sort of summary of the work we’ve done on the Addressbook for Evolution Data Server this year at Openismus. As I demonstrated in my previous post, Evolution’s addressbook is now riddled with rich locale sensitive features so I won’t cover the sorting features, you can refer to the other post for that.
Understanding Phone Numbers
This is another really nice feature which I admit has been driving me up the wall for the last year. I’ll try to sum it up one last time here so hopefully I don’t have to ever concern myself too much with the topic again
Before I get into it, I should start with describing the problem which we’ve addressed (I can’t say solved as I don’t believe there really is any ultimate solution for this problem).
So, this problem is particular to the implementation of a hand phone device, which implies the following conditions:
- Users will enter any kind of data into the addressbook and call it a phone number. This most typically involves phone numbers entered as local numbers, sometimes it includes fully qualified international numbers for contacts who live abroad, and can also include text such as “ask jenny for her number next time”, as a rule, anything goes.
- Addressbooks are typically a collection of vCards, this is a point of interest as using a standard format for contacts allows one to send vCards back and forth, which means that you cannot consider yourself in control of the data you store on your device. Instead a vCard can come from a synchronization of contacts from your laptop’s email client, or passed over bluetooth, the vCard can come from anywhere, containing any data it pleases and calling that a phone number.
- When initiating an outbound call, the cellular modem firmware will send a string of numbers to the carrier, this will either succeed or fail. It’s tempting at this stage to consider and store the result of this operation, but unclear if the modem firmware will tell you the fully qualified number of the successful outbound call, and unreasonable to attempt a call just to determine such a thing.
- When the firmware announces an incoming call, there will be a fully qualified E.164 phone number available (E.164 is a standard international phone number format).
Two obvious use cases now present a difficulty:
- Let’s say the user starts entering a phone number somewhere in the UI, perhaps with the intent of initiating a phone call, or just with the intent of searching for a contact. We know that the user entered ‘just about anything’ as a phone number, we know that the vCard might have come from an external source (the same user might not have entered the phone number to begin with), and of course, we want to find the correct contact, and we want to find that contact right now.
- Let’s imagine also, just for a moment, that your hand phone implementation might actually receive a phone call (not all of your users are so popular that they actually receive calls, but this is, after all, what your device is for ;-)). So now we have access to a fully qualified E.164 number for the incoming call, and an addressbook full of these vCards, which have, ya know, whatever the user decided to enter as a phone number. We of course want to find the single unique matching contact for that incoming phone number, and we want it even more righter and nower than in the other use case listed above (just in case, you know designers, maybe they want something crazy like displaying the full name of the contact who’s calling you instead of the phone number).
A common trick of the trade to handle these cases is to perform a simple heuristic which involves normalizing the number in your vCards (to strip out spaces and characters such as ‘-’ and ‘.’) and then perform a simple suffix match against the incoming caller’s fully qualified phone number. This was admittedly a motivation behind our implementation of optimized suffix matching last year.
But let’s pretend that you’re not satisfied with this heuristic, you know you can do better, and you want to impress the crowd. Well now you can do just that ! assuming of course that you use Evolution’s addressbook in your device architecture (but of course you do, what else would you use ?).
Locale sensitive phone number interpretations
Again the International Components for Unicode (ICU) libraries come to the rescue, this time with the libphonenumber helper library on google code, which is now an optional dependency for Evolution Data Server (my painful experience compiling this heap of libphonenumber code begs me to make an unrelated comment: somebody school these guys in the ways of Automake, this CMake experience is just about the worst offence you can make to a downstream packager, or someone like me, who just needs to build it).
So what is a locale sensitive interpretation of a phone number ? Why do we need that ?
- Some countries have different call prefixes for outgoing calls to leave the country. I believe this is ’0′ in North America but it can vary. Even more confusing, take a country like Brazil with many competing cellular carriers, in the case of Brazil you have multiple valid calling prefixes to place a call outside of the country, each one with a different, competing rate.
- Some locales, like in all locale sensitive affairs, have different preferences about how a phone number is composed, i.e. do we use a space or a dash or a decimal point as a separator ? do we interpret a trailing number preceded by a decimal as an extension code ? or as the final component of a local phone number ?
- Some locales permit entirely different character sets to be used for numbers, and users might very well prefer to enter phone numbers in their native language script rather than entering standard numeric characters.
Using libphonenumber allows us to make the best possible guess of what the fully qualified E.164 number would be for an arbitrary string entered by the user in a given locale. It also provides us with useful information such as whether the phone number contained an extension code, whether the number had a leading 0 (special case for Italy) and whether the parsed phone number string had a country code. Actually libphonenumber will also make a guess at what the country code would be, but we’re not interested by that feature.
To leverage this feature in Evolution’s addressbook, one should make use of the following APIs:
- The function e_phone_number_is_supported() can be called at runtime to determine if your installation of Evolution Data Server was compiled with support for phone numbers (as libphonenumber is only a soft dependency, this should be checked).
- When creating a new addressbook, the ESourceBackendSummarySetup should be used to configure the E_CONTACT_TEL field into the addressbook ‘quick search’ configuration, and it should be configured with the special index E_BOOK_INDEX_PHONE so that interpreted phone numbers will be stored in SQLite for quick phone number matching.
- Whether your addressbook is configured for quick phone number searches or not, so long as phone numbers are supported in your build then you have the capacity of leveraging the phone number matching semantics in the regular way with EBookQuery.
Three queries exist which can be used for phone number matching, at differing match strengths.
A typical query that you would use to search for, say equality at the national number strength level, would look like this:
EBookQuery *query = e_book_query_field_test (E_CONTACT_TEL, E_BOOK_QUERY_EQUALS_NATIONAL_PHONE_NUMBER, "user entered, or E.164 number");
With the above query one can use the regular APIs to fetch a contact list, filter a view, or filter a cursor.
Addressbooks on Crystal Meth
One thing that is really important of course in contacts databases, is speed, lots and lots of speed
While last year we’ve made considerable improvements, and already gained much speed for the addressbook, now contacts are inserted, deleted, modified and as specially fetched righter and nower than ever.
I should start by mentioning that in the last two weeks I’ve committed a complete rewrite of the old SQLite code which handles addressbooks, finally overcoming this bug, and turning what used to look like this (shudders) into something much more intelligible (sigh of relief). The net results in performance, can be observed here.
While I’d love to go over the vast improvements I’ve made in the code, and new features added, let’s just go through some highlights of the new benchmarks as I’ve been writing this post for a few hours already
First some facts about the addressbook configuration, since that makes a great difference in the meaning of the benchmark output.
- E_CONTACT_GIVEN_NAME – Forced fallback search routines (no index)
- E_CONTACT_FAMILY_NAME – Prefix Index, Suffix Index
- E_CONTACT_EMAIL – Prefix Index, Suffix Index
- E_CONTACT_TEL – Prefix Index, Suffix Index
The red line in the benchmarks is what is now in EDS master (although marked as “Experimental” in the charts). The comparisons are based on addressbooks opened in Direct Read Access mode, i.e. books are opened with e_book_client_connect_direct().
And now some of the highlights:
In the above chart we’re basically seeing the time we save by using prepared statements for batch inserts, instead of generating the same query again and again.
Prefix searches on contact fields which have multiple values (stored in a separate table) such as email addresses and phone numbers, now have good performance for addressbooks containing over 200,000 contacts (I was unable to test 409,600 contacts, as the benchmarks themselves require a lot of memory to keep a set of vCards and EContacts in memory).
Here we see no noticeable change in the speed of fetching contacts with the fallback routine (i.e. no data to compare in SQLite columns, vCard parsing involved).
This is interesting only because in my new code base, we no longer fetch all contacts into memory and compare the list one by one, but instead install a function into SQLite to perform the fallback routine over the course of a full table scan. This is much more friendly to your memory consumption, and comes at no decrease in the performance of queries which hit the fallback routine.
Here we see the memory usage of the entire benchmark process including the evolution-addressbook-factory process over the course of the benchmarks (each dot from left to right is a measurement taken after each benchmark runs). Note that we have 200,000 contacts and vCards loaded into memory for the sake of the benchmarks to verify results for each speed test (hence the initial spikes at the beginning of the benchmark progress).
The noticeable drop in memory usage can be attributed to how the new backend is more friendly on system memory when performing fallback search routines.
Take a look at the full benchmarks including the csv file here. I should note that the ‘fetch-all-contacts.png’ benchmark indicated a bug where I was not detecting a special case (a wildcard query which should simply list all results), that has been fixed and the benchmark for it doesn’t apply with current Evolution Data Server master.
Anyway, it’s late and time for pizza
I hope you’ve all enjoyed the show !