Camel DB (Disk) Summary – Evolution memory improvements/thoughts

I finally got a chance to use some of my ITO time (just 3 days this time), and I decided to spend it on Evolution Mail’s memory issues. The folder summary (the message list) is one of the biggest reasons for Evolution’s high memory usage. The folder summary holds message infos, and every message info is nothing but headers such as from/to/cc and the sent/received dates. Notzed (an ex-Evolution/Camel hacker) wrote a design and some code addressing the core issues, but unfortunately it wasn’t developed further after he left Novell. It used libdb and had lots of new ways to access mail data. I was thinking along the same lines, so I took some concepts from there and wrote a new design. I spent my 3 days and nights improving it, and I now have a fully working prototype. I used sqlite as the database for the Camel summary, with the message UID as the primary key. All over the summary code, I use SQL queries like “select * from Inbox where uid='342'” to access the data.
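To make the idea concrete, here is a minimal sketch of a summary table keyed by UID, written in Python with its standard sqlite3 module rather than Camel’s C code. The column names and the helper functions (open_summary, add_message_info, get_message_info) are made up for illustration; they are not the prototype’s actual schema or API.

```python
import sqlite3


def open_summary(path=":memory:"):
    """Open (or create) a summary database with one table for the Inbox folder."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per message info, UID as the primary key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Inbox ("
        " uid TEXT PRIMARY KEY,"
        " sender TEXT, recipients TEXT, subject TEXT,"
        " date_sent INTEGER, date_received INTEGER)"
    )
    return conn


def add_message_info(conn, uid, sender, recipients, subject, date_sent, date_received):
    """Store (or overwrite) the headers of one message."""
    conn.execute(
        "INSERT OR REPLACE INTO Inbox VALUES (?, ?, ?, ?, ?, ?)",
        (uid, sender, recipients, subject, date_sent, date_received),
    )
    conn.commit()


def get_message_info(conn, uid):
    """Fetch one message info by UID, or None if the folder has no such message."""
    return conn.execute("SELECT * FROM Inbox WHERE uid = ?", (uid,)).fetchone()
```

Callers only hold on to the UID and ask for the row when they need it, e.g. get_message_info(conn, "342"). The query uses a bound parameter instead of pasting the UID into the SQL string, which avoids quoting problems.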

I modified the entire design of Evolution/Camel to store and use only UIDs wherever required. Whenever a message info is needed, it is queried from the DB and freed when no longer required. This could mean that we don’t need to keep visited folders in memory for the sake of trash/junk, and we don’t need to keep a vfolder’s subfolders in memory. Just the viewed folder (why even this?) would be in memory; the rest could stay in the database and be queried as and when required. I made a prototype with this in mind and was able to achieve what I wanted.

Asking when I’m gonna commit this? Hmm, so far it is a prototype. Folder summary listing works, Junk/Trash works, Search works, and VFolders work. But there are a lot of things that I can and need to optimize, since I have the flexibility of a DB. Since this will break ABI, add more APIs and deprecate a few, I need to design the APIs with lots of things in mind (remote view, mails as part of EDS, etc.). I was discussing this with Fejj (another Mail/Camel hacker) on Friday and he gave nice inputs on my design, like having an LRU implementation to decide which message infos to keep in memory and which not (he gave me the LRU code from GMime).

These optimizations would save a huge amount of memory for users with lots of vfolders and folders. Unfortunately this may not have any effect for users who live in just one folder (just Inbox) with a huge number of mails in it. And after all this memory optimization there isn’t any performance drop; in fact, it is a bit faster now with indexed tables. VFolders were a bit slow, but with a persistent summary for vfolders they should be faster than the current implementation (I haven’t prototyped this though). Achieving all this in a cleaner way will be my first target milestone and might take close to a month (if I’m allowed to work on this full stretch for a month).
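The LRU idea can be sketched roughly like this, again in Python rather than C. This is not the GMime code Fejj passed along; it is a stand-in built on OrderedDict, and the class name MessageInfoCache, the make_demo_inbox helper and the three-column Inbox table are assumptions for the example.

```python
import sqlite3
from collections import OrderedDict


class MessageInfoCache:
    """Keep at most `capacity` message infos in memory; evict the least recently used."""

    def __init__(self, conn, capacity=100):
        self._conn = conn
        self._capacity = capacity
        self._cache = OrderedDict()  # uid -> row, least recently used first
        self.db_reads = 0            # number of cache misses, for illustration only

    def get(self, uid):
        if uid in self._cache:
            # Cache hit: mark this entry as the most recently used.
            self._cache.move_to_end(uid)
            return self._cache[uid]
        # Cache miss: go to the database.
        self.db_reads += 1
        row = self._conn.execute(
            "SELECT uid, sender, subject FROM Inbox WHERE uid = ?", (uid,)
        ).fetchone()
        if row is None:
            return None
        self._cache[uid] = row
        if len(self._cache) > self._capacity:
            # Drop the entry that has gone unused the longest.
            self._cache.popitem(last=False)
        return row

    def cached_uids(self):
        """UIDs currently held in memory, least recently used first."""
        return list(self._cache)


def make_demo_inbox(count):
    """Build a throwaway in-memory Inbox with `count` fake messages (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Inbox (uid TEXT PRIMARY KEY, sender TEXT, subject TEXT)")
    conn.executemany(
        "INSERT INTO Inbox VALUES (?, ?, ?)",
        [(str(i), f"user{i}@example.com", f"Message {i}") for i in range(1, count + 1)],
    )
    return conn
```

With a capacity of 2, asking for UIDs 1, 2, 1, 3 in that order leaves 1 and 3 in memory: 2 is the one evicted, because 1 was touched again more recently. The capacity is the knob that trades memory for disk reads.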

Note: The drop in memory is something you won’t see in the current Evolution, because Junk/Trash keeps the last visited folders in memory.

Of course there are next levels to this: remote view and search-in-disk.

Remote View: After my first target, only the viewed folder’s message list is going to be in memory. We can then have a custom model store that maps just the message infos visible in the message list, plus maybe a buffer of 50/100 above and below the view. A cursor in the DB would just move and map the view plus head/tail into memory. That means that when you have a folder of 100,000 mails, the message list shows 50 mails and the head and tail buffers are 50 each, you would have just 150 message infos in memory and nothing else. It may be a bit slow if you page up/down quickly or scroll with the mouse, but it can make Evolution run on any low-memory machine or mobile device (Nokia N800/N810 etc.). Of course an optimized design can overcome the performance issue with heavy scrolling (sqlite is pretty fast, and you may not notice it most of the time). It also requires lots of things, like sorting/threading, to be built inside the tables, and the cursor needs to be mapped to the message list/etree. There is no prototype or data for this, but it is surely possible after my first milestone. This will be the second milestone.
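Since there is no prototype for the remote view yet, the following is only a sketch of the windowing arithmetic, in Python with sqlite3. The function load_window, the helper make_demo_folder and the sort order by date_received are all assumptions for the example; a real implementation would have to follow the message list’s actual sort/threading order.

```python
import sqlite3


def load_window(conn, first_visible, visible_rows=50, buffer_rows=50):
    """Load only the visible rows plus a head and tail buffer.

    Returns (start, rows), where `start` is the list position of rows[0].
    """
    start = max(0, first_visible - buffer_rows)
    end = first_visible + visible_rows + buffer_rows
    rows = conn.execute(
        "SELECT uid, subject FROM Inbox "
        "ORDER BY date_received, uid LIMIT ? OFFSET ?",
        (end - start, start),
    ).fetchall()
    return start, rows


def make_demo_folder(count):
    """Build a throwaway in-memory Inbox with `count` fake messages (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Inbox (uid TEXT PRIMARY KEY, subject TEXT, date_received INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Inbox VALUES (?, ?, ?)",
        [(str(i), f"Message {i}", i) for i in range(count)],
    )
    # An index on the sort column lets sqlite walk the rows in display order.
    conn.execute("CREATE INDEX inbox_by_date ON Inbox (date_received, uid)")
    return conn
```

Scrolled to the middle of a big folder, load_window returns 150 rows (50 visible, 50 above, 50 below); at the very top there is no head buffer, so it returns 100. Note that OFFSET still makes sqlite step over the skipped rows, so for very deep scrolling it may be better to remember the last sort key and continue from there.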

Search-in-disk: Currently for search, the entire folder summary is brought into memory (in my first milestone only the viewed folder is in memory anyway). But if we implement a remote view, it may not be so efficient to do it this way. We can extend search to run inside the database and just retrieve the UIDs, or use the cursor to map the contents to the message list. Effect: it should be super fast, and again with low memory consumption.
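As a rough illustration of letting the database do the search and hand back only UIDs, here is a small Python sketch. The function search_uids, the demo helper and the choice of matching on sender and subject with LIKE are assumptions for the example; Evolution’s real search expressions are richer than this.

```python
import sqlite3


def search_uids(conn, text):
    """Return the UIDs whose sender or subject contains `text`.

    Only UIDs come back; the full message infos stay on disk.
    """
    pattern = f"%{text}%"  # '%' and '_' inside `text` still act as LIKE wildcards
    cur = conn.execute(
        "SELECT uid FROM Inbox WHERE subject LIKE ? OR sender LIKE ? ORDER BY uid",
        (pattern, pattern),
    )
    return [row[0] for row in cur]


def make_demo_search_folder():
    """Build a tiny in-memory Inbox with made-up messages (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Inbox (uid TEXT PRIMARY KEY, sender TEXT, subject TEXT)")
    conn.executemany(
        "INSERT INTO Inbox VALUES (?, ?, ?)",
        [
            ("1", "alice@example.com", "Camel summary design"),
            ("2", "bob@example.com", "Lunch on Friday"),
            ("3", "camel-list@example.com", "Weekly digest"),
            ("4", "dave@example.com", "Re: memory usage"),
        ],
    )
    return conn
```

The result is just a list of UIDs that the message list (or the cursor from the remote view) can then map to rows as needed. sqlite’s LIKE is case-insensitive for ASCII letters by default, so searching for "camel" also matches "Camel".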

I’m not doing much on the second or the third milestone right now. But I want to work on the first milestone for Evolution 2.23.1/2 (sorry, not for GNOME 2.22; it is too late to bring in such a huge design change). The second and third milestones might not bring many ABI/API changes if the first milestone is designed well, and they can be done at any point without much disturbance to stability, IMO. I wish I w(c)ould work on all this.

13 Comments

  1. Sean
    Posted January 15, 2008 at 12:21 am | Permalink

    Out of curiosity, how much will this effort overlap with Tinymail? Other than API/ABI breakages, is there anything in Tinymail’s design changes that makes it unsuitable for Evolution?

  2. Posted January 15, 2008 at 12:24 am | Permalink

    Sean: probably a better question to Philip – iirc, tinymail is going another road.

  3. Posted January 15, 2008 at 12:40 am | Permalink

    Andre is right. IIUC Philip is going with mmap and trying a cursor-based thing there. No effort overlap really.

  4. Posted January 15, 2008 at 12:44 am | Permalink

    That depends of course. If srag’s implementation is good, I might consider switching to his work.

    Right now it’s unfinished and experimental. So I’m waiting for the results before making a decision.

    Tinymail’s summary implementation achieves most of the benefits already, but that doesn’t mean that it was implemented in an ideal way.

    Perhaps this is indeed better and then there will be a lot of overlap.

    I’m working on a few of my own experiments that don’t change the way things work as much as this idea, maybe those experiments will be better.

    Who knows? In any case, people are working on Camel. That by itself is overlap with Tinymail already.

  5. Posted January 15, 2008 at 12:51 am | Permalink

    Philip, if I’m not stopped for a month or two, I would do everything. I’m hoping for the best to happen in Camel, and thereby in Evolution and all its dependent projects.

  6. Posted January 15, 2008 at 12:56 am | Permalink

    Using Sqlite is a brilliant idea.

    Nat has been in love with Sqlite for a while as a replacement for the db-like APIs.

    Miguel.

  7. Posted January 15, 2008 at 1:01 am | Permalink

    I still hope that at some point we can make Camel a separate project. That, or manage to convince Jeffrey to continue development of libspruce.

    I would love to have an excellent replacement for that part of Tinymail that is now camel-lite. I’ve always seen camel-lite as a temporary fork, although right now the changes that have been made are all needed and have gotten quite large in their nature.

    So I fear it’ll take quite some time before upstream Camel can do all those things (in some pluggable way).

    However. Again, if your summary implementation indeed works well (if you implement it right, it will. Because conceptually it can definitely work very well), then you’ll see it being used in Tinymail too. Probably in a 2.0 version.

    Because if it does work well, I don’t see any reason why I should hold to my current mmap-based implementation (which was a clever hack, but still a hack).

  8. Posted January 15, 2008 at 1:30 am | Permalink

    Thanks Miguel.

    Philip, as I have said before, I’m not against it as a separate project. If it is reaaally required, we can do it. But that shouldn’t be the stopping point for anything. If I implement it well and it works fine, it can make it into tinymail, and well, I won’t mind having it as a separate project then. I see that tinymail is now used in things like modest etc., which may benefit as well. So maybe we can discuss this a little later.

  9. Posted January 15, 2008 at 2:14 am | Permalink

    It’s not about blocking things, really. It’s more about fostering contributions. As a standalone library it might attract more E-mail client developers to start using it.

    While integrated with Evolution, this is unlikely.

    For me it was for example not possible to depend on the entire evolution-data-server stack just for the camel pieces.

    Although sure, a lot of E-mail clients will consume the other pieces of evolution-data-server too.

    That by itself is not a reason to bundle it. There is equally a need for simple E-mail clients that don’t need any of evolution-data-server’s functionality. For those E-mail clients, it’s just more difficult to cope with having to put all those eds pieces on their flash disk.

    Camel, however, is at this moment not in a very good shape in my opinion. But if more people would work on it, it could improve a lot. I think.

    But anyway. Good luck with the experiments. You’ll need it :)

  10. Dave Richards
    Posted January 15, 2008 at 3:00 am | Permalink

    srag: Remember to keep in mind machines with 200 or 300 concurrent users. :) If you create code that moves more to the disk, maybe you can write a way to simulate a lot of those instances running at the same time for testing purposes.

    At 8am in the morning, I literally have 100+ people logging in and starting email within a 10 minute window. In theory, the current cache code should simulate something similar to this design, right? Hit me up anytime if I can assist.

  11. Posted January 15, 2008 at 3:11 am | Permalink

    Dave sure. I’m really not looking at more disk access. But an optimized access. I wish, I can have a tuner implemented on how to tune your memory/disk usage of Evolution. Should be simple though. The tuner might just cache or extend the window of the LRU based cache.

  12. William Lovaton
    Posted January 15, 2008 at 6:14 pm | Permalink

    This is truly awesome, congratulations.

    I really depend on Evolution for my day-to-day activities, and it would be really nice if it used less memory, to let my crappy system run faster with some other memory-hungry applications.

    Keep up the good work, thanks a lot.

    Cheers.

  13. Posted January 16, 2008 at 8:55 pm | Permalink

    Whoa!!! At last someone is taking on the problem of memory usage in Evolution. I have been thinking about an SQL-based email client since my first use of Pine, and I used Gmail (not Google’s one but http://ftp.cdut.edu.cn/pub/linux/network/client/email/gmail/gmail_linuxpower_org.html) and it was blazingly fast.

    What would be amazing is the ability to use a remote database, for example postgres or mysql.

    That will allow people to use a central database server for email. Backup will be easier, and working from different locations will be easier. In some environments that use NFS to share the homes, SQLite could be problematic, as it is very dependent on locking and NFS is very prone to problems with locks.

    Setting up a database server on an office LAN is much easier and doable for not very highly skilled people. A setup like David’s one for Fargo city is out of the question for many small offices, but a central database of emails is very doable.

    Please, consider database abstraction during the design & implementation (if possible). Being able to use different databases will be amazing.

    Thanks!