I am working on a backup software, and the idea was to store all informations about each file in an sqlite db. So there basically is a “file” table that contains things like file size and sha1 hash (and other information). This works nicely as long as there are 1 million files. Insert speed is acceptable, querys are reasonably fast.
However, if I am assuming that someone will backup the contents of a 2 TB disk, and each file is about 20 kb, there would be over 100 million entries in the file table. I did some tests, and the problem is that sqlite gets slower and slower as the number of files (table entries) grow, to the point where its absolutely unusable. Of course if we’re conservative and assume that each table entry is 100 bytes including all index and internal information, we’re talking about a 10 GB database, which on most systems will neither be cached completely by the kernel, nor fit into the available memory.
Is there any open source alternative (I’d prefer a serverless solution, like sqlite, because its easier to setup), that can handle huge tables like the one I need? Preferably with C or C++ and Python language bindings. I’d also use a non-sql solution, I don’t use many sql features anyway.