23.08.2011 bfsync-0.1.0 OR managing big files with git home

I’ve been using git to manage my home dir for a long time now, and I’m very happy with that. My desktop computer, laptop, server,… all share the same files. This solution is great for source code, web pages, contacts, scripts, configuration files and everything else that can be represented as text inside the git repo.

However, when it comes to big files, git is a poor solution, and by big files I mean really big: video/music collections, iso images and so on. I tried git-annex, which seems to address the problem, but I was never happy with it.

So I decided to write my own (open source) big files synchronization program called bfsync, and today I made the first public release. It keeps only hashes of the data in a git repository (so the directory is managed by git), and is able to transfer files between checkout copies on different computers as needed.
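To illustrate the idea of tracking only hashes, here is a minimal Python sketch. It is not bfsync's actual code: the hash function (SHA-256), the `.hash` placeholder naming and the function names `content_hash` and `write_placeholder` are all assumptions made for this example, since the post does not specify them.

```python
import hashlib
from pathlib import Path


def content_hash(path, chunk_size=1 << 20):
    """Return the hex digest of a file, reading it chunk by chunk
    so that a multi-gigabyte file never has to fit in memory."""
    digest = hashlib.sha256()  # assumption: the post does not name the hash
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_placeholder(big_file, tracked_dir):
    """Write a tiny text file holding only the hash of big_file.

    tracked_dir stands for a directory under version control; the
    '<name>.hash' layout is hypothetical, chosen for illustration."""
    tracked_dir = Path(tracked_dir)
    tracked_dir.mkdir(parents=True, exist_ok=True)
    placeholder = tracked_dir / (Path(big_file).name + ".hash")
    placeholder.write_text(content_hash(big_file) + "\n")
    return placeholder
```

The placeholder is a few dozen bytes no matter how large the original file is, so the version control system only ever has to store and merge small text files.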

You can find the first public release of bfsync here.

11 comments ↓

#1 Julian Andres Klode on 08.23.11 at 15:23

=> http://git-annex.branchable.com/

#2 pel on 08.23.11 at 15:54

You mean like a tracker with directories and version control?

#3 pel on 08.23.11 at 15:54

s/tracker/bit torrent tracker/

#4 Nagappan Alagappan on 08.23.11 at 18:42

I think you can use bup – https://github.com/apenwarr/bup

#5 huh on 08.23.11 at 19:33

Nice. Would it be possible to have multiple backends? It would be great to have something like this for Subversion…

#6 stw on 08.23.11 at 20:44

@Julian: well, as I said, I tried to use git-annex for some time, but couldn’t get it to work properly – I cannot really say it was git-annex’s fault, maybe I didn’t get how things were supposed to work.

One other point is that for me the fact that git-annex was written in Haskell is a problem rather than a feature, because it means I can’t debug/fix it.

#7 stw on 08.23.11 at 20:50

@pel: Well, the difference is that bittorrent was written for distributing one set of files to a lot of machines. So most of what bittorrent does concerns how the data is copied and shared. However, the set of files remains static forever; nothing changes once the torrent is on the tracker.

bfsync, on the other hand, has very little code for actually distributing the files; it’s little more efficient than scp. So if you add more machines, you still do straight copying from one machine to another (unlike bittorrent). However, the set of files can be changed on any machine, and all machines will eventually get the same set of files. This means the advantage of bfsync is that you can add a file on host1, delete a file on host2, rename a file on host3, and in the end all hosts will have the same files.

#8 stw on 08.23.11 at 20:52

@Nagappan: I think I can’t use bup, because it wasn’t designed for distributed editing of the file set on many machines (but I just read the main page, so I may be wrong). Still a nice project, though.

#9 stw on 08.23.11 at 20:57

@huh: Yes, I think there is no feature bfsync requires from the underlying version control system that is git-specific. If it’s nicely implemented (for instance a base class for versioning systems, and derived classes for git and svn), I see no reason why it shouldn’t be included in bfsync. It’s not a priority for me though, since I’ve become accustomed to git features, so my own bfsync repos will definitely use git.
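The base class / derived class layout mentioned in this comment could look roughly like the Python sketch below. This is an illustration only, not code from bfsync: the class names `VersionControl`, `GitBackend` and `SvnBackend` and the choice of operations (add and commit) are assumptions. Only the plain `git add`, `git commit -m`, `svn add` and `svn commit -m` commands are used.

```python
import subprocess
from abc import ABC, abstractmethod


class VersionControl(ABC):
    """Hypothetical interface for the versioning system behind the repo."""

    def __init__(self, repo_dir):
        self.repo_dir = repo_dir

    @abstractmethod
    def add_command(self, path):
        """Return the argv list that schedules path for the next commit."""

    @abstractmethod
    def commit_command(self, message):
        """Return the argv list that commits the scheduled changes."""

    def run(self, argv):
        # Run the backend command inside the repository directory.
        subprocess.run(argv, cwd=self.repo_dir, check=True)

    def add(self, path):
        self.run(self.add_command(path))

    def commit(self, message):
        self.run(self.commit_command(message))


class GitBackend(VersionControl):
    def add_command(self, path):
        return ["git", "add", path]

    def commit_command(self, message):
        return ["git", "commit", "-m", message]


class SvnBackend(VersionControl):
    def add_command(self, path):
        return ["svn", "add", path]

    def commit_command(self, message):
        return ["svn", "commit", "-m", message]
```

Code written against `VersionControl` never needs to know which backend it is talking to; a real implementation would need more operations (pull/update, conflict handling) than this sketch covers.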

#10 skierpage on 08.23.11 at 21:41

How does bfsync handle directory renames, e.g. I rename photos/2011-08-23/ to photos/2011-08-23_Germany_trip/ or do some other folder reorganization on client1; would I use `bfsync mv`? bfsync stores content hashes; can client2 realize “I already have the bits for update.bin somewhere else as distro_patch_157.bin”? I think big files are heading towards a content-addressable distributed file system overlay with local caching, but those are just buzzwords 🙂

Some typos:
“its highly recommended” and “its {,not,also} possible” are all contractions of “it is” -> it’s
“Get the file form the server” -> from

#11 stw on 08.24.11 at 09:28

@skierpage: currently, bfsync mv only handles file renames, not directory renames. I’ve put this on the TODO list; it definitely should be supported by bfsync mv. For the time being, you can either bfsync mv all the files or go to the git tree (in the .bfsync directory) and manually rename the directory. Once you have the changes on another repo, bfsync will naively try to retransfer the files. However, you can get around this by using bfsync get /path/to/repo on the target machine. That will transfer the files from the local repository (rather than the remote repository), which should be quite fast. It might be a good idea to do this step automatically, though, because local transfers are probably always faster than remote transfers.

And yes, as long as you have the right file content on the machine, you do not need to retransfer; bfsync only works on the hash values, so as long as you have a file with the right hash available locally, bfsync will use it as soon as you use bfsync get /some/local/path/containing/file.

Finally, you need to call bfsync delete to remove the old files.
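The hash-based local lookup described in this comment can be sketched in Python as follows. This only illustrates the principle and is not how bfsync get is implemented: the function names, the SHA-256 hash and the `wanted` mapping from file name to hash are assumptions made for the example.

```python
import hashlib
import os
import shutil


def file_hash(path, chunk_size=1 << 20):
    """Hex digest of a file's content, read in chunks."""
    digest = hashlib.sha256()  # assumption: the hash function is not stated
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_hash(root):
    """Map each content hash to one path under root with that content."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            index.setdefault(file_hash(path), path)
    return index


def fetch_locally(wanted, local_root, dest_dir):
    """Copy every wanted file whose content already exists under local_root.

    wanted maps target file name -> expected hash (hypothetical format).
    Returns the names that were not found locally and would still need
    a remote transfer."""
    index = index_by_hash(local_root)
    os.makedirs(dest_dir, exist_ok=True)
    missing = []
    for name, expected in wanted.items():
        source = index.get(expected)
        if source is None:
            missing.append(name)
        else:
            shutil.copyfile(source, os.path.join(dest_dir, name))
    return missing
```

Because the lookup is keyed by content rather than by name, a file stored locally as distro_patch_157.bin satisfies a request for update.bin as long as the hashes match, which is the case asked about in comment #10.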
