Git clones vs Shallow Git clones
April 18, 2009
When cloning a Git repository, you can limit how much history your clone will contain. Passing --depth 1 gives you the least amount of history; the result is called a shallow clone.
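As a minimal sketch of what --depth 1 does, the following script builds a throwaway local repository, clones it once in full and once shallowly, and counts the commits visible in each clone. The file name, commit messages and the "demo" identity are made up for illustration. A file:// URL is used because Git ignores --depth when cloning from a plain local path.

```shell
#!/bin/sh
# Sketch: compare the history of a full clone and a shallow clone.
# Everything happens in a temporary directory; no network is needed.
set -e
# Illustrative identity so that commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
src="$work/src"

# Build a tiny source repository with three commits.
git init -q "$src"
for i in 1 2 3; do
    echo "revision $i" > "$src/file.txt"
    git -C "$src" add file.txt
    git -C "$src" commit -q -m "commit $i"
done

# Full clone: all history. Shallow clone: only the newest commit.
git clone -q "file://$src" "$work/full"
git clone -q --depth 1 "file://$src" "$work/shallow"

full_commits=$(git -C "$work/full" rev-list --count HEAD)
shallow_commits=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full clone:    $full_commits commits"
echo "shallow clone: $shallow_commits commit(s)"
```

The full clone reports three commits and the shallow clone reports one, even though both working copies contain the same files.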
The git clone man page says that you cannot push your commits if you have a shallow clone. Apparently, there is no error message when you actually push, so doing it might cause problems in the repository later on.
Lacking more details on whether pushing commits from shallow clones is bad for the repository, let’s measure whether there are any gains from opting for shallow clones.
Module (gnome-2-26) | Full clone (MB) | Shallow clone (MB) |
evolution | 204 | 189 |
gtk+ | 193 | 172 |
nautilus | 139 | 108 |
gnome-games | 127 | 120 |
gnome-applets | 110 | 98 |
gnome-user-docs | 108 | 102 |
evolution-data-server | 84 | 77 |
anjuta | 76 | 66 |
libgweather | 69 | 68 |
gnome-panel | 68 | 60 |
ekiga | 61 | 49 |
dasher | 58 | 49 |
orca | 55 | 47 |
gnome-utils | 53 | 48 |
gnome-icon-theme | 51 | 49 |
gedit | 49 | 45 |
epiphany | 48 | 42 |
gnome-control-center | 46 | 40 |
gdm | 43 | 38 |
glib | 42 | 37 |
gnome-system-tools | 33 | 29 |
gnome-media | 33 | 30 |
totem | 31 | 27 |
gnome-power-manager | 31 | 27 |
gnome-backgrounds | 31 | 30 |
brasero | 31 | 29 |
metacity | 29 | 27 |
gnome-desktop | 28 | 24 |
tomboy | 27 | 25 |
seahorse | 24 | 22 |
gnome-terminal | 23 | 21 |
gnome-session | 23 | 20 |
gucharmap | 22 | 19 |
gnome-vfs | 22 | 19 |
glade3 | 21 | 19 |
gconf | 21 | 20 |
eog | 21 | 18 |
gcalctool | 19 | 17 |
libgnomeui | 18 | 15 |
gtkhtml | 18 | 16 |
evince | 18 | 15 |
gnome-themes | 17 | 16 |
cheese | 17 | 15 |
file-roller | 16 | 14 |
empathy | 16 | 15 |
gok | 14 | 13 |
gtksourceview | 13 | 12 |
gnome-keyring | 13 | 12 |
gnome-doc-utils | 13 | 13 |
bug-buddy | 13 | 11 |
zenity | 12 | 11 |
yelp | 12 | 11 |
sound-juicer | 12 | 11 |
libgnome | 12 | 11 |
gvfs | 12 | 9.9 |
gnome-system-monitor | 12 | 11 |
deskbar-applet | 12 | 9.5 |
libbonobo | 11 | 8.8 |
gnome-settings-daemon | 11 | 11 |
gnome-devel-docs | 11 | 11 |
evolution-exchange | 9.9 | 9.3 |
gnome-screensaver | 9 | 8.3 |
vte | 8.7 | 7.5 |
libbonoboui | 8.7 | 7.4 |
libgtop | 8.4 | 6.9 |
libgnomeprintui | 8.4 | 7.1 |
gconf-editor | 8.4 | 7.9 |
libgnomeprint | 8.1 | 7 |
vinagre | 7.3 | 6 |
libwnck | 6.6 | 5.9 |
accerciser | 6.6 | 6.3 |
gtk-engines | 6.4 | 5.4 |
sabayon | 5.8 | 5.2 |
vino | 5.7 | 5.3 |
gnome-nettool | 5.3 | 4.9 |
mousetweaks | 5 | 4.7 |
totem-pl-parser | 4.6 | 4.5 |
at-spi | 4.5 | 3.9 |
libgnomecanvas | 4.3 | 3.7 |
atk | 4.2 | 3.7 |
gnome-netstatus | 4.1 | 3.8 |
devhelp | 3.9 | 3.2 |
gdl | 3.5 | 3.2 |
gnome-mag | 3.2 | 2.9 |
gnome-menus | 3 | 2.6 |
hamster-applet | 2.8 | 2.2 |
gnome-user-share | 2.6 | 2.5 |
evolution-mapi | 2.2 | 2.1 |
libgnomekbd | 1.8 | 1.7 |
alacarte | 1.6 | 1.4 |
pessulus | 1.5 | 1.3 |
evolution-webcal | 1.4 | 1.3 |
swfdec-gnome | 1.1 | 0.94 |
Total (MB) | 2625.6 | 2349.24 |
Time (min) | 52 | 37 |
The Git repositories for all modules of gnome-2-26 weigh 2.6GB, while their shallow clones weigh 2.3GB. That is a difference of less than 300MB.
In terms of time, cloning all GNOME 2.26 repositories takes 52 minutes, and the shallow clones save 15 minutes. The speed reported by git clone was about 1.4MB/s in this experiment.
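The savings can be checked with a little arithmetic. The sketch below copies the totals from the table and works out the absolute and relative differences; the variable names are only illustrative.

```shell
#!/bin/sh
# Totals copied from the table (sizes in MB, times in minutes).
export LC_ALL=C   # make sure awk prints a decimal point, not a comma
full_mb=2625.6;  shallow_mb=2349.24
full_min=52;     shallow_min=37

# Absolute and relative disk savings of the shallow clones.
saved_mb=$(awk -v f="$full_mb" -v s="$shallow_mb" \
    'BEGIN { printf "%.2f", f - s }')
saved_pct=$(awk -v f="$full_mb" -v s="$shallow_mb" \
    'BEGIN { printf "%.1f", (f - s) / f * 100 }')

# Absolute and relative time savings.
saved_min=$((full_min - shallow_min))
time_pct=$(awk -v f="$full_min" -v s="$shallow_min" \
    'BEGIN { printf "%.1f", (f - s) / f * 100 }')

echo "disk saved: $saved_mb MB ($saved_pct%)"
echo "time saved: $saved_min min ($time_pct%)"
```

With these totals the shallow clones save 276.36 MB (10.5%) of disk space and 15 minutes (28.8%) of cloning time.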
Cloning is bound by both your bandwidth and your CPU (especially when resolving deltas). It would be interesting to evaluate whether daily tarballs of anonymous clones of the modules would bring benefits (for the load on git.gnome.org and for the speed of cloning). One could then download a tarball over HTTP, add one's account details, and update with git pull --rebase.
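The tarball idea can be tried out locally. The script below is only a sketch under stated assumptions: a temporary directory stands in for the server, the tarball is gzip-compressed, and the repository name "module", the file name and the "demo" identity are all made up. On a real setup the URL passed to git remote set-url would be the developer's own authenticated URL.

```shell
#!/bin/sh
# Sketch: start from a tarball of a clone, then catch up with git pull --rebase.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
server="$work/server"

# "Server" side: a repository with one commit, and a tarball of a clone of it.
git init -q "$server"
echo one > "$server/file.txt"
git -C "$server" add file.txt
git -C "$server" commit -q -m "first"
git clone -q "file://$server" "$work/module"
tar -C "$work" -czf "$work/module.tar.gz" module
rm -rf "$work/module"

# The project moves on after the tarball was made.
echo two > "$server/file.txt"
git -C "$server" commit -q -a -m "second"

# Client side: unpack the tarball instead of cloning, point origin at
# the URL you would normally use, then fetch only what is missing.
mkdir "$work/client"
tar -C "$work/client" -xzf "$work/module.tar.gz"
cd "$work/client/module"
git remote set-url origin "file://$server"
git pull -q --rebase

client_commits=$(git rev-list --count HEAD)
echo "commits after pull: $client_commits"
```

After the pull, the unpacked repository has both commits, and only the second one had to be transferred by Git itself.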
Given these numbers, it makes sense to avoid shallow clones, especially when you intend to push your changes. Instead, set aside at least 2.6GB for the full repositories and keep them.
intltool-manage-vcs was used to retrieve the repositories.
Update: The GNOME 2.26 modules (2.6GB in size for all their repositories) compress down to 1.6GB (.tar.bz2).
April 18, 2009 at 7:05 am
Yes, daily tarballs of git clones would be nice. You could also make them bare repositories (so they are smaller) and add a README.GNOME.git file to make sure people do git checkout master in them.
April 18, 2009 at 10:35 pm
Time (min) 52:19.49s 37:41.48s
Time in (min*s)? Does that make it super-time or something? Nice post for the rest of it.
April 19, 2009 at 3:24 am
@Michael: The system that I was running the test on has a very fast connection to the Internet, between 20-30Mbps.
For a typical case with slow home broadband, the speed will definitely be slower. It should take about 3-5 hours.
April 19, 2009 at 7:37 am
Maybe I wasn’t clear enough… The unit of time is either minutes, or seconds, not (minutes * seconds), because that would make it (time ^ 2).
April 19, 2009 at 11:27 am
@Michael: Ok, now I see. It was a copy-paste typo.
April 20, 2009 at 6:05 pm
If you’re getting that much compression from bzip2, then it sounds like your repositories aren’t packed very efficiently. Are you tarring only the .git directory, or the .git directory and working copy? The latter is pointless, as the .git directory contains all the information needed to restore the working copy (provided you have no uncommitted changes). Running “git reset --hard” will do this.
April 20, 2009 at 7:21 pm
@Steven: Indeed, I have been tarring the working copy as well.
I just measured the size of the .git/ directories only; they add up to 1.124 GB.
Thanks for the tip, it will be very helpful.
May 6, 2009 at 4:52 pm
quite informative. so apparently nothing to be gained much with shallow repo.
thanks