Packing FSFS Repositories

March 11, 2009 CollabNet VersionOne

Subversion 1.5 introduced the idea of sharding for FSFS-backed repositories. For every commit to an FSFS repository, Subversion creates a single file which describes all the changes in that revision. Prior to 1.5, all of those files were stored in a single directory, which had several drawbacks: incremental backups took a long time, the repository could not be dynamically grown across different filesystems, and some filesystems have degraded performance when the number of directory entries grows too large. With sharding, these revision files were split into separate subdirectories, eliminating a large number of these problems.

Even with sharding, the filesystem still has some inefficiencies. For instance, due to the block size of the underlying filesystem, having many files can still lead to wasted space on disk, especially with many small commits. Subversion can open and read data from many revisions over the course of an operation, and using a large number of files means that Subversion cannot exploit various operating system-level caches. Backing up and restoring a repository, although quicker, can still take a long time because of the large number of files spread across the repository.
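
To make the block-size point concrete, here is a rough sketch of the arithmetic. The helper name slack_bytes and the figures (500-byte revision files, 4096-byte blocks) are assumptions chosen for illustration, not measurements from a real repository, and the model ignores inodes, directory entries, and filesystems that pack small file tails together.

```shell
#!/bin/sh
# slack_bytes SIZE BLOCK
# Hypothetical helper: bytes allocated but unused when a file of SIZE bytes
# is stored on a filesystem that allocates whole BLOCK-byte blocks.
slack_bytes() {
    size=$1
    block=$2
    # Round the file size up to a whole number of blocks.
    blocks=$(( (size + block - 1) / block ))
    echo $(( blocks * block - size ))
}

# Assumed workload: 1000 small revisions of 500 bytes each.
per_file=$(slack_bytes 500 4096)
echo "unpacked: $(( per_file * 1000 )) bytes of slack across 1000 files"

# The same data glued into one 500000-byte file wastes at most one block.
echo "packed:   $(slack_bytes 500000 4096) bytes of slack in one file"
```

Under these assumed numbers the separate files leave roughly 3.6 MB unused, while the single combined file leaves under 4 KB unused.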

One of the great ideas that came out of the 2008 Subversion Developers’ Summit was the notion that FSFS filesystems could be packed, that is, all the files in a completed shard could be glued together to create a single monster revision file.  This pack file would save space on disk, give the operating system a chance to do some caching, and generally improve the snappiness of the system.

In order to use FSFS packing, you simply need to ensure that the target repository has been upgraded to the latest format, and then pack the repository using svnadmin.  Note that repositories do not automatically pack themselves, so for heavily used repositories, you may want to install a cron job or post-commit hook to do the packing.  Users can continue to use the repository while it is being packed:

$ svnadmin upgrade repo
Repository lock acquired.
Please wait; upgrading the repository may take some time...
Upgrade completed.
$ svnadmin pack repo
Packing shard 0...done.
Packing shard 1...done.
Packing shard 2...done.
Packing shard 3...done.
Packing shard 4...done.
...
Packing shard 36...done.
$
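
Since repositories do not pack themselves, the packing step could be wrapped in a small script for cron to call. The following is only a sketch: the function name pack_repo and the SVNADMIN override variable are invented here, and the only Subversion command it relies on is the svnadmin pack invocation shown above.

```shell
#!/bin/sh
# pack_repo REPO_PATH
# Hypothetical wrapper around "svnadmin pack" for unattended use.
# SVNADMIN is an assumed override for the binary location; it defaults
# to whatever "svnadmin" is found on PATH.
pack_repo() {
    repo=$1
    # Refuse to run against a path that is not a directory.
    if [ ! -d "$repo" ]; then
        echo "pack_repo: no such directory: $repo" >&2
        return 1
    fi
    # Pack every completed shard that has not been packed yet.
    "${SVNADMIN:-svnadmin}" pack "$repo"
}
```

A nightly crontab entry could then source this file and call pack_repo with the repository path; the file location and schedule are up to you.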

To give an idea of the potential space savings, on my local 1.5-era copy of Subversion’s own repository I get the following results:

$ du -sh svnrepo-1.5/
659M	svnrepo-1.5/
$

While on a packed 1.6 copy of the same repository, with rep-sharing enabled, I see the following:

$ du -sh svnrepo-1.6/
593M	svnrepo-1.6/
$

That’s more than a 10% decrease in space, at no cost in performance. These space savings will vary depending upon your own repository and use habits, but we’re excited about the improvements in the FSFS backend in Subversion 1.6.
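
If you want to check the figure for your own repository, the arithmetic is easy to reproduce. This sketch uses a made-up helper, pct_saved, that takes the before and after sizes in the same unit and prints the decrease truncated to one decimal place using integer shell arithmetic.

```shell
#!/bin/sh
# pct_saved BEFORE AFTER
# Hypothetical helper: percentage decrease from BEFORE to AFTER,
# truncated (not rounded) to one decimal place.
pct_saved() {
    before=$1
    after=$2
    # Work in tenths of a percent to stay within integer arithmetic.
    tenths=$(( (before - after) * 1000 / before ))
    echo "$(( tenths / 10 )).$(( tenths % 10 ))%"
}

# Sizes in megabytes, taken from the du output above.
pct_saved 659 593
```

For finer-grained input, the kilobyte counts reported by du -sk for the two copies can be passed in instead of the rounded megabyte figures.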
