Handling large binary files with Git is a performance pain. You can work around the problem with some Gerrit tuning and by restructuring your build scripts so that they fetch binaries from an artifact repository instead of keeping them in the Git repo. Git LFS offers another approach that does not require any changes to your build process or Git server configuration.
The Gerrit and JGit communities are still working on built-in Git LFS support, but I thought it would make sense to show how Gerrit can be used with a separate Git LFS backend (Artifactory) right now. At first I was concerned that a separate Git LFS backend would require every end user to explicitly point to a different URL, but as my example will show, that is fortunately not the case. So without further ado, let’s jump into the example.
Example – Versioning large Prezi files
My team loves to use Prezi, an awesome presentation tool, to illustrate ideas we get while talking to all kinds of folks. Prezi lets us capture ideas very nicely. Here is an example of a presentation on how one could motivate users to do more code reviews, as developed in a workshop from PO DOJO in betahaus Berlin:
While Prezi is awesome, it does not have a built-in versioning feature and its files are pretty big (the Prezi in question is 120 MB), which makes it a perfect example for versioning with Git LFS. If you follow the next steps, you should be able to have your own Gerrit / Artifactory / LFS setup running in less than 20 minutes.
Step 1: Installing Git LFS extensions
Git LFS is not yet a built-in part of Git, so you have to download the extensions from GitHub. All major versions of Linux, Windows and Mac are supported. Once you have downloaded and run the installer, all you have to do is type git lfs install in your Git shell to complete the installation:
    $ git lfs install
    Git LFS initialized.
This is also the only step that every user of your repository has to perform to use Git LFS.
Step 2: Cloning from Gerrit and tracking your large files
For our example, we assume that the Prezi files should be part of a repository called ShinyApp, so let’s clone this repository from Gerrit:
    $ git clone ssh://email@example.com:29418/shinyapp && cd "shinyapp"
    Cloning into 'shinyapp'...
    remote: Counting objects: 2, done
    remote: Finding sources: 100% (2/2)
    remote: Total 2 (delta 0), reused 0 (delta 0)
    Receiving objects: 100% (2/2), 238 bytes | 0 bytes/s, done.
Let’s assume our Prezi is called incentivize_code_review.zip and has not been added to the index yet. Before we actually do that, we should tell Git LFS that this is one of the large binary files that should be treated specially:
    $ git lfs track incentivize_code_review.zip
    Tracking incentivize_code_review.zip
If you wanted to treat all zip files as large binaries, you could also type
    $ git lfs track "*.zip"
(we don’t do this as part of this example; the quotes keep the shell from expanding the pattern before Git LFS sees it)
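A wildcard pattern like this has to reach Git LFS literally, which means quoting it in the shell. The sketch below illustrates the difference without calling Git at all. The helper count_args and the files a.zip and b.zip are made up for the demonstration and live in a throwaway temporary directory.

```shell
# Report how many arguments a command would actually receive.
count_args() {
    echo "$#"
}

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.zip b.zip                 # two hypothetical archives, for illustration only

unquoted=$(count_args *.zip)      # the shell expands the glob first: a.zip b.zip
quoted=$(count_args "*.zip")      # the literal pattern is passed through

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
# prints: unquoted: 2 argument(s), quoted: 1 argument(s)

cd - >/dev/null && rm -rf "$demo_dir"
```

With the unquoted form, only the zip files that happen to exist at that moment would be handed to the command, so zip files added later would not be covered by the pattern.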
It is important that you track binary files BEFORE you add them to the index as otherwise the staged file will still be stored in Git’s native database.
If you type git status, you will notice that a .gitattributes file has appeared as well:
    $ git status
    On branch master
    Your branch is up-to-date with 'origin/master'.
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
            .gitattributes
            incentivize_code_review.zip
.gitattributes lists all files and patterns that are tracked by Git LFS, and it should be added to the index as well:
    $ git add .gitattributes incentivize_code_review.zip
    warning: LF will be replaced by CRLF in .gitattributes.
    The file will have its original line endings in your working directory.
(You can safely ignore any warning about line ending replacements.)
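As a quick guard against adding a large file before it is tracked, you could check .gitattributes from a script. The following is only a minimal sketch: the function name listed_for_lfs is hypothetical, and it deliberately matches exact file names only, so wildcard entries are not interpreted.

```shell
# Return 0 if FILE is listed verbatim in the attributes file with filter=lfs.
# Assumption of this sketch: only exact names are matched; wildcard patterns
# such as *.zip are NOT evaluated.
listed_for_lfs() {
    file=$1
    attrs=${2:-.gitattributes}
    [ -f "$attrs" ] || return 1
    awk -v f="$file" '
        $1 == f { for (i = 2; i <= NF; i++) if ($i == "filter=lfs") found = 1 }
        END { exit found ? 0 : 1 }
    ' "$attrs"
}

if listed_for_lfs incentivize_code_review.zip; then
    echo "tracked by Git LFS, safe to git add"
else
    echo "not tracked yet, run git lfs track first"
fi
```

For a check that also honours wildcard patterns, Git's own git check-attr filter -- <file> reports which filter applies to a path and is probably the better choice inside a real repository.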
Before we can craft the commit and push it to Gerrit, we have to tell Git LFS where its backend is. This step has to be performed only once per repository.
Step 3: Setting up a Git LFS repository in Artifactory
If you do not have an Artifactory installation yet, you can set up a free trial on JFrog’s web site in less than 3 minutes. Git LFS support for Artifactory is currently only available as part of the Pro and Cloud versions.
Once you have created an Artifactory admin account and logged into Artifactory, it should look like this:
Git LFS should show as available; otherwise you are probably using an older Artifactory version or not the Pro/Cloud version. Next, navigate to Admin -> Repositories -> Local from the left side bar and create a new local repository:
In the subsequent dialog, select Git LFS as package type and decide on a repository key for your repository. I used ShinyApp in this example:
Once you have clicked Save & Finish, use the left side bar to navigate back to Main -> Artifacts, find your newly created repo and click on it:
The last thing we have to do in Artifactory is to click on the highlighted Set Me Up button so that the following dialog appears:
Artifactory suggests putting the highlighted snippet into a file called .gitconfig inside your Git repository. As of Git LFS 1.1, this file naming convention is obsolete, so let’s put the snippet into a file called .lfsconfig instead.
Step 4: Pointing Gerrit to Artifactory and pushing the commit
Let’s create .lfsconfig in our ShinyApp repository and paste the two suggested lines in:
    $ git config -e -f .lfsconfig
    <paste content and save>
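If you would rather not open an editor, the same file can be written non-interactively, because git config with -f edits the named file instead of .git/config. The two wrapper functions below are hypothetical names used only for this sketch, and the URL you pass in should be the one from your own Set Me Up dialog.

```shell
# Write the LFS endpoint into .lfsconfig in the current directory.
# "git config -f <file> <key> <value>" operates on the given file.
set_lfs_url() {
    git config -f .lfsconfig lfs.url "$1"
}

# Read the endpoint back to confirm what Git LFS clients will see.
get_lfs_url() {
    git config -f .lfsconfig --get lfs.url
}
```

In the ShinyApp example, calling set_lfs_url "https://gerrit.artifactoryonline.com/gerrit/api/lfs/ShinyApp" followed by get_lfs_url should print that URL back, and the resulting file should contain the same [lfs] section that Artifactory suggested.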
Now anybody who clones from or pushes to the repository will know where the files stored via Git LFS live. We have to add .lfsconfig to the index and craft a commit:
    $ git add .lfsconfig
    $ git commit -m "Added info about LFS backend and first large binary file"
Let’s examine the commit:
    $ git show HEAD
    commit 185d2713b6a6146a36b6e192e3f1b7166de022f9
    Author: Johannes Nicolai <firstname.lastname@example.org>
    Date:   Tue Jan 26 17:44:25 2016 +0100

        Added info about LFS backend and first large binary file

        Change-Id: I8bacfabe74ec75a14617288ba71f8023ae4f5e8d

    diff --git a/.gitattributes b/.gitattributes
    new file mode 100644
    index 0000000..a8646ff
    --- /dev/null
    +++ b/.gitattributes
    @@ -0,0 +1 @@
    +incentivize_code_review.zip filter=lfs diff=lfs merge=lfs -text
    diff --git a/.lfsconfig b/.lfsconfig
    new file mode 100644
    index 0000000..5507816
    --- /dev/null
    +++ b/.lfsconfig
    @@ -0,0 +1,2 @@
    +[lfs]
    +url = "https://gerrit.artifactoryonline.com/gerrit/api/lfs/ShinyApp"
    diff --git a/incentivize_code_review.zip b/incentivize_code_review.zip
    new file mode 100644
    index 0000000..e7668b3
    --- /dev/null
    +++ b/incentivize_code_review.zip
    @@ -0,0 +1,3 @@
    +version https://git-lfs.github.com/spec/v1
    +oid sha256:44a63a694240bd7834c9dd0d83c48a2989ec0117762a5e9bb11886a98398f419
    +size 125839268
The diff shows three things: .gitattributes contains the tracked file name, .lfsconfig tells Git LFS clients where to retrieve and store binaries, and the 120 MB zip file itself is not stored in the Git repository at all. Only a small pointer to it (its sha256 and size) is committed.
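The pointer that ends up in the commit is plain text, so it is easy to inspect from a script. The sketch below parses the three lines shown in the diff above. The function name read_lfs_pointer is hypothetical, and the temporary file merely stands in for the committed blob.

```shell
# Print "<oid> <size-in-bytes>" for a Git LFS pointer file; fail if the file
# does not start with the pointer header or lacks an oid/size line.
read_lfs_pointer() {
    awk '
        NR == 1 && $0 != "version https://git-lfs.github.com/spec/v1" { exit 1 }
        $1 == "oid"  { sub(/^sha256:/, "", $2); oid = $2 }
        $1 == "size" { size = $2 }
        END { if (oid == "" || size == "") exit 1; print oid, size }
    ' "$1"
}

pointer=$(mktemp)
cat > "$pointer" <<'EOF'
version https://git-lfs.github.com/spec/v1
oid sha256:44a63a694240bd7834c9dd0d83c48a2989ec0117762a5e9bb11886a98398f419
size 125839268
EOF

result=$(read_lfs_pointer "$pointer")
oid=${result% *}      # everything before the space
size=${result#* }     # everything after the space
echo "oid:  $oid"
echo "size: $size bytes (about $(( size / 1024 / 1024 )) MiB)"
# prints "size: 125839268 bytes (about 120 MiB)"
rm -f "$pointer"
```

The 125839268 bytes recorded in the pointer are the roughly 120 MB of the original Prezi, while the pointer itself is only three short lines.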
Finally, let’s push the commit to Gerrit:
    $ git push origin HEAD:master
    Username for 'https://gerrit.artifactoryonline.com': jonico
    Password for 'https://email@example.com':
    Git LFS: (1 of 1 files) 120.01 MB / 120.01 MB
    Counting objects: 6, done.
    Delta compression using up to 8 threads.
    Compressing objects: 100% (5/5), done.
    Writing objects: 100% (5/5), 664 bytes | 0 bytes/s, done.
    Total 5 (delta 0), reused 0 (delta 0)
    remote: Processing changes: refs: 1, done
    To ssh://firstname.lastname@example.org:29418/shinyapp
       35c8eee..185d271  master -> master
During the push, you will be asked for your Artifactory credentials. Git LFS works well with the Git credential helper if you do not want to enter your password every time. If we refresh our repository in Artifactory, we can see that our Prezi has just arrived there with the same sha256 as the one referenced in the Git commit:
Step 5: Cloning from a different host/user
If a team member now wants to access our Prezi, all they have to do is install the Git LFS extensions as shown in step 1 and clone the repository (credentials required). Because the .lfsconfig file is present, they do not have to know anything about the Artifactory URL.
    $ git clone https://email@example.com/gerrit/shinyapp && cd "shinyapp"
    Cloning into 'shinyapp'...
    remote: Counting objects: 7, done
    remote: Finding sources: 100% (7/7)
    remote: Total 7 (delta 0), reused 5 (delta 0)
    Unpacking objects: 100% (7/7), done.
    Downloading incentivize_code_review.zip (120.01 MB)
    Username for 'https://gerrit.artifactoryonline.com': potsdam
    Password for 'https://firstname.lastname@example.org':
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100     9  100     9    0     0      9      0  0:00:01 --:--:--  0:00:01     9
The file size of our Prezi shows that Git LFS has automatically replaced the pointer with the actual file stored in Artifactory:
    $ du -h incentivize_code_review.zip
    121M incentivize_code_review.zip
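The file size is a good first sign. If you want stronger evidence that the download is the very file that was pushed, you could compare its SHA-256 with the oid from the pointer. The sketch below assumes that either sha256sum or shasum is installed, and the function names sha256_of and matches_oid are made up for this example.

```shell
# Print the SHA-256 of a file, using whichever common tool is installed.
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | awk '{ print $1 }'
    else
        shasum -a 256 "$1" | awk '{ print $1 }'
    fi
}

# Return 0 if FILE hashes to the expected oid (hex, without the "sha256:" prefix).
matches_oid() {
    [ "$(sha256_of "$1")" = "$2" ]
}

# oid taken from the pointer in the commit shown earlier in this post
expected=44a63a694240bd7834c9dd0d83c48a2989ec0117762a5e9bb11886a98398f419
if [ -f incentivize_code_review.zip ] && matches_oid incentivize_code_review.zip "$expected"; then
    echo "content matches the pointer"
else
    echo "file is missing or does not match the pointer"
fi
```

In a real clone you would not hard-code the oid. Something like git cat-file -p HEAD:incentivize_code_review.zip should print the committed pointer, from which the oid can be read.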
That concludes our mini example. Hopefully you can adapt it to your needs.
Advanced topics: Access right management and ssh support
The example above has not covered access rights and user management within Artifactory. Artifactory can control read/write/administer access to its repositories on a per-user and per-group basis. Users can be synced with your corporate LDAP or SAML provider as well. Some companies using Gerrit that we talked to also allow anonymous access to stored Git LFS content as long as the developer is within the company network. You might want to be careful with that option, though, if you are not hosting Artifactory yourself and are charged per GB transferred. The screenshot below shows how this feature can be turned on in Artifactory:
My example used the https protocol to interact with Artifactory. It is also possible to use ssh for that if you host your own server or have a dedicated server hosted at JFrog (in other words, it is not supported in their Cloud version). More details on access right / user setup and the use of the ssh protocol can be found here.
Last but not least, I would like to thank Sebastian Schuberth (for the .lfsconfig hint), Ilmari Kontulainen and my colleagues at CollabNet for the nice discussions that helped shape this blog post. As Ilmari pointed out, there are also other Git LFS backends available that could be used together with Gerrit, but Artifactory made a very good impression, both in its usability and functionality and in the responsiveness of their support folks (kudos to Mor from JFrog).