Can history be changed… and why should I care?
In the Git version control system it is possible to change history (e.g. to remove accidentally added files that are big, confidential or infringing) or to completely delete a branch without a trace (e.g. to remove already merged or abandoned feature branches). This can be useful, but it may also be dangerous.
What if someone does that by accident or with malicious intent? In corporate settings, surprises are usually not a good thing, especially when it comes to data loss/corruption.
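Here are some example commands that change history in Git:
git push -f  # force-push: remote branches are overwritten even if the update is not a fast-forward
git push origin :branch  # delete the remote branch "branch"
git push origin branch :branch2  # note the space before the colon: pushes "branch" and deletes "branch2"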
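To make the space-before-colon pitfall concrete, here is a minimal Python sketch that classifies push refspecs by their effect on the remote. The function names are hypothetical, and it models only the basic refspec rules (an empty source deletes the destination, a leading `+` allows a non-fast-forward update); wildcards and command-line flags such as `-f` or `--delete` are deliberately ignored.

```python
from typing import Dict, List


def classify_refspec(spec: str) -> str:
    """Classify one push refspec as 'delete', 'force-update' or 'update'.

    Simplified model of Git's push refspec rules:
      ':dst'      -> empty source: the remote ref 'dst' is deleted
      '+src:dst'  -> leading '+': a non-fast-forward (forced) update is allowed
      'src[:dst]' -> ordinary create or fast-forward update
    """
    force = spec.startswith("+")
    body = spec[1:] if force else spec
    src, sep, _dst = body.partition(":")
    if sep and not src:
        return "delete"
    return "force-update" if force else "update"


def classify_push_args(refspecs: List[str]) -> Dict[str, str]:
    """Classify every refspec given after the remote name on the command line."""
    return {spec: classify_refspec(spec) for spec in refspecs}


if __name__ == "__main__":
    # The shell splits on whitespace, so "branch :branch2" is TWO refspecs:
    print(classify_push_args("branch :branch2".split()))
    # {'branch': 'update', ':branch2': 'delete'}

    # Without the space it is ONE refspec that updates branch2 from branch:
    print(classify_push_args("branch:branch2".split()))
    # {'branch:branch2': 'update'}
```

The single missing or extra space is the whole difference between updating `branch2` and deleting it, which is why such commands are easy to get wrong by accident.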
There are ways to keep a “history of history”, thus reducing or eliminating these risks. I will compare two of them for you.
What is Git reflog?
Git reflog is a mechanism built into Git that can be used to determine, e.g., which commit was at the tip of a branch “2 days ago”. This information is stored in a set of log files under .git/logs.
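The lookup described above can be sketched in Python. Each line of a reflog file (e.g. .git/logs/refs/heads/master, or logs/refs/heads/master in a bare repository) records the old and new commit, the identity of whoever made the change, a Unix timestamp with a UTC offset, and, after a tab, a message. The sketch below parses such lines and returns the tip at a given moment. The names are hypothetical; in practice Git performs this lookup itself, e.g. `git rev-parse 'master@{2 days ago}'`.

```python
import re
import sys
import time
from dataclasses import dataclass
from typing import List, Optional

# "<old> <new> <name> <<email>> <unix-seconds> <tz>\t<message>"
_REFLOG_LINE = re.compile(
    r"^([0-9a-f]+) ([0-9a-f]+) (.*?) <([^>]*)> (\d+) ([+-]\d{4})(?:\t(.*))?$"
)


@dataclass
class ReflogEntry:
    old: str        # commit the ref pointed to before the change
    new: str        # commit the ref pointed to after the change
    who: str
    email: str
    timestamp: int  # seconds since the Unix epoch
    tz: str         # UTC offset of whoever made the change, e.g. "+0100"
    message: str


def parse_reflog_line(line: str) -> ReflogEntry:
    m = _REFLOG_LINE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"not a reflog line: {line!r}")
    old, new, who, email, ts, tz, msg = m.groups()
    return ReflogEntry(old, new, who, email, int(ts), tz, msg or "")


def tip_at(entries: List[ReflogEntry], when: int) -> Optional[str]:
    """Return the commit the ref pointed to at Unix time `when`, or None.

    Git appends to reflog files, so entries are oldest first.
    An all-zero sha means the ref did not exist at that moment.
    """
    tip = None
    for entry in entries:
        if entry.timestamp > when:
            break
        tip = entry.new
    if tip is None or set(tip) == {"0"}:
        return None
    return tip


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python tip_at.py .git/logs/refs/heads/master
    with open(sys.argv[1], encoding="utf-8") as f:
        log = [parse_reflog_line(line) for line in f if line.strip()]
    two_days_ago = int(time.time()) - 2 * 24 * 3600
    print(tip_at(log, two_days_ago))
```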
What is History Protection in Git/Gerrit Integration with CollabNet TeamForge?
For the sake of brevity, in this article I will refer to “Git/Gerrit integration with CollabNet TeamForge” as TeamForge-Git.
In TeamForge-Git, a new feature was recently introduced – History Protection. It offers out-of-the-box, enterprise-ready, site-wide-enforceable protection from Bad Things(TM) happening to the all-important Git history and Git refs.
How do those two approaches compare?
I will describe the differences between the two approaches. If you just want a quick overview, there is a tabular comparison further down in this article.
At first glance they might seem similar, but on closer inspection it is more like comparing apples to oranges; or apples to blackberries ;).
Purpose and origin
Some people describe reflog as more of a personal tool for a developer to have a fallback when something goes wrong, e.g. when a local branch is deleted accidentally.
In contrast, TeamForge-Git’s History Protection was designed from the beginning as a mechanism that must work in a multi-user enterprise environment on a blessed repository and be remotely accessible and manageable through a Web UI as well as command line clients.
Accessibility, UI, Tooling
To configure and access the Git reflog, one needs access to the file system of the server where the “blessed Git repository” is hosted. Such access is unlikely to be granted in big organizations, and handling every request on the users’ behalf would keep the server administrators busy.
TeamForge-Git’s History Protection has a web UI that offers a “self-service” approach. Users with appropriate permissions in TeamForge/Gerrit can inspect, resurrect and permanently delete rewritten/deleted Git refs by themselves, eliminating the need to involve server administrators who have file system access. Additionally, History Protection creates special (backup) Git refs for ordinary refs that get deleted/rewritten. These special refs can also be inspected/resurrected using ordinary Git clients.
What information is stored
Information stored by Git reflog
Git reflog records these changes in the repository:
- Any push
- Any merge
- Any branch creation/deletion
- Any tag creation/deletion
Example reflog entries look like this:
<sha1> <sha1> kpradzinski <firstname.lastname@example.org> 1354558003 +0100 push
<sha1> <sha1> kpradzinski <email@example.com> 1354558072 +0100 push
Two issues baffled me:
- The second entry actually represents a forced (non-fast-forward) push, yet I haven’t found any way to tell it apart from a fast-forward push in the reflog. This seems to make the reflog ill-suited (maybe even useless) for protecting history in an air-tight manner.
- A non-human-readable timestamp format was chosen (seconds since the Unix epoch plus a UTC offset), which complicates things for human users.
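Both issues can be partly worked around after the fact. The timestamp is plain Unix time plus a UTC offset, and whether an update was a fast-forward can be checked with `git merge-base --is-ancestor`, provided both commits have not yet been garbage-collected. The Python sketch below does both; the function names are hypothetical, and it assumes `git` is on the PATH. (For readable dates alone, `git reflog --date=iso` also works.) Note that the check is still not air-tight: it depends on objects that garbage collection may already have removed.

```python
import subprocess
from datetime import datetime, timedelta, timezone

ZERO_SHA = "0" * 40  # Git's marker for "ref did not exist" (SHA-1 repositories)


def readable_time(epoch: int, tz: str) -> str:
    """Turn a reflog timestamp such as (1354558003, '+0100') into ISO 8601."""
    sign = -1 if tz.startswith("-") else 1
    offset = sign * timedelta(hours=int(tz[1:3]), minutes=int(tz[3:5]))
    return datetime.fromtimestamp(epoch, timezone(offset)).isoformat()


def is_ancestor(repo: str, old: str, new: str) -> bool:
    """True if `old` is an ancestor of `new` (exit code 0 of merge-base --is-ancestor)."""
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", old, new],
        capture_output=True,
        text=True,
    )
    if result.returncode not in (0, 1):  # e.g. a commit is unknown or already pruned
        raise RuntimeError(result.stderr.strip())
    return result.returncode == 0


def classify_update(repo: str, old: str, new: str) -> str:
    """Classify one reflog entry: create, delete, fast-forward or non-fast-forward."""
    if old == ZERO_SHA:
        return "create"
    if new == ZERO_SHA:
        return "delete"
    return "fast-forward" if is_ancestor(repo, old, new) else "non-fast-forward"
```

For the first example entry above, `readable_time(1354558003, "+0100")` returns `'2012-12-03T19:06:43+01:00'`.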
Information stored by TeamForge-Git’s History Protection
If TeamForge-Git’s History Protection is on, then whenever a Git ref gets deleted/rewritten, a descriptive entry is added to the list of deleted/rewritten refs. This is somewhat similar to the “Recycle bin” on a computer desktop. Gerrit Administrators can permanently delete those entries or “resurrect” them.
Here is how it looks:
Additionally, TeamForge-Git’s History Protection creates Audit Log entries whenever git refs get
- re-written (non-fast-forward)
- permanently deleted (which is somewhat similar to “emptying the Recycle bin” on computer desktops).
As far as I know, Git reflog offers no way of notifying users about history rewrites or deleted branches.
TeamForge-Git’s History Protection has a built-in notification mechanism, enabled by default. It uses Gerrit’s e-mail sending infrastructure to notify administrators about history rewrites and deleted git refs.
Protection against object pruning and reflog expiration
Git has a mechanism called “garbage collection”, which permanently removes data that is no longer referenced. If it is triggered at an unfortunate time, commits that have become unreferenced (e.g. due to a delete/rewrite) will be lost.
Git reflog
Reflog expiration and gc pruning settings have to be configured manually by the server administrator. The only way to avoid losing commits that are no longer referenced by a branch is to set both values to “unlimited” (Git spells this “never”), which consumes large amounts of disk space, slows down garbage collection and removes the option of permanently deleting specific commits (it is all or nothing).
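For reference, here is a minimal Python sketch of that “keep everything” configuration for one server-side repository. It assumes shell access to the server and `git` on the PATH; the repository path and function names are hypothetical. `core.logAllRefUpdates` is included because bare repositories do not record reflogs by default.

```python
import subprocess
import sys
from typing import List, Tuple

# Git spells "unlimited" as "never" for these expiry settings.
KEEP_FOREVER: List[Tuple[str, str]] = [
    ("core.logAllRefUpdates", "true"),        # bare repos do not write reflogs by default
    ("gc.reflogExpire", "never"),             # keep entries for reachable commits
    ("gc.reflogExpireUnreachable", "never"),  # keep entries for unreachable commits
    ("gc.pruneExpire", "never"),              # suppress pruning of unreachable objects
]


def config_commands(repo: str) -> List[List[str]]:
    """Build the `git config` invocations for one repository."""
    return [["git", "-C", repo, "config", key, value] for key, value in KEEP_FOREVER]


def apply_keep_forever(repo: str) -> None:
    """Run the commands; this has to be repeated for every repository on the server."""
    for cmd in config_commands(repo):
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python keep_forever.py /srv/git/demo-project-1.git
    apply_keep_forever(sys.argv[1])
```

Note that these are per-repository settings, which is exactly the administrative burden described above.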
TeamForge-Git History Protection
With TeamForge-Git History Protection, commits remain referenced from special Git refs in the “recycle bin”, and thus will never be pruned by Git garbage collection unless one of the administrators explicitly and permanently removes them using the Gerrit Web UI.
Additionally, there is no need to keep a large reflog.
Ease of use
Git reflog
The reflog can only be configured manually, by an administrator with file system access, and it has to be configured for each and every repository. Restoring deleted/rewritten refs requires running Git commands directly on the server.
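Restoring typically means looking up the old commit in the reflog and pointing a branch at it again, e.g. with `git update-ref`. Here is a minimal Python sketch, assuming shell access to the server-side repository; the repository path, branch name and function names are hypothetical. It restores under a new branch name and passes an empty old value, so `update-ref` refuses to overwrite an existing ref.

```python
import subprocess
import sys
from typing import List


def restore_commands(repo: str, branch: str, sha: str) -> List[List[str]]:
    """Build the commands that re-create `branch` pointing at commit `sha`."""
    return [
        # Fail early if the commit is gone (e.g. already pruned by gc).
        ["git", "-C", repo, "rev-parse", "--verify", "--quiet", f"{sha}^{{commit}}"],
        # The empty old value makes update-ref refuse to overwrite an existing ref;
        # -m records the reason in the new ref's reflog (if reflogs are enabled).
        ["git", "-C", repo, "update-ref", "-m", f"restore {branch}",
         f"refs/heads/{branch}", sha, ""],
    ]


def restore(repo: str, branch: str, sha: str) -> None:
    for cmd in restore_commands(repo, branch, sha):
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage, run on the server:
    # python restore.py /srv/git/demo-project-1.git restored/master <sha-from-reflog>
    restore(*sys.argv[1:])
```

Restoring under a new name such as restored/master leaves the current branch untouched, so deciding which history should win can happen afterwards.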
TeamForge-Git’s History Protection
History Protection is available out-of-the-box in TeamForge-Git and can be enforced per repository or for all repositories by setting a site-wide config option. Users with appropriate permissions can restore refs using the Gerrit Web UI or a Git client. They can also use ordinary Git clients (e.g. git ls-remote, Eclipse) to access (read-only, e.g. to inspect) the rewritten/deleted refs visible to them under the special ref directories refs/delete and refs/rewrite.
Here is how those refs look in git ls-remote output:
[ /tmp/demo-project-1/ ] %: git ls-remote
From ssh://dariusz@example:29418/demo-project-1
4f00518c5a4a9d8f6f4a3cfb019ba518fb89e6a8	HEAD
f459ce2b20ef60f404da3f383b0e2a28831e2418	refs/delete/20121130230538-this_branch_is_just_resting--dariusz
4f00518c5a4a9d8f6f4a3cfb019ba518fb89e6a8	refs/heads/master
329bb817c024fcb80c6f4f301b97d9a1985ee2e0	refs/rewrite/20121130230705-master-4f00518c5a4a9d8f6f4a3cfb019ba518fb89e6a8-dariusz
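The backup ref names seem to encode when, what and who. The Python sketch below decodes them from git ls-remote output. The layout it assumes is inferred from the two example refs above, not taken from any TeamForge documentation: refs/delete/<timestamp>-<branch>--<user> and refs/rewrite/<timestamp>-<branch>-<sha>-<user>, where the sha in a rewrite ref appears to be the new tip (it matches the current refs/heads/master). All type and function names are hypothetical.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class BackupRef(NamedTuple):
    kind: str               # "delete" or "rewrite"
    when: datetime          # from the name; no timezone is encoded (naive datetime)
    branch: str
    new_tip: Optional[str]  # apparently the sha the branch was rewritten to; None for deletions
    user: str
    saved_sha: str          # commit preserved by the backup ref


def parse_backup_ref(sha: str, ref: str) -> Optional[BackupRef]:
    """Decode one backup ref name (assumed layout); return None for other refs."""
    for kind in ("delete", "rewrite"):
        prefix = f"refs/{kind}/"
        if ref.startswith(prefix):
            break
    else:
        return None
    stamp, _, rest = ref[len(prefix):].partition("-")
    # Split from the right so branch names may contain '-' (assumes user names do not).
    branch, new_tip, user = rest.rsplit("-", 2)
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return BackupRef(kind, when, branch, new_tip or None, user, sha)


def parse_ls_remote(output: str) -> List[BackupRef]:
    """Collect backup refs from `git ls-remote` output ("<sha>\t<ref>" per line)."""
    refs = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        # Returns None for HEAD, refs/heads/..., and the "From ..." header line.
        backup = parse_backup_ref(parts[0], parts[1])
        if backup is not None:
            refs.append(backup)
    return refs
```

Assuming the server lets you fetch these refs (the read-only access described above suggests it does), an ordinary refspec such as `git fetch origin refs/delete/20121130230538-this_branch_is_just_resting--dariusz:refs/heads/resting` brings a deleted branch into a local clone for inspection; the local name `resting` is arbitrary.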
To protect or not to protect…
Leaky or nonexistent protection against losing history may, in milder cases, cause some anxiety for administrators; in more severe cases it may cause real loss or corruption of data, leading to serious problems.
I should also point out that it is possible to use both approaches at the same time.
Here is the promised table that gives a more compact comparison:
| | Git reflog | TeamForge Git Integration with “History Protection” |
|---|---|---|
| Accessibility | Requires direct access to the file system on the server where the “blessed Git repository” is hosted, which is very unlikely in huge organizations and keeps the server administrators busy | “Self-service” approach: users with appropriate permissions in TeamForge can find and resurrect deleted/rewritten branches by themselves, decreasing the workload of server administrators. Gerrit Administrators can also permanently delete selected branches/tags. |
| Signal-to-noise ratio | The reflog records all changes in the repository, so finding history rewrites/deleted branches is like searching for a needle in a haystack | History Protection has a view which neatly shows deleted and rewritten branches/tags. Separately, Audit Log entries are created whenever branches/tags get rewritten (non-fast-forward) or permanently deleted. |
| Notification | No notification | Email to Gerrit Administrators |
| Ease of use | Only manually configurable by an administrator with file system access; has to be configured for every repository; restoring requires running Git commands on the server | Out-of-the-box in TeamForge-Git; configurable per repository or for all repositories via a site-wide config option; users with appropriate permissions can restore history using the Gerrit Web UI or a Git client |
| Protection against object pruning/reflog expiration | Reflog expiration and gc pruning settings have to be configured manually by the server administrator. The only way to avoid losing commits no longer referenced by a branch is to set both values to “unlimited”, which consumes large amounts of disk space, slows down garbage collection and does not allow permanently deleting specific commits (all or nothing). | Preserved commits will never be pruned by garbage collection unless permanently removed using the Gerrit Web UI. No need to keep a large reflog. Garbage collection will run faster since all commits are still referenced in the repository. |
You came, you saw, you should comment
Please share your feedback: how do you like this comparison, did I miss something, what are your experiences with the reflog, or anything else that may have crossed your mind?