I must admit that I’m new to blogging, and these lines were first written on plain old paper. Yes, I was a COBOL developer in the early days of my professional career, and I still write a lot of things in upper case.
When I moved to a small start-up named Atria Software back in 1996, the world of software development looked quite different: small to mid-sized teams, typically co-located around a specially designated project server. Today’s broadband connections are faster than most intranet lines were then, and sites are linked by gigabit-capacity leased lines. Back then, if you were lucky, you had a dual-channel ISDN dial-up connection.
In those days, companies started to ramp up distributed teams to work on highly complex projects, and of course they needed an infrastructure that supported this. The answer from most commercial SCM systems was replication and synchronization of the repositories. You duplicate the archive, work out the delta at each site from time to time, exchange these deltas, and apply them to the other replicas: done! The biggest advantage of this approach was that it worked over thin connections. The biggest disadvantage: conflicts caused by the time gap between a change and the next sync. If the same object is changed at two sites, the potential conflict has to be resolved during synchronization. You might argue that mastership concepts can prevent the same object from being changed simultaneously at different sites. You are right, but this means you introduce branches not because the project needs them, but only because of the replication.
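To make the time-gap problem concrete, here is a minimal Python sketch of delta-based synchronization between two replicas. It is not how any particular SCM product implements replication: the `Replica` class, the `synchronize` function and the "object name to content" model are hypothetical simplifications, and deletions and merges are left out. Each site remembers what it looked like at the last sync, computes its delta, and exchanges it; an object changed differently at both sites in the meantime is reported as a conflict instead of being applied.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica: object name -> content, plus a snapshot from the last sync."""
    name: str
    objects: dict[str, str] = field(default_factory=dict)
    last_synced: dict[str, str] = field(default_factory=dict)

    def commit(self, obj: str, content: str) -> None:
        # A local change: other sites will not see it until the next sync.
        self.objects[obj] = content

    def delta(self) -> dict[str, str]:
        # Every object added or changed since the last synchronization.
        return {k: v for k, v in self.objects.items() if self.last_synced.get(k) != v}


def synchronize(a: Replica, b: Replica) -> list[str]:
    """Exchange deltas between two replicas and return the objects in conflict."""
    delta_a, delta_b = a.delta(), b.delta()
    # Changed at both sites, differently, since the last sync: a conflict.
    conflicts = {obj for obj in delta_a.keys() & delta_b.keys() if delta_a[obj] != delta_b[obj]}

    # Apply the non-conflicting part of each delta to the other replica.
    for source, target in ((delta_a, b), (delta_b, a)):
        for obj, content in source.items():
            if obj not in conflicts:
                target.objects[obj] = content

    # Record the new common baseline; conflicting objects keep their old
    # baseline so they stay in the delta until someone resolves them.
    for replica in (a, b):
        baseline = {k: v for k, v in replica.objects.items() if k not in conflicts}
        for obj in conflicts:
            if obj in replica.last_synced:
                baseline[obj] = replica.last_synced[obj]
        replica.last_synced = baseline
    return sorted(conflicts)


if __name__ == "__main__":
    site_a = Replica("A", {"main.cbl": "v1"}, {"main.cbl": "v1"})
    site_b = Replica("B", {"main.cbl": "v1"}, {"main.cbl": "v1"})
    site_a.commit("main.cbl", "v2-from-A")  # changed at site A ...
    site_b.commit("main.cbl", "v2-from-B")  # ... and at site B before the next sync
    site_a.commit("util.cbl", "new")
    print(synchronize(site_a, site_b))  # ['main.cbl']
    print(site_b.objects["util.cbl"])   # new
```

The new file travels to the other site without trouble, but `main.cbl` was touched at both sites within the same sync interval, so someone has to resolve it by hand. The longer the interval, the more often that happens.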
Modern approaches tackle this challenge like RAID systems with write-through technology: if you commit a change to an archive, the change is propagated immediately to the other replicas, without any mastership definition. Shrinking the time gap this much eliminates most of the risk, almost like a central archive!
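For contrast, here is an equally simplified Python sketch of write-through propagation between two sites. The `TwoSiteWriteThrough` class and its methods are hypothetical and do not describe any specific product. While the link is up, every commit lands at both sites immediately. When the link goes down, changes pile up in an outbox, the time gap is back, and conflicts surface again once the link is restored.

```python
from dataclasses import dataclass, field


@dataclass
class TwoSiteWriteThrough:
    """Hypothetical two-site model of write-through replication (no mastership)."""
    replicas: dict[str, dict[str, str]] = field(default_factory=lambda: {"A": {}, "B": {}})
    link_up: bool = True
    # Changes each site made while the link was down, waiting to be sent.
    outbox: dict[str, dict[str, str]] = field(default_factory=lambda: {"A": {}, "B": {}})

    @staticmethod
    def other(site: str) -> str:
        return "B" if site == "A" else "A"

    def commit(self, site: str, obj: str, content: str) -> None:
        self.replicas[site][obj] = content
        if self.link_up:
            # Write-through: the peer sees the change immediately.
            self.replicas[self.other(site)][obj] = content
        else:
            # Link down: the change has to wait, so the time gap is back.
            self.outbox[site][obj] = content

    def restore_link(self) -> list[str]:
        """Bring the link back, flush both outboxes, and return conflicting objects."""
        self.link_up = True
        out_a, out_b = self.outbox["A"], self.outbox["B"]
        conflicts = {o for o in out_a.keys() & out_b.keys() if out_a[o] != out_b[o]}
        for site, out in (("A", out_a), ("B", out_b)):
            for obj, content in out.items():
                if obj not in conflicts:
                    self.replicas[self.other(site)][obj] = content
        # Conflicting objects keep their local versions until someone resolves them.
        self.outbox = {"A": {}, "B": {}}
        return sorted(conflicts)


if __name__ == "__main__":
    repo = TwoSiteWriteThrough()
    repo.commit("A", "main.cbl", "v1")
    print(repo.replicas["B"]["main.cbl"])  # v1 (propagated at once)
    repo.link_up = False
    repo.commit("A", "main.cbl", "v2-from-A")
    repo.commit("B", "main.cbl", "v2-from-B")
    repo.commit("B", "util.cbl", "new")
    print(repo.restore_link())  # ['main.cbl']
```

As long as the link holds, nothing can conflict. Once it drops, the model falls back to the same behavior as periodic synchronization.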
But only "almost". First, if the connection between the replicas is down, the risk of conflicts rises again. Second, it reduces an organization’s flexibility to react to change, because the infrastructure setup reflects the status quo (if you’ve ever moved replicas between sites, you know what I mean …). Third, it makes intellectual property (IP) governance more complicated. I know of a ClearCase VOB that was replicated across 28(!) sites, most of them with read-only access. Imagine the cost of establishing the same level of protection against unauthorized access at every one of those sites!
A common argument against a centralized approach to SCM is: what do you do if you (or your site) are offline and need data that is not in your working copy right now? Well, in today’s world there is no longer any need to be offline (you can even get online on an airplane these days); the technology is there, and it is far cheaper than burdening a large development team with a highly complex branching strategy. And if you want to isolate work for a while before sharing it with the other team members, Subversion (or an embedded Subversion) lets you do that as well.
Don’t get me wrong: there are still scenarios for replication, such as extreme data volumes. Then, and only then, is it the right approach. But don’t treat it as your silver bullet for distributed development, and don’t use it just because you always have!
I must admit I have had my fair share in seeding the concept of replication into organizations. But replication as an SCM concept for supporting distributed teams is more than 15 years old; in our market space, that is almost two generations of technology. It is time to try something new, although it is not really new at all: the mainframers have been doing it for at least five generations …
Posted by an excited NKOTB (New Kid on the Blog) from the pre-internet generation.