Set up S3 buckets with Cross-Region Replication

September 12, 2018 Eryk Szymanski

This is the second blog post in our series on configuring replication with Git LFS using Amazon S3 buckets. In the first blog we discussed the implications of using Git LFS together with replication and how to improve the setup. Before we begin I would like to thank my colleague Logarajan, who helped me a lot and co-authored this post.

This time, we are going to set up the Git LFS data replication. We will use the S3 Cross-Region Replication (CRR) feature to replicate objects from one S3 bucket to another, so that the data can be used by the Gerrit LFS implementation. To find out more about CRR, please follow this link.

Our setup

In order to configure CRR we need two buckets in two different AWS Regions. In our example we will use a source bucket named gerrit-lfs-master located in region us-west-2, US West (Oregon), and a destination bucket named gerrit-lfs-replica located in region ap-south-1, Asia Pacific (Mumbai). We want every object uploaded to the source to be automatically replicated to the destination.

Notes:

  • CRR is enabled at the bucket level. You can request S3 to replicate all objects or a subset of objects using prefixes (folders). We are going to replicate all objects.
  • Although there is an option to encrypt the objects using AWS KMS, we are not going to use it for this setup. By default, CRR uses Secure Sockets Layer (SSL) in transit to transfer the objects between the S3 buckets.

Cost considerations

Before we start, let’s talk a bit about the cost considerations. I don’t want to go into details here, but just want to point out that cost is an important factor to consider while evaluating this setup. One thing to understand is the impact of versioning: S3 keeps one copy for each version of the same object, so additional storage is consumed every time a new version of an object is uploaded. The total cost will include storage (for every version of every object, in both the source and the destination bucket), data transfer between regions, and the number of requests (PUT/GET/DELETE). As you probably already know, the storage cost depends on the storage class used. We will use the Amazon S3 Standard storage class. You can find more about the storage classes here.
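
To make this concrete, here is a back-of-the-envelope example with hypothetical numbers (the sizes and version counts are made up, not taken from a real deployment): if a 100 MB LFS object is overwritten twice, versioning keeps three 100 MB copies in the source bucket, and CRR replicates all three to the destination. You would then pay Standard-class storage for roughly 600 MB in total, plus cross-region data transfer for the 300 MB that was replicated and the corresponding PUT requests.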

Security

Another important thing to discuss is security. As per AWS security best practices, one should not store any credentials or keys on an AWS instance. However, the Gerrit LFS plugin requires an Access Key and Secret Access Key to access the objects in the buckets. This means that we have to use AWS accounts restricted to only the necessary privileges.

I assume here that the buckets have already been created. However, we need to configure them to make replication work. Let’s start with the configuration now. The first step is to enable versioning.
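
In case you are starting from scratch and still need to create the buckets, a minimal boto3 sketch might look like the following (the bucket names and regions are the ones from our example; error handling is omitted):

  import boto3

  # The rest of this post assumes these buckets already exist; this sketch
  # creates them in case you are starting from scratch.
  for bucket, region in (("gerrit-lfs-master", "us-west-2"),
                         ("gerrit-lfs-replica", "ap-south-1")):
      boto3.client("s3", region_name=region).create_bucket(
          Bucket=bucket,
          CreateBucketConfiguration={"LocationConstraint": region},
      )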

Enabling versioning

I keep it short here as it is very well documented. The first thing to do is to enable versioning on both the source and destination buckets. The easiest way of doing it is to sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/. Then, choose the source bucket: gerrit-lfs-master, go to the Properties tab, choose Versioning and select the Enable versioning radio button. Don’t forget to save your changes. After the configuration is saved, follow the same procedure for the destination bucket.
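
If you prefer to script this step instead of clicking through the console, a minimal boto3 sketch might look like this (bucket names and regions again match our example):

  import boto3

  # Versioning must be enabled on both buckets before CRR can be used.
  for bucket, region in (("gerrit-lfs-master", "us-west-2"),
                         ("gerrit-lfs-replica", "ap-south-1")):
      boto3.client("s3", region_name=region).put_bucket_versioning(
          Bucket=bucket,
          VersioningConfiguration={"Status": "Enabled"},
      )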

Enabling replication

Again, I will keep it short. For more details please refer to the documentation.

In the Amazon S3 console, select the source bucket: gerrit-lfs-master, go to the Management tab, choose Replication, and then choose Add rule. We have 4 steps to go through:

  1. Source
  2. Destination
  3. Permissions
  4. Review

Let’s start at step 1, Source. We want to replicate the whole bucket, so we leave the default setting for the source: All contents. Now, we change the Status to Enabled. Note that we leave the option Replicate objects encrypted with AWS KMS unchecked. After that we click Next to go to step 2, Destination, where we select the Destination bucket name: gerrit-lfs-replica. We leave all the options unchecked and click Next again. At step 3, Permissions, we have to select the IAM role; we choose the option Create new role and click Next once more. We end up at step 4, Review, where we see an overview of our configuration, and we click the Save button to finalize the setup.
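
The console creates the replication IAM role automatically when you choose Create new role; when scripting this step, you have to supply the ARN of an existing role that Amazon S3 can assume. Below is a minimal boto3 sketch, assuming such a role is already in place (the account ID and role name in the ARN are placeholders):

  import boto3

  s3 = boto3.client("s3", region_name="us-west-2")

  s3.put_bucket_replication(
      Bucket="gerrit-lfs-master",
      ReplicationConfiguration={
          # Placeholder ARN: a role that lets S3 replicate on your behalf
          # (the console's "Create new role" option sets this up for you).
          "Role": "arn:aws:iam::123456789012:role/s3-crr-role",
          "Rules": [
              {
                  "ID": "replicate-all",
                  "Prefix": "",  # empty prefix = replicate all objects
                  "Status": "Enabled",
                  "Destination": {"Bucket": "arn:aws:s3:::gerrit-lfs-replica"},
              }
          ],
      },
  )

Once the rule is active, you can sanity-check it end to end by uploading a throwaway object to the source bucket and polling the destination until the replica appears. Replication is asynchronous, so this can take anywhere from seconds to a few minutes:

  import time

  import boto3
  from botocore.exceptions import ClientError

  src = boto3.client("s3", region_name="us-west-2")
  dst = boto3.client("s3", region_name="ap-south-1")

  # Upload a test object to the source bucket...
  src.put_object(Bucket="gerrit-lfs-master", Key="crr-smoke-test", Body=b"hello")

  # ...and poll the destination until the replica shows up (404 raises
  # ClientError from head_object while the copy is still in flight).
  for _ in range(60):
      try:
          dst.head_object(Bucket="gerrit-lfs-replica", Key="crr-smoke-test")
          print("replicated")
          break
      except ClientError:
          time.sleep(10)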

AWS account with access to the source bucket

Now, we are going to create an AWS account that permits only programmatic calls to AWS, with the necessary privileges for the source bucket.

First, we create an IAM Policy which provides the necessary privileges for manipulating the contents of the source bucket. What is an IAM Policy? To quote the IAM Policies page from the AWS docs:

A policy is an object in AWS that, when associated with an entity or resource, defines their permissions. AWS evaluates these policies when a principal, such as a user, makes a request. Permissions in the policies determine whether the request is allowed or denied. Most policies are stored in AWS as JSON documents.

To create IAM Policy follow this procedure:

  1. Open the IAM console at https://console.aws.amazon.com/iam/
  2. Choose Policies in the navigation panel on the left
  3. Choose Create policy
  4. Select JSON
  5. Replace the contents with the policy below
  6. Click Review Policy
  7. Provide policy name and description
  8. Click Create Policy

This is documented here.

Below you can see the policy for the source bucket: gerrit-lfs-master:

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "VisualEditor0",
        "Effect": "Allow",
        "Action": [ "s3:ListAllMyBuckets", "s3:HeadBucket" ],
        "Resource": "*"
      },
      {
        "Sid": "VisualEditor1",
        "Effect": "Allow",
        "Action": "s3:*",
        "Resource": [ "arn:aws:s3:::gerrit-lfs-master", "arn:aws:s3:::gerrit-lfs-master/*" ]
      }
    ]
  }

Now we have to create a user that has the privileges defined above.

Create an IAM user with the required privileges

This procedure is documented here.

  1. Open the IAM console at https://console.aws.amazon.com/iam/
  2. Choose: Users and then: Add Users
  3. Type the user name for the new user: gerrit-master
  4. Select type of access: Programmatic Access
  5. Go to Next: Permissions
  6. Choose: Attach existing policies to user directly
  7. Select the policy created in the previous step
  8. Choose: Next: Review to review the user
  9. Choose: Create User
  10. Click Download .csv

The downloaded .csv file contains the Access key and Secret Access key which we will use to configure the Gerrit LFS plugin on the master AWS instance.
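
If you would rather script the policy and user creation, here is a minimal boto3 sketch under the same assumptions (the helper name create_programmatic_user is mine, not from the post or any library; the policy document is the JSON shown above, loaded as a Python dictionary):

  import json

  import boto3

  iam = boto3.client("iam")

  def create_programmatic_user(user_name, policy_name, policy_document):
      """Create an IAM policy and a programmatic-access user with it attached.

      Returns the access key pair that the console would otherwise deliver
      via the Download .csv button.
      """
      policy = iam.create_policy(
          PolicyName=policy_name,
          PolicyDocument=json.dumps(policy_document),
      )
      iam.create_user(UserName=user_name)
      iam.attach_user_policy(
          UserName=user_name,
          PolicyArn=policy["Policy"]["Arn"],
      )
      # Programmatic access means an access key pair instead of a console password.
      key = iam.create_access_key(UserName=user_name)["AccessKey"]
      return key["AccessKeyId"], key["SecretAccessKey"]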

AWS account with access to the destination bucket

Now it is time to follow the same procedure for the destination bucket. We will create an AWS account that permits only programmatic calls to AWS, with the necessary privileges for the replication bucket.

To create IAM Policy follow this procedure:

  1. Open the IAM console at https://console.aws.amazon.com/iam/
  2. Choose Policies in the navigation panel on the left
  3. Choose Create policy
  4. Select JSON
  5. Replace the contents with the policy below
  6. Click Review Policy
  7. Provide policy name and description
  8. Click Create Policy

This time our policy looks like this:

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "VisualEditor0",
        "Effect": "Allow",
        "Action": [ "s3:ListAllMyBuckets", "s3:HeadBucket" ],
        "Resource": "*"
      },
      {
        "Sid": "VisualEditor1",
        "Effect": "Allow",
        "Action": [
          "s3:GetLifecycleConfiguration",
          "s3:ListBucketByTags",
          "s3:GetBucketTagging",
          "s3:GetInventoryConfiguration",
          "s3:GetObjectVersionTagging",
          "s3:GetBucketLogging",
          "s3:ListBucketVersions",
          "s3:GetAccelerateConfiguration",
          "s3:ListBucket",
          "s3:GetBucketPolicy",
          "s3:GetEncryptionConfiguration",
          "s3:GetObjectAcl",
          "s3:GetObjectVersionTorrent",
          "s3:GetBucketRequestPayment",
          "s3:GetObjectVersionAcl",
          "s3:GetObjectTagging",
          "s3:GetMetricsConfiguration",
          "s3:GetIpConfiguration",
          "s3:ListBucketMultipartUploads",
          "s3:GetBucketWebsite",
          "s3:GetBucketVersioning",
          "s3:GetBucketAcl",
          "s3:GetBucketNotification",
          "s3:GetReplicationConfiguration",
          "s3:ListMultipartUploadParts",
          "s3:GetObject",
          "s3:GetObjectTorrent",
          "s3:GetBucketCORS",
          "s3:GetAnalyticsConfiguration",
          "s3:GetObjectVersionForReplication",
          "s3:GetBucketLocation",
          "s3:GetObjectVersion"
        ],
        "Resource": [
          "arn:aws:s3:::gerrit-lfs-replica",
          "arn:aws:s3:::gerrit-lfs-replica/*"
        ]
      }
    ]
  }

Now we will create another user, this time with the privileges defined above.

Create an IAM user with the above privileges

  1. Open the IAM console at https://console.aws.amazon.com/iam/
  2. Choose: Users and then: Add Users
  3. Type the user name for the new user, for example gerrit-replica (it must differ from the gerrit-master user created earlier)
  4. Select type of access: Programmatic Access
  5. Go to Next: Permissions
  6. Choose: Attach existing policies to user directly
  7. Select the policy created in the previous step
  8. Choose: Next: Review to review the user
  9. Choose: Create User
  10. Click Download .csv
  11. Choose: Close

The downloaded .csv file contains the Access key and Secret Access key which we will use to configure the Gerrit LFS plugin on the replica AWS instance.
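
For completeness, with the hypothetical create_programmatic_user helper sketched earlier, both users could be created in a few lines (source_policy and replica_policy stand for the two JSON policy documents shown above, loaded as Python dictionaries; the replica user name is my assumption):

  # Create both programmatic users with their respective policies attached.
  master_keys = create_programmatic_user("gerrit-master", "gerrit-lfs-master-policy", source_policy)
  replica_keys = create_programmatic_user("gerrit-replica", "gerrit-lfs-replica-policy", replica_policy)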

We have successfully configured S3 bucket replication together with the required access. Moreover, we have the Access Keys and Secret Access Keys that are needed to configure the Gerrit LFS plugin. Hence, we are done with this part. The next step is to configure Gerrit to use the S3 bucket as LFS storage, which is described in the next blog post.

About the Author

Eryk Szymanski

Eryk is CollabNet’s Development Manager leading Git and Gerrit related development efforts. He has over 20 years of engineering and management experience ranging from start-ups to medium-size enterprises. Eryk holds a Master’s degree in Computer Science and is a Certified Scrum Master.
