PostgreSQL Warm Fail Over using Write Ahead Logs

April 20, 2011 david thomas

PostgreSQLA typical fail over method for any application is to have two identical machines with all data stored on a shared SAN. This falls short on ensuring the integrity of the database. Do we know that the database was properly shutdown on the primary node before failing over? If not, data loss can occurs and the only recovery method will be restoring from backup. This method also introduces a single point of failure for the database in the shared storage.

Luckily, postgres provides a transaction logging method known as write ahead logs (WAL). We can exploit these feature to create a live backup of the production system on the fail over node. We’ll do this by having the primary node copy it’s wal logs to the secondary node. The secondary node will be in continuous recovery mode, reading the wal logs and applying them to it’s database.


Before we begin, a few requirements:

  • Both machines must be the same postgresql version. This document assumes postgresql 8.x. Different solutions are available for postgresql 7.x and 9.x.
  • Both machines must be the same architecture (64 or 32 bit).
  • The primary node must be able to access the secondary node via ssh.
  • Install postgresql-contrib on secondary machine

Configure the primary node

  • Generate ssh key and place public key in authorized_hosts on secondary
    # ssh-keygen -t dsa
    # ssh-copy-id -i ~/.ssh/ postgres@secondary
  • Enable WAL logging by editing postgresql.conf and setting the following values.

archive_mode = on
archive_command = 'rsync --delete-after -a %p postgres@secondary:/var/lib/pgsql/walfiles/%f'

Configure the secondary node

  • Create a directory for the WAL files
    # mkdir /var/lib/pgsql/walfiles
  • Create /var/lib/pgsql/recovery.conf with the following contents:

restore_command = '/usr/bin/pg_standby -l -d -s 2 -t /tmp/pgsql.trigger.5432 /var/lib/pgsql/walfiles %f %p %r 2>>standby.log'

Initialize the cluster

  • On the primary run the following:
    Please note that ‘dhreplication’ is an arbitrary tag, and can be set to anything useful to your configuration.
    # psql -U postgres -c “SELECT pg_start_backup(‘dhreplication’);”
    # rsync -avz /var/lib/pgsql/data/* secondary:/var/lib/pgsql/data/
    # psql -U postgres -c “SELECT pg_stop_backup();”
  • On secondary run the following:
    Edit /var/lib/pgsql/data/postgresql.conf and set ‘archive_mode = off’
    # ln -s /var/lib/pgsql/recovery.conf /var/lib/pgsql/data/recovery.conf
    A symlink is used above, as /var/lib/pgsql/recovery.conf will be removed when recovery is disabled.
    # service postgresql start
  • Monitor for problems by watching /var/pgsql/pgsql.log and /var/lib/pgsql/data/standby.log on secondary

Testing fail over

  • On primary edit /var/lib/pgsql/data/postgresql.conf and set “archive_command = /bin/true” then reload postgresql to make the changes active.
    # service postgresql reload
  • On secondary
    # tail /var/pgsql/pgsql.log /var/lib/pgsql/data/standby.log
    # touch /tmp/pgsql.trigger.5432
  • You may now connect clients to the secondary server.

To resume running on primary

  • On secondary
    # psql -U postgres -c “SELECT pg_start_backup(‘dhreplication’);”
    # rsync -avz /var/lib/pgsql/data/* primary:/var/lib/pgsql/data/
    # psql -U postgres -c “SELECT pg_stop_backup();”
  • Preform steps above to initialize the secondary server again.

External sources

Previous Article
iPad Overtakes Linux in Browser Usage
iPad Overtakes Linux in Browser Usage

Interesting that the iPad has now overtaken Linux in US browser usage. Being as CollabNet has strong ties t...

Next Article
Refactor Your PMP – Part 6: Quality Management
Refactor Your PMP – Part 6: Quality Management

Okay… it has been a while since my last installment in this series. Aside from my general inability to stay...