How to Setup a Replica for PostgreSQL 9.x
The following steps I use to setup a new Postgresql 9.5 replica.
Why do I need them? The PostgreSQL docs are excellent in providing both an understanding of the topic of replication, as well as giving details on what to do. Still, I find myself asking: Great, I get it. What now?
I've used these steps to setup PostgreSQL 9.3 and 9.5 replicas on Ubuntu 16.04 (Xenial Xerus). They should work for the whole 9.x series, at least >9.3, though. Make sure to adapt the filepaths, if you use a version of Postgres that is different from 9.3.
Please don't just use the steps without understanding them. :) Also, use sudo or be root where needed.
Steps to set up the replica
- Setup the master for replication.
- Note down the IP of the master server, the user name and the password for the replication user. In my case the user is called replication.
Stop Postgres (as root):
service postgresql stop
Switch to postgres user:
su - postgres
Create an empty data directory:
# This uses the Ubuntu default path cd /var/lib/postgresql/9.3/ # Backup the old data-dir # (Make sure there's nothing there.) mv main main-backup # Create a new, empty data-dir. mkdir main chmod 700 main
Create a recovery.conf (make sure to replace the placeholders for server IP, replication user and password and the port, if necessary:
cd main echo "standby_mode='on' primary_conninfo='host=MASTER-IP port=5432 user=replication password=REPLICATION-PASSWORD sslmode=require' trigger_file='/tmp/pg_trigger.repl'" >recovery.conf
Draw a base backup (make sure to replace the placeholders for server IP and replication user, and get ready to enter the replication user's password):
pg_basebackup -h MASTER-IP -D /var/lib/postgresql/9.3/main -U replication -v -P
Switch back to root and start PostgreSQL:
exit service postgresql start
Steps to check the replication
On the replica server, check that the WAL receiver is running:
ps auxf | grep -v grep | grep "postgres: wal receiver process"
This should show a postgres: wal receiver process process, for example:
postgres 7839 0.0 0.0 323052 11116 ? Ss 06:31 0:00 \_ postgres: wal receiver process streaming 79/DF47B6C0
If it does not show anything, check the log file:
Check the replication delay on the replica:
su - postgres psql -c 'select now() - pg_last_xact_replay_timestamp() AS replication_delay;' | tail -n 3 | head -n 1
It should show the number of seconds that the replica is behind the master in replicating:
A few seconds are normal. This is the data you'd loose, if the master server gets shutdown abruptly. I've seen this number vary between 0 and 15. The only thing that matters is that it stays low. If the delay is ever-growing, your replica is not keeping up with the master and you need to investigate this, possible getting a faster machine or enlarging the network bandwidth.
Also, you may see an error like this:
psql: FATAL: the database system is starting up
In that case you may not have enabled hot_standby mode on the replica. This setting allows you to switch the replica live at a later point, i.e. it becomes a master server. For our use-case, it allows us to run queries, for example to determine the replication delay, which is otherwise not possible.
To enable hot_standby, open /etc/postgresql/9.3/main/postgresql.conf, search for hot_standby and replace its value off to on:
hot_standby = on # "on" allows queries during recovery # (change requires restart)
Check the replication status on the master server, by running the following SQL statement:
select client_addr, state, sent_location, write_location, flush_location, replay_location from pg_stat_replication;
It should give you an output like so:
client_addr | state | sent_location | write_location | flush_location | replay_location --------------+-----------+---------------+----------------+----------------+----------------- 192.168.178.5 | streaming | 79/DF6CECF8 | 79/DF6CECF8 | 79/DF6CECF8 | 79/DF6CECF8 (1 row)
If the IP under client_addr matches the IP of the replica's machine, all is well.