Setup PostgreSQL 14 Streaming Replication
This is a straightforward guide to setting up PostgreSQL 14 streaming replication, written to be as platform-agnostic as possible. It assumes a working knowledge of your operating system’s tools for editing files and installing/configuring software packages, as well as familiarity with PostgreSQL configuration.
PostgreSQL’s built-in streaming replication creates a cluster of servers for your database. The primary server is the sole read/write database. It streams updated data (the WAL, or “write-ahead log”) out to one or more standby servers, which can operate in read-only mode for scaling database operations, and which can be promoted to be the new primary node if the original primary fails or goes offline. (Historically, these were called master and slave nodes, but we have better names now.)
Since streaming replication works on low-level data, you must run the same major version of PostgreSQL on each node in your cluster. If you want to replicate between versions, you will need “logical replication” instead. More on this later.
Create a primary and as many standby servers as you need. A server node can be anything from a bare-metal server to a cloud instance, virtual machine, container, or jail. Ensure they are provisioned with ample CPU cores, RAM, and storage for your needs. I’ll name the primary db01 and the standby servers db02 through dbNN.
You’ll also need a client computer: your workstation, application server, etc. to access your database.
Setup Primary (Read/Write) Server
Now, if this is a new installation, we need to create the data directory. Otherwise, skip this step, as it will destroy your existing data! Note that some package installers will have already done this for you.
initdb -E UTF-8 /var/db/postgres/data14
Add or adjust these settings in the “postgresql.conf” file to enable replication.
wal_level = replica
max_wal_senders = 10
wal_keep_size = '1GB'
wal_compression = on
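Once the server is up (we start it a few steps below), a quick sanity check from any psql session confirms these settings took effect; the expected values in the comments assume the configuration shown above:

```sql
-- Run on the primary after a restart/reload
SHOW wal_level;        -- expect: replica
SHOW max_wal_senders;  -- expect: 10
SHOW wal_keep_size;    -- expect: 1GB
```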
Add a replication user with this command (or with the analogous SQL statement):
createuser -U postgres --replication repl
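If you prefer to do it from an existing psql session instead, the analogous SQL statement is:

```sql
-- Create a login role allowed to make replication connections
CREATE ROLE repl WITH LOGIN REPLICATION;
```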
The “pg_hba.conf” file controls access to your server. Add appropriate rules to it for access from your client machine, and for replication connections from the standby servers. The additional records could look like the following. Change the IP addresses and authentication method for your environment.
host all all 192.168.1.0/24 trust
host replication repl 192.168.1.xxx/30 trust
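For anything beyond a lab, password authentication is safer than “trust”. Set a password on the repl role (ALTER ROLE repl PASSWORD '...') and use scram-sha-256 in the rules instead; a sketch, with placeholder addresses you should adjust to your own network:

```
# pg_hba.conf - password-authenticated variants of the rules above
host    all          all    192.168.1.0/24    scram-sha-256
host    replication  repl   192.168.1.0/28    scram-sha-256
```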
Start postgres and test that you can access it from your client machine using psql. Then we create the replication slot for each standby server using SQL.
psql -h db01 -U postgres template1
template1=# select * from pg_create_physical_replication_slot('db02_repl_slot');
template1=# select slot_name, slot_type, active, wal_status from pg_replication_slots;
Setup Standby/Replica Servers
It is likely you don’t want any activity on the primary server (db01) while we set up the replicas.
Install PostgreSQL et al. on db02, but do not run “initdb”. In fact, if the postgres/data directory contains anything from the install, it needs to be moved or removed. The “pg_basebackup” command won’t run unless that directory is empty.
We need to sync the postgres/data directory. The PostgreSQL way is to run “pg_basebackup”, which syncs it over a replication connection. Here is a basic command to run on each standby server; modify it for your system.
sudo -u postgres pg_basebackup --pgdata /var/db/postgres/data14 --format=p \
    --write-recovery-conf --checkpoint=fast --label=mffb --progress \
    --host=db01 --port=5432 --username=repl
When complete, you have a full copy of the primary’s PGDATA directory, including configuration files. pg_basebackup also creates a “standby.signal” file in the replica’s PGDATA directory to signal to PostgreSQL that this is a standby server. Add the following to the “postgresql.conf” file on the replica, adjusting names for your servers.
primary_conninfo = 'user=repl port=5432 host=db01 application_name=db02.repl'
primary_slot_name = 'db02_repl_slot'
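If the repl role uses password authentication, the standby needs that password for its primary_conninfo connection. Rather than embedding it in “postgresql.conf”, you can put it in the postgres user’s ~/.pgpass file; replication connections match the special database name “replication”. A sketch, with “sekret” as a placeholder password:

```
# ~/.pgpass on db02 (must be chmod 600)
# format: hostname:port:database:username:password
db01:5432:replication:repl:sekret
```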
That’s it! Now start the postgres server.
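With the standby started, you can check that streaming is actually working. On the primary, each connected standby appears in pg_stat_replication; on the standby, pg_is_in_recovery() should return true:

```sql
-- On db01 (primary): one row per connected standby
SELECT client_addr, state, sync_state FROM pg_stat_replication;

-- On db02 (standby): returns 't' while operating as a replica
SELECT pg_is_in_recovery();
```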
Instead of pg_basebackup, you can copy the files from the primary server to the replicas using ZFS snapshots, rsync, or similar tools. Make sure file ownership and permissions are identical afterwards. Manually create the “standby.signal” file in the PGDATA directory and apply the above configuration. That should be good to go!
Try it out
Let’s create a user (role), database, table, and row on the primary and see them appear on the standby/replica.
createuser -h db01 -U postgres -d testuser
createdb -h db01 -U testuser testdb
psql -h db01 -U testuser testdb
testdb=# create table test_table (id int);
testdb=# insert into test_table values (1);
testdb=# select * from test_table;
Now let’s connect to our replica and try.
psql -h db02 -U testuser testdb
testdb=# select * from test_table;
You should see the row “1” you created on db01. Since the replica is read-only, we can only select data from it: no creates, inserts, updates, or deletes.
psql -h db02 -U testuser testdb
testdb=# insert into test_table values (2);
ERROR: cannot execute INSERT in a read-only transaction
DNS and Failover
As an application or user, how do we know which server is the current primary and which are the replicas? Things change. Things break. I use DNS CNAME records to point to the correct servers. Let’s name our new cluster “dbc1”, which uses db01 as the primary and db02 as the standby/replica. We need names identifying the cluster’s primary (p) and standby (s) servers.
dbc1p --> db01
dbc1s --> db02
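In BIND-style zone file syntax, those names could be sketched as CNAME records (the zone name and TTL here are placeholders for your own DNS setup):

```
; example.internal zone - dbc1p/dbc1s alias the current cluster roles
dbc1p   300   IN   CNAME   db01.example.internal.
dbc1s   300   IN   CNAME   db02.example.internal.
```

A short TTL (here 300 seconds) keeps failover delay small when you repoint the CNAME.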
At this point, assume something happens to db01 and it goes offline. We need to “promote” the replica to become the new primary with read/write ability. Log into db02, then as the postgres user, run
pg_ctl promote -D /var/db/postgres/data14
An easier alternative is to run pg_promote() from a psql session as the postgres superuser:
psql -h db02 -U postgres template1
template1=# select pg_promote();
Now, we need to switch “dbc1p” to point to “db02” as well. Applications can then continue (or retry) their writes against dbc1p without reconfiguration. At this point you may need to build another replica or restore the original primary server’s database.
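Before repointing DNS, you can verify that db02 now accepts writes, reusing the test table from earlier:

```sql
-- On db02 after promotion: this succeeds now that it is read/write
INSERT INTO test_table VALUES (2);
```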
There are more formal methods to manage replication and failover of your PostgreSQL clusters. Check out repmgr for one such solution.
High Availability is outside the scope of this article, as it is a large field and everything “depends” on your own needs. However, here are a couple of options for you to explore.
PgPool - In load balancing mode, the connection pooler can send read operations (selects) to multiple standbys/replicas and write operations to the primary server. This lets you leverage your standby servers to serve read requests and offload work from the primary. It also monitors “replication lag” and stops sending requests to standby servers if the lag is too long, preventing your users from seeing data that is too far out of date.
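You can also check replication lag by hand on the primary; pg_stat_replication exposes per-standby lag both as timings and as a WAL byte difference:

```sql
-- On the primary: approximate lag per connected standby
SELECT application_name,
       write_lag, flush_lag, replay_lag,    -- time-based lag columns
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
  FROM pg_stat_replication;
```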
Odyssey is a newer connection pool offering modern scalability features and configuration. I have not had an opportunity to explore it yet.
HAProxy - Set up a pair of HAProxy load balancers between your applications and your cluster nodes.
PostgreSQL has many different replication solutions and models. Replication is not a “one size fits all” solution. A common alternative to streaming replication is “Logical Replication”, which transmits row-level data changes instead of the low-level WAL that streaming replication uses.
Logical Replication is ideal for migrating data between major versions of PostgreSQL, which would otherwise require downtime to reload the data or execute a “pg_upgrade” process. Data pages change between major versions and can’t be mixed without one of those steps. When PG15 comes out, we can create a new database cluster running PG15, then side-load the data into it using logical replication. Once it has caught up, you can switch over to the new cluster.
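A minimal logical replication setup looks like the following sketch. The names are illustrative; both clusters need wal_level = logical, and the table definitions must already exist on the subscriber:

```sql
-- On the old (publishing) cluster
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

-- On the new PG15 (subscribing) cluster
CREATE SUBSCRIPTION upgrade_sub
  CONNECTION 'host=db01 port=5432 dbname=testdb user=repl'
  PUBLICATION upgrade_pub;
```

Once the initial table copy finishes and the subscription is caught up, you can cut applications over to the new cluster.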
Another use for Logical Replication is Multi-Master (Multi-Primary?) mode. Multi-Master comes with its own set of conditions and methods for reliable usage, but could be the solution you need.
pglogical is a PostgreSQL extension to provide logical replication.