Logical Replication on Standbys in Postgres 16

Roberto.Mello@crunchydata.com (Roberto Mello) — Thu, 01 Jun 2023 09:00:00 EDT

Postgres 16 is hot off the press with the beta release last week. I am really excited about the new feature that allows logical replication from standbys, allowing users to:

create logical decoding from a read-only standby
reduce the workload on the primary server
have new ways to achieve high-availability for applications that require data synchronization across multiple systems or for auditing purposes

A second relevant and exciting new feature coming in 16 is that replication slots on a given standby persist across the promotion of that standby to a primary. This means that if the primary server fails and a standby is promoted, the replication slots survive and the subscribers of the former standby are not affected.

Together, these two features give PostgreSQL a huge boost for big data operations: applications moving data between physical locations, and folks doing data warehousing, analytics, data integration, and business intelligence. I’m going to walk through an example schema and database setup and offer some sample settings and code for creating logical replication from a standby.

Background on replication

At a high level PostgreSQL supports two main types of replication: streaming/physical and logical. Physical replication streams the Write-Ahead Log (WAL) over a connection, so the replica is a copy of the entire database cluster as it exists on disk. Logical replication offers a more fine-grained approach, where you can specify individual database objects, such as tables or even specific rows, to be replicated to a remote server. You can read more about logical replication basics in Data To Go: Postgres Logical Replication.

A standby server in PostgreSQL is created by taking a base backup of the primary server and continuously applying all changes made on the primary. A hot standby is a standby server that accepts read-only queries while it applies those changes, and it can be promoted to become the primary server. PostgreSQL saves modifications in WAL (Write-Ahead Log) records and replicates them from the primary server to the standby servers. The configuration parameter hot_standby_feedback prevents catalog rows from being removed too early on the primary if a standby still needs them.

Development of logical decoding from standbys started in 2018. The process involved significant effort due to many intricate details. To put that in perspective, this has been 5 years in the making and we’re ecstatic to see it land for Postgres 16.

Example setup for replication from a standby

Let’s say we have three different PostgreSQL servers managing inventory for a global logistics company with warehouses distributed across the globe. The primary holds the warehouse and inventory information, the standby is a physical copy kept for high availability, and a third server receives specific changes for reporting purposes.
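The table definitions are not shown in this walkthrough, so here is a minimal sketch of what such a schema could look like. Everything except the inventory columns warehouse, product_id and quantity (which appear in the query output later on) is an assumption made for illustration, including the key choices and the columns of the supporting tables.

```sql
-- Hypothetical schema: only inventory(warehouse, product_id, quantity)
-- is taken from the article; all other names and types are assumed.
CREATE TABLE regions (
    region_id integer PRIMARY KEY,
    name      text NOT NULL
);

CREATE TABLE countries (
    country_code char(2) PRIMARY KEY,
    region_id    integer NOT NULL REFERENCES regions,
    name         text NOT NULL
);

CREATE TABLE warehouses (
    code         text PRIMARY KEY,          -- e.g. 'SLC1'
    country_code char(2) NOT NULL REFERENCES countries,
    city         text NOT NULL
);

CREATE TABLE products (
    product_id integer PRIMARY KEY,
    name       text NOT NULL
);

CREATE TABLE inventory (
    warehouse  text    NOT NULL REFERENCES warehouses (code),
    product_id integer NOT NULL REFERENCES products,
    quantity   integer NOT NULL CHECK (quantity >= 0),
    -- The primary key doubles as the default replica identity.
    PRIMARY KEY (warehouse, product_id)
);
```

Logical replication does not copy table definitions, so the reporting server would need compatible tables created before it subscribes.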

Primary

In your primary instance, you need users with replication privileges. For this example I created a user for streaming the changes to the standby, and another to publish changes to the subscribers.

CREATE ROLE repuser WITH REPLICATION LOGIN ENCRYPTED PASSWORD 'somestrongpassword';
CREATE ROLE pubuser WITH REPLICATION LOGIN ENCRYPTED PASSWORD 'differentstrongpassword';

Next I create a physical slot to copy changes from the primary to the standby. Strictly speaking this is not required, but in practice it really is. Without a physical replication slot, the primary may recycle WAL that the standby has not received yet, so a restart of either node or a connection loss could break replication:

SELECT pg_create_physical_replication_slot('hot_standby_1');
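To confirm the slot was created, one option is to look it up in the pg_replication_slots view on the primary. This is only a quick check; the slot is expected to stay inactive until the standby connects to it.

```sql
-- Run on the primary: show the physical slot created above.
-- "active" stays false until the standby starts streaming from it.
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots
WHERE slot_name = 'hot_standby_1';
```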

We're interested only in the inventory for the Salt Lake City warehouse, whose code is SLC1. On the primary we will create a publication inventory_requirements_pub for the tables the inventory table depends on, and another publication named inventory_slc_pub:

CREATE PUBLICATION inventory_requirements_pub
FOR TABLE regions, countries, warehouses, products;

CREATE PUBLICATION inventory_slc_pub
FOR TABLE inventory WHERE (warehouse = 'SLC1');

GRANT SELECT ON TABLE regions, countries, warehouses, products, inventory
TO pubuser;
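A quick way to double-check what each publication covers, including the row filter, is the pg_publication_tables view. A small sketch, assuming Postgres 15 or later, where the rowfilter column is available:

```sql
-- Run on the primary: list published tables and any row filter.
-- inventory should show the warehouse = 'SLC1' filter; the other
-- tables should show a NULL rowfilter.
SELECT pubname, schemaname, tablename, rowfilter
FROM pg_publication_tables
ORDER BY pubname, tablename;
```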

Standby

At this point I can create my standby instance. We’re going to initialize our standby with pgBackRest (pg_basebackup can be used as well). Once your standby data directory has been restored, you'll need to edit its postgresql.conf and make sure it sets the following parameters (as documented here):

# Provide feedback to the primary or upstream standby about
# queries currently executing on this standby
hot_standby_feedback = on

# Use the physical replication slot we created previously
primary_slot_name = 'hot_standby_1'

hot_standby = on
archive_mode = on

# Must be logical on the primary too. If the level drops below
# logical, logical slots on the standby are invalidated
wal_level = logical

# standby streams changes from the primary
primary_conninfo = 'host=127.0.0.1 port=5432 user=repuser password=somestrongpassword'

max_wal_senders = 10  # max number of walsender processes
max_replication_slots = 10  # max number of replication slots

# If an upstream standby server is promoted to become the new
# primary, downstream servers will continue to stream from
# the new primary
recovery_target_timeline = 'latest'

Connecting to this standby will confirm it's in read-only mode:

SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
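It can also help to confirm that the standby is actually streaming through the physical slot. One way to sketch that check is the pg_stat_wal_receiver view on the standby; with the settings above, I would expect a status of streaming and a slot_name of hot_standby_1.

```sql
-- Run on the standby: one row describing the WAL receiver process.
-- No rows means the standby is not currently connected upstream.
SELECT status, slot_name, sender_host, sender_port
FROM pg_stat_wal_receiver;
```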

At this point we have:

A primary instance replicating through a physical slot to a working standby with hot_standby_feedback = on
Publications named inventory_requirements_pub and inventory_slc_pub

Logical replica from standby

Now we can go to our reporting PostgreSQL instance and subscribe to changes from the standby. Before Postgres 16 this operation would fail.

One major advantage of logical replication is that you can subscribe to changes from PostgreSQL servers of different versions! That gives you a lot of flexibility when working with applications using different versions of PostgreSQL.

CREATE SUBSCRIPTION inventory_requirements_sub
CONNECTION 'dbname=inventory host=127.0.0.1 port=5434 user=pubuser password=differentstrongpassword'
PUBLICATION inventory_requirements_pub;

CREATE SUBSCRIPTION inventory_slc_sub
CONNECTION 'dbname=inventory host=127.0.0.1 port=5434 user=pubuser password=differentstrongpassword'
PUBLICATION inventory_slc_pub;

If your primary server is idle, this operation will hang. This is because the standby is waiting for a snapshot of running transactions to arrive in the WAL from the primary. You can speed up the creation of this information by calling the new function pg_log_standby_snapshot() on the primary. In this example, we'd call it twice because we're creating two subscriptions.

SELECT pg_log_standby_snapshot();
 pg_log_standby_snapshot
-------------------------
 0/23000180

That allows the subscription to proceed and produces a message like this, which tells us a replication slot was created on the standby. New and improved in Postgres 16!

NOTICE:  created replication slot "inventory_requirements_sub" on publisher
CREATE SUBSCRIPTION

We can verify that in the pg_stat_replication system view on the standby.

SELECT pid, application_name, state, sync_state FROM pg_stat_replication;

  pid  |      application_name      |   state   | sync_state
-------+----------------------------+-----------+------------
 23265 | inventory_slc_sub          | streaming | async
 23251 | inventory_requirements_sub | streaming | async
(2 rows)
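The slots themselves can be inspected too. As a sketch, the query below lists the logical slots on the standby; the conflicting column is new in Postgres 16 and reports whether a slot was invalidated by a conflict with recovery.

```sql
-- Run on the standby: logical slots created by the two subscriptions.
-- plugin is expected to be pgoutput for built-in logical replication.
SELECT slot_name, slot_type, plugin, active, conflicting
FROM pg_replication_slots
WHERE slot_type = 'logical';
```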

As soon as the changes are replicated to the standby, they will then be copied downstream to the reporting server, and we'll see those changes there. Notice that only the SLC1 records replicate over.

SELECT * FROM inventory ORDER BY product_id;
 warehouse | product_id | quantity
-----------+------------+----------
 SLC1      |         11 |        7
 SLC1      |         13 |       13
 SLC1      |         15 |       18
 SLC1      |         22 |       15
 SLC1      |         24 |       20

From here I can create multiple logical replicas if I need, for different locations and warehouses.
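To check from the reporting side that the subscriptions are keeping up, the subscriber's own statistics views are a reasonable place to look. A minimal sketch, run on the reporting server:

```sql
-- Progress of each subscription worker on the subscriber.
SELECT subname, received_lsn, latest_end_lsn, latest_end_time
FROM pg_stat_subscription;

-- Per-table sync state: i = initialize, d = data is being copied,
-- f = finished table copy, s = synchronized, r = ready.
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel;
```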

Replication slots persisting across failover

As we saw previously, when we created subscriptions that connected to the standby, it created replication slots on a non-writable standby. What happens if our primary has a catastrophic failure and our standby gets promoted? Will the different warehouses stop receiving changes because the standby we were getting changes from can no longer talk to the primary?

In Postgres 16 the replication slots persist after a failover, which is very exciting! Our former standby gets promoted to become the primary, its replication slots are preserved, and our subscribers continue to receive changes as if nothing had happened!

SELECT pg_promote();
 pg_promote
------------
 t
(1 row)

SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)

UPDATE inventory SET quantity = 2
WHERE warehouse = 'SLC1' and product_id = 11;
UPDATE 1

In our warehouse reporting PostgreSQL server, subscribed to our now-primary server, the changes to the inventory table appeared without any other action being taken (make sure that the tables you're logically replicating have a REPLICA IDENTITY set):


SELECT * FROM inventory WHERE warehouse = 'SLC1' order by product_id;

 warehouse | product_id | quantity
-----------+------------+----------
 SLC1      |         11 |        2
 SLC1      |         13 |       13
 SLC1      |         15 |       18
 SLC1      |         22 |       15
 SLC1      |         24 |       20
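On the replica identity point: a table with a primary key uses that key by default, which is enough for UPDATE and DELETE to replicate. The sketch below shows one way to check the current setting on the writable publisher, plus a fallback for tables without a usable key. Note that with a row filter such as warehouse = 'SLC1', the filtered column also has to be covered by the replica identity when updates and deletes are published.

```sql
-- Run on the writable publisher. relreplident values:
-- d = default (primary key), i = index, f = full, n = nothing.
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'inventory' AND relkind = 'r';

-- Fallback only, for a table with no suitable key: log the whole
-- old row so UPDATE and DELETE can be replicated.
ALTER TABLE inventory REPLICA IDENTITY FULL;
```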

Conclusion

Did I mention I was excited?

As Postgres continues to be the database people turn to, it makes sense that richer options for moving data keep arriving in Postgres. Logical replication from standbys continues to build on what first arrived several years ago and has gotten many enhancements since. Thanks to the amazing work of the PostgreSQL community, in PG 16 users of standby servers will be able to:

Create logical replication slots
Launch logical decoding
Subscribe to changes on a standby
Persist logical replication slots after a failover

A standby server will be able to serve as a source for logical replication subscribers with very few changes:

Enable hot_standby_feedback = on
Use a physical replication slot to replicate from upstream to the standby
Run pg_log_standby_snapshot() on the primary if a subscriber stalls waiting for changes while creating a subscription to a standby.
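As a quick sanity check of that list, the relevant settings can be read back from pg_settings on the standby. This is just a convenience sketch; the parameter names are the ones used in the postgresql.conf example earlier.

```sql
-- Run on the standby: confirm the settings used in this walkthrough.
SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_level', 'hot_standby', 'hot_standby_feedback',
               'primary_slot_name', 'max_wal_senders',
               'max_replication_slots')
ORDER BY name;
```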

Documentation for this new feature is still being written and improved upon, and I'll use what I learned to submit improvements. We’re excited for you to download, test and offer feedback too.

Image credit: AI-generated image by photographer (and my son <3) Rafael Mello.

Roberto Mello | CrunchyData Blog