<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>Roberto Mello | CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/author/roberto-mello/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog/author/roberto-mello</link>
<image><url>https://www.crunchydata.com/build/_assets/roberto-mello.jpg-VICN73R2.webp</url>
<title>Roberto Mello | CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog/author/roberto-mello</link>
<width>400</width>
<height>400</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Thu, 01 Jun 2023 09:00:00 EDT</pubDate>
<dc:date>2023-06-01T13:00:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ Logical Replication on Standbys in Postgres 16 ]]></title>
<link>https://www.crunchydata.com/blog/logical-replication-on-standbys-in-postgres-16</link>
<description><![CDATA[ Postgres 16 is in beta and we are all excited about the updated logical decoding features. You heard that right, you can now do logical replication from a non-primary database! Roberto takes us through an example setup for this. ]]></description>
<content:encoded><![CDATA[ <p>Postgres 16 is hot off the press with the beta release last week. I am really excited about the new feature that enables logical replication from standbys, allowing users to:<ul><li>run logical decoding from a read-only standby<li>reduce the workload on the primary server<li>have new ways to achieve high availability for applications that require data synchronization across multiple systems or for auditing purposes</ul><p>A second relevant and exciting new feature coming in 16 is that replication slots on a given standby persist through the promotion of that standby to a primary. This means that in the event of primary server failure and promotion of a standby to primary, the replication slots will persist and the subscribers of the former standby will not be affected.<p>These two features together give PostgreSQL a huge boost for big data operations, such as applications moving data between physical locations and folks doing data warehousing, analytics, data integration, and business intelligence. I’m going to walk through an example schema and database setup and offer some sample settings and code for creating logical replication from a standby.<h3 id=background-on-replication><a href=#background-on-replication>Background on replication</a></h3><p>At a high level PostgreSQL supports two main types of replication: streaming/physical and logical. Physical replication streams the Write-Ahead Log through a connection and reproduces the entire database cluster on disk. Logical replication offers a more fine-grained approach, where you can specify individual database objects, such as tables or even specific rows, to be replicated to a remote server. 
You can read more about logical replication basics in <a href=https://www.crunchydata.com/blog/data-to-go-postgres-logical-replication>Data To Go: Postgres Logical Replication</a>.<p>A standby server in PostgreSQL is created by taking a base backup of the primary server and continuously applying all changes made on the primary. A hot standby is a standby server that accepts read-only queries while it applies those changes, and it can be promoted to become the primary server. PostgreSQL saves modifications in WAL (Write-Ahead Log) records and replicates them from the primary server to the standby servers. The configuration parameter <code>hot_standby_feedback</code> prevents the primary from removing catalog rows too early while a standby still needs them.<p>Development of logical decoding from standbys started in 2018. The process involved significant effort due to <a href=https://www.postgresql.org/message-id/flat/20181212204154.nsxf3gzqv3gesl32@alap3.anarazel.de>many intricate details</a>. To be clear, this has been 5 years in the making and we’re ecstatic to see it land for Postgres 16.<p><img alt="logical rep from standby"loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/27395940-9aad-4b65-efb2-1e3bab417b00/public><h2 id=example-setup-for-replication-from-a-standby><a href=#example-setup-for-replication-from-a-standby>Example setup for replication from a standby</a></h2><p>Let’s say we have three different PostgreSQL servers, managing inventory for a global logistics company with warehouses distributed across the globe. The primary holds warehouse and inventory information, the standby is a physical copy kept for high availability, and a third server receives specific changes for reporting purposes.<p><img alt="prod standby reporting"loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/bcd2383e-1b01-4199-f301-67c809e73400/public><h3 id=primary><a href=#primary>Primary</a></h3><p>In your primary instance, you need users with replication privileges. 
For this example I created a user for streaming the changes to the standby, and another to publish changes to the subscribers.<pre><code class=language-pgsql>CREATE ROLE repuser WITH REPLICATION LOGIN ENCRYPTED PASSWORD 'somestrongpassword';
CREATE ROLE pubuser WITH REPLICATION LOGIN ENCRYPTED PASSWORD 'differentstrongpassword';
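
-- Optional sanity check (a sketch, not part of the original setup):
-- confirm both roles exist with the REPLICATION and LOGIN attributes.
SELECT rolname, rolreplication, rolcanlogin
FROM pg_roles
WHERE rolname IN ('repuser', 'pubuser');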
</code></pre><p>Next I create a physical slot to copy changes from the primary to the standby. Strictly speaking this is not required, but in practice it really is. Without a physical replication slot, a restart of either node or a connection loss would disrupt the replication process:<pre><code class=language-pgsql>SELECT pg_create_physical_replication_slot('hot_standby_1');
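
-- Optional check (a sketch, not part of the original setup): the new
-- slot should be listed with slot_type 'physical'. It stays inactive
-- until the standby connects to it.
SELECT slot_name, slot_type, active
FROM pg_replication_slots
WHERE slot_name = 'hot_standby_1';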
</code></pre><p>We're interested only in the inventory for the Salt Lake City warehouse, whose code is <code>SLC1</code>. On the primary we will create a publication <code>inventory_requirements_pub</code> for the tables the <code>inventory</code> table depends on, and another publication named <code>inventory_slc_pub</code> that filters the <code>inventory</code> rows for that warehouse:<p><img alt="sample tables"loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/88861537-eca8-4264-b53a-c12d5f0eaa00/public><pre><code class=language-pgsql>CREATE PUBLICATION inventory_requirements_pub
FOR TABLE regions, countries, warehouses, products;

CREATE PUBLICATION inventory_slc_pub
FOR TABLE inventory WHERE (warehouse = 'SLC1');

GRANT SELECT ON TABLE regions, countries, warehouses, products, inventory
TO pubuser;
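
-- Optional check (a sketch, not part of the original setup): list the
-- tables in each publication. The rowfilter column (available since
-- Postgres 15) should show the warehouse condition for inventory_slc_pub.
SELECT pubname, tablename, rowfilter
FROM pg_publication_tables
WHERE pubname IN ('inventory_requirements_pub', 'inventory_slc_pub');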
</code></pre><h3 id=standby><a href=#standby>Standby</a></h3><p>At this point I can create my standby instance. We’re going to <a href=https://www.crunchydata.com/blog/pgbackrest-performing-backups-on-a-standby-cluster>initialize our standby with pgBackRest</a> (<code>pg_basebackup</code> can be used as well). Once your standby data directory has been restored, you'll need to edit its <code>postgresql.conf</code> and make sure it sets the following parameters (<a href=https://www.postgresql.org/docs/devel/runtime-config-replication.html>as documented here</a>):<pre><code class=language-ini># Provide feedback to the primary or upstream standby about
# queries currently executing on this standby
hot_standby_feedback = on

# Use the physical replication slot we created previously
primary_slot_name = 'hot_standby_1'

hot_standby = on
archive_mode = on

# If wal_level is lowered below logical, logical slots will be invalidated
wal_level = logical

# standby streams changes from the primary
primary_conninfo = 'host=127.0.0.1 port=5432 user=repuser password=somestrongpassword'

max_wal_senders = 10  # max number of walsender processes
max_replication_slots = 10  # max number of replication slots

# If an upstream standby server is promoted to become the new
# primary, downstream servers will continue to stream from
# the new primary
recovery_target_timeline = 'latest'
</code></pre><p>Connecting to this standby will confirm it's in read-only mode:<pre><code class=language-pgsql>SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
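
-- Optional check (a sketch, not part of the original setup): on the
-- standby, the WAL receiver should report the physical slot that was
-- configured in primary_slot_name.
SELECT status, slot_name, sender_host, sender_port
FROM pg_stat_wal_receiver;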
</code></pre><p>At this point we have:<ul><li>A primary instance replicating through a physical slot to a working standby with <code>hot_standby_feedback = on</code><li>Publications named <code>inventory_requirements_pub</code> and <code>inventory_slc_pub</code></ul><h3 id=logical-replica-from-standby><a href=#logical-replica-from-standby>Logical replica from standby</a></h3><p>Now we can go to our <code>reporting</code> PostgreSQL instance and subscribe to changes <strong>from the standby</strong>. Before Postgres 16 this operation would fail.<p>One major advantage of logical replication is that you can subscribe to changes from PostgreSQL servers of <strong>different versions</strong>! That gives you a lot of flexibility when working with applications using different versions of PostgreSQL.<pre><code class=language-pgsql>CREATE SUBSCRIPTION inventory_requirements_sub
CONNECTION 'dbname=inventory host=127.0.0.1 port=5434 user=pubuser password=differentstrongpassword'
PUBLICATION inventory_requirements_pub;

CREATE SUBSCRIPTION inventory_slc_sub
CONNECTION 'dbname=inventory host=127.0.0.1 port=5434 user=pubuser password=differentstrongpassword'
PUBLICATION inventory_slc_pub;
</code></pre><p>If your primary server is idle, this operation will hang. This is because the standby is waiting for a snapshot of running transactions, which the primary writes to the WAL only periodically. You can have the primary write that record right away by calling the new function <code>pg_log_standby_snapshot()</code> on the primary. In this example, we'd call it twice because we're creating two subscriptions.<pre><code class=language-pgsql>SELECT pg_log_standby_snapshot();
 pg_log_standby_snapshot
-------------------------
 0/23000180
</code></pre><p>That allows the subscription to complete and produces a message like this, which tells us a replication slot was created on the standby. New and improved in Postgres 16!<pre><code class=language-pgsql>NOTICE:  created replication slot "inventory_requirements_sub" on publisher
CREATE SUBSCRIPTION
</code></pre><p>We can verify that in the <code>pg_stat_replication</code> system view on the standby.<pre><code class=language-pgsql>SELECT pid, application_name, state, sync_state FROM pg_stat_replication;

  pid  |      application_name      |   state   | sync_state
-------+----------------------------+-----------+------------
 23265 | inventory_slc_sub          | streaming | async
 23251 | inventory_requirements_sub | streaming | async
(2 rows)
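
-- Optional check (a sketch, not part of the original walkthrough):
-- still on the standby, the logical slots created by the two
-- subscriptions should be listed here.
SELECT slot_name, plugin, slot_type, active
FROM pg_replication_slots
WHERE slot_type = 'logical';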
</code></pre><p>As soon as the changes are replicated to the standby, they will then be copied downstream to the reporting server, and we'll see those changes there. Notice that only the <code>SLC1</code> records replicate over.<pre><code class=language-pgsql>SELECT * FROM inventory ORDER BY product_id;
 warehouse | product_id | quantity
-----------+------------+----------
 SLC1      |         11 |        7
 SLC1      |         13 |       13
 SLC1      |         15 |       18
 SLC1      |         22 |       15
 SLC1      |         24 |       20
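
-- Optional check (a sketch, not part of the original walkthrough):
-- on the reporting server, see how far each subscription has
-- received changes from the standby.
SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription;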
</code></pre><p>From here I can create multiple logical replicas if I need them, for different locations and warehouses.<h2 id=replication-slots-persisting-failover><a href=#replication-slots-persisting-failover>Replication slots persisting failover</a></h2><p>As we saw previously, when we created subscriptions that connected to the standby, they created replication slots on a non-writable standby. What happens if our primary has a catastrophic failure and our standby gets promoted? Will the different warehouses stop receiving changes because the standby we were getting changes from can no longer talk to the primary?<p><img alt="persisting failover"loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/3776e56e-fb7f-4a79-16a0-de4100a67f00/public><p>In Postgres 16 the replication slots persist after a failover, which is very exciting! Our former standby gets promoted to become the primary, its replication slots are preserved, and our subscribers continue to receive changes as if nothing had happened!<pre><code class=language-pgsql>SELECT pg_promote();
 pg_promote
------------
 t
(1 row)

SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)

UPDATE inventory SET quantity = 2
WHERE warehouse = 'SLC1' and product_id = 11;
UPDATE 1
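
-- Optional check (a sketch, not part of the original walkthrough):
-- on the newly promoted server, the logical slots created while it
-- was a standby should still be listed.
SELECT slot_name, slot_type, active
FROM pg_replication_slots
WHERE slot_type = 'logical';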
</code></pre><p>In our warehouse reporting PostgreSQL server, subscribed to our now-primary server, the changes to the <code>inventory</code> table appeared without any other action being taken (make sure that the tables you're logically replicating have a <a href=https://www.postgresql.org/docs/current/sql-altertable.html>REPLICA IDENTITY</a> set):<pre><code class=language-pgsql>
SELECT * FROM inventory WHERE warehouse = 'SLC1' order by product_id;

 warehouse | product_id | quantity
-----------+------------+----------
 SLC1      |         11 |        2
 SLC1      |         13 |       13
 SLC1      |         15 |       18
 SLC1      |         22 |       15
 SLC1      |         24 |       20
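
-- Optional check (a sketch, not part of the original walkthrough),
-- to run on the publishing server rather than here: relreplident is
-- 'd' (default, uses the primary key if there is one), 'i' (index),
-- 'f' (full) or 'n' (nothing).
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'inventory';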
</code></pre><h2 id=conclusion><a href=#conclusion>Conclusion</a></h2><p>Did I mention I was excited?<p>As Postgres continues to be <strong>the</strong> database people turn to, it makes sense that richer options for moving data keep arriving in Postgres. Logical replication from standbys builds on the logical replication that first arrived several years ago and has gotten many enhancements since. Thanks to the amazing work of the PostgreSQL community, on PG 16 standby servers users will be able to:<ul><li>Create logical replication slots<li>Launch logical decoding<li>Subscribe to changes on a standby<li>Persist logical replication slots after a failover</ul><p>A standby server will be able to serve as a source for logical replication subscribers with very few changes:<ul><li>Enable <code>hot_standby_feedback = on</code><li>Use a physical replication slot to replicate from upstream to the standby<li>Run <code>pg_log_standby_snapshot()</code> on the primary if a subscriber stalls waiting for changes while creating a subscription to a standby.</ul><p><a href=https://www.postgresql.org/docs/devel/logicaldecoding-explanation.html#LOGICALDECODING-REPLICATION-SLOTS>Documentation</a> for this new feature is still being written and improved upon, and I'll use what I learned to submit improvements. We’re excited for you to download, test and offer feedback too.<p>Image credit: AI-generated image by photographer (and my son &lt;3) <a href=https://rafaelmello.pro/>Rafael Mello</a>. ]]></content:encoded>
<category><![CDATA[ Production Postgres ]]></category>
<author><![CDATA[ Roberto.Mello@crunchydata.com (Roberto Mello) ]]></author>
<dc:creator><![CDATA[ Roberto Mello ]]></dc:creator>
<guid isPermalink="false">bc32651036a4b8abb285a0232a10c23338a1e7a09fa8980fedec6f999d950227</guid>
<pubDate>Thu, 01 Jun 2023 09:00:00 EDT</pubDate>
<dc:date>2023-06-01T13:00:00.000Z</dc:date>
<atom:updated>2023-06-01T13:00:00.000Z</atom:updated></item></channel></rss>