<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>Jesse Soyland | CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/author/jesse-soyland/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog/author/jesse-soyland</link>
<image><url>https://www.crunchydata.com/build/_assets/jesse-soyland.png-PMFLXXQO.webp</url>
<title>Jesse Soyland | CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog/author/jesse-soyland</link>
<width>713</width>
<height>717</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Fri, 03 Oct 2025 11:00:00 EDT</pubDate>
<dc:date>2025-10-03T15:00:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ Postgres Migrations Using Logical Replication ]]></title>
<link>https://www.crunchydata.com/blog/postgres-migrations-using-logical-replication</link>
<description><![CDATA[ Instructions and tips for using logical replication to migrate Postgres to a new platform or host. ]]></description>
<content:encoded><![CDATA[ <p>Moving a Postgres database isn’t a small task. Typically for Postgres users this is one of the biggest projects you’ll undertake. If you’re migrating for a new Postgres major version or moving to an entirely new platform or host, you have a couple of options:<ul><li><p><strong><em>Using pg_dump and pg_restore</em></strong>: pg_dump is a very reliable way to collect an entire database and restore it to a new place. This includes the entire schema, all tables, and special database elements. If you’re migrating a small database, say 50 to 150GB, this is probably the easiest way to do it. On modern hardware a dump and restore using this method can be done in less than an hour.<li><p><strong><em>Using WAL</em></strong>: For folks that have a WAL-based backup system like pgBackRest or WAL-G/E, you can do a major Postgres migration by running a full base backup and streaming that WAL to your new host. Once you’re ready to do a cutover to the new database, you have an exact copy already standing by. This is a great way for larger databases, in the terabyte range, to do a major migration with minimal downtime.</ul><p>But what if your database is too big for a dump and restore, you can’t take the downtime, and you don’t have access to the WAL (i.e. you're on a host like RDS that doesn't share it)? There’s a third option:<ul><li><strong><em>Logical migration</em></strong>: Using Postgres logical replication you can set up a database copy at a new location. While WAL has everything, logical replication just captures data and doesn't migrate schema, indexes, sequences, and a couple of other fiddly bits.
But with a few tricks in this post, you can capture everything for a full migration using logical replication.</ul><p>The architecture of logical replication is straightforward, see our intro post on <a href=https://www.crunchydata.com/blog/data-to-go-postgres-logical-replication>Data To Go: Postgres Logical Replication</a> if you’re brand new to the topic. Your existing database will be the <code>publisher</code>, and the receiving database will be the <code>subscriber</code>. In the initial load, all data is copied from the publisher to the subscriber. Following the initial data copy, any transactions made on the publisher side are sent to the subscriber.<h2 id=step-1-migrate-schema><a href=#step-1-migrate-schema>Step 1: Migrate schema</a></h2><p>Logical replication only replicates data changes (<code>INSERT</code>, <code>UPDATE</code>, <code>DELETE</code>), so you must ensure that the target database has the correct schema beforehand. To get a schema-only dump of your source and apply to your database, run something like:<pre><code>pg_dump -Fc -s $SOURCE_DB_URI | pg_restore --no-acl --no-owner -d $TARGET_DB_URI
</code></pre><p>If your migration process is proceeding while application development continues, you must make sure to update the receiving database's schema as you make any schema changes on your source database.<h2 id=step-2-publisher-current-host-set-up><a href=#step-2-publisher-current-host-set-up>Step 2: Publisher (current host) set up</a></h2><p>Logical replication is enabled via the <code>wal_level</code> setting. Some managed Postgres services may have a different way to turn this on.<pre><code>wal_level = logical
</code></pre><h4 id=slot-configuration><a href=#slot-configuration>Slot configuration</a></h4><p>Review the replication slot settings to make sure there are sufficient resources. For very large replication projects, the defaults may need to be changed.<ul><li><code>max_replication_slots</code><li><code>max_wal_senders</code><li><code>max_logical_replication_workers</code><li><code>max_worker_processes</code><li><code>max_sync_workers_per_subscription</code></ul><p>For details on how these parameters should be set, see the PostgreSQL chapter on <a href=https://www.postgresql.org/docs/current/logical-replication-config.html>logical replication configuration settings</a>.<h4 id=networking><a href=#networking>Networking</a></h4><p>Make sure that the network/firewall for your old database permits connections from your new database.<h4 id=replication-user-for-the-new-subscriber><a href=#replication-user-for-the-new-subscriber>Replication user for the new subscriber</a></h4><p>You can create a specific user for this purpose that has the <code>REPLICATION</code> role attribute. Also make sure that the new role has read access to the tables being replicated.<pre><code class=language-sql>CREATE ROLE elizabeth WITH REPLICATION LOGIN PASSWORD 'my_password';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO elizabeth;
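
-- Optional, and an assumption about your setup: if new tables might be
-- created on the source before cutover, also grant read access on future
-- tables in the public schema automatically:
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO elizabeth;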
</code></pre><h4 id=find-tables-without-primary-keys-or-unique-indexes><a href=#find-tables-without-primary-keys-or-unique-indexes>Find tables without primary keys or UNIQUE indexes</a></h4><p>For logical replication, Postgres needs a way to uniquely identify rows to be updated/deleted. For tables with primary keys, that key is used, so first identify any tables that lack primary keys:<pre><code class=language-sql>select tab.table_schema,
       tab.table_name
from information_schema.tables tab
left join information_schema.table_constraints tco
          on tab.table_schema = tco.table_schema
          and tab.table_name = tco.table_name
          and tco.constraint_type = 'PRIMARY KEY'
where tab.table_type = 'BASE TABLE'
      and tab.table_schema not in ('pg_catalog', 'information_schema')
      and tco.constraint_name is null
order by table_schema,
         table_name;
</code></pre><p>For tables without primary keys, any <code>UNIQUE</code> index can be used:<pre><code class=language-sql>ALTER TABLE tablename REPLICA IDENTITY USING INDEX idx_some_unique_index;
</code></pre><p>If there are no existing <code>UNIQUE</code> indexes, one can be created, or the table can be set with <code>REPLICA IDENTITY FULL</code> - in which case it treats each row as its own "key":<pre><code class=language-sql>ALTER TABLE tablename REPLICA IDENTITY FULL;
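
-- Note: REPLICA IDENTITY FULL ships the entire old row with every UPDATE and
-- DELETE, so expect extra replication overhead on large, busy tables. You can
-- verify a table's setting via pg_class.relreplident (d=default, i=index,
-- f=full, n=nothing):
SELECT relname, relreplident FROM pg_class WHERE relname = 'tablename';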
</code></pre><p>Next create a publication, which is a grouping of tables you intend to replicate. In most cases you will create a publication FOR ALL TABLES:<pre><code class=language-sql>CREATE PUBLICATION bridge_migration FOR ALL TABLES;
</code></pre><p>Check that your tables are ready for publication; all of the tables should be listed here:<pre><code class=language-sql>SELECT * FROM pg_publication_tables;
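
-- As a rough sanity check (assuming you are replicating every schema), the
-- row count here should match the number of ordinary tables in the database:
SELECT count(*) FROM pg_publication_tables WHERE pubname = 'bridge_migration';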
</code></pre><h2 id=step-3-subscriber-new-host-settings><a href=#step-3-subscriber-new-host-settings>Step 3: Subscriber (new host) settings</a></h2><p>On the new host side, create a subscription to each publication to begin receiving the published data. Using the connection details for your old host and the login you created in Step 2, set up a subscription to that replicated data:<pre><code class=language-sql>CREATE SUBSCRIPTION bridge_migration CONNECTION 'host={host} port=5432 dbname={database} user={login} password={password}' PUBLICATION bridge_migration;
</code></pre><p>Creating the subscription in this way will create a replication slot on the publisher and begin copying data from tables specified in the publication. A separate temporary slot will be created for each table for the duration of its initial data synchronization copy.<p>You can limit how many tables are synchronized at once with the <code>max_sync_workers_per_subscription</code> setting.<h2 id=step-4-monitor-the-initial-copy><a href=#step-4-monitor-the-initial-copy>Step 4: Monitor the initial copy</a></h2><p>You likely want to monitor this initial copy. The <code>pg_stat_subscription</code> table will show data on the subscriber end of the transaction:<pre><code>select * from pg_stat_subscription;

-[ RECORD 1 ]---------+------------------------------
subid                 | 27183
subname               | bridge_migration
worker_type           | table synchronization
pid                   | 1197139
leader_pid            |
relid                 | 26721
received_lsn          |
last_msg_send_time    | 2025-09-26 15:54:45.095215+00
last_msg_receipt_time | 2025-09-26 15:54:45.095215+00
latest_end_lsn        |
latest_end_time       | 2025-09-26 15:54:45.095215+00
-[ RECORD 2 ]---------+------------------------------
subid                 | 27183
subname               | bridge_migration
worker_type           | apply
pid                   | 47075
leader_pid            |
relid                 |
received_lsn          | 4E32/7092F6F8
last_msg_send_time    | 2025-09-26 15:55:11.020012+00
last_msg_receipt_time | 2025-09-26 15:55:11.021989+00
latest_end_lsn        | 4E32/7092F3E0
latest_end_time       | 2025-09-26 15:55:10.843251+00
</code></pre><p>You can also look at the pg_subscription_rel view to see the synchronization state of each table with <code>select * from pg_subscription_rel;</code>.<p>Here, the <code>state_code</code> can tell you about each object:<ul><li>i - initialize<li>d - data is being copied<li>f - finished table copy<li>s - synchronized<li>r - ready (normal replication)</ul><p>Because of table bloat and other factors with internal table statistics, you won't be able to compare table sizes between the hosts. You can, however, run <code>SELECT count(*)</code> on both sides to compare row counts.<h2 id=step-5-testing-and-cutover><a href=#step-5-testing-and-cutover>Step 5: Testing and cutover</a></h2><p>Now you can begin testing your application against the new database. Once you have confirmed that all the data is present, you can do a migration cutover. This will require stopping transactions on the original host, fixing your sequences, and pointing your application to the new database.<h2 id=step-6-fix-sequences><a href=#step-6-fix-sequences>Step 6: Fix sequences</a></h2><p>While logical replication will copy over all the data from the source, it doesn't update any of the sequences. For this reason, we recommend that you update your sequences post-cutover, before you begin production operations. The best approach to fixing your sequences is to simply generate setval commands for all sequences in your source database(s), which you can do with this query:<pre><code class=language-sql>SELECT
    'SELECT setval(' || quote_literal(quote_ident(n.nspname) || '.' || quote_ident(c.relname)) || ', ' || s.last_value || ');'
FROM
    pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    JOIN pg_sequences s ON s.schemaname = n.nspname
        AND s.sequencename = c.relname
WHERE
    c.relkind = 'S';
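
-- Tip: in psql, replacing the final ';' with the \gexec meta-command runs
-- the generated setval statements immediately instead of saving them first.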
</code></pre><p>The resulting statements can be executed on the new host to synchronize all sequences.<h2 id=final-thoughts><a href=#final-thoughts>Final thoughts</a></h2><p>Logical replication is a safe and effective migration strategy. Data consistency for replicated tables is ensured as long as the subscriber's schema is identical and replication is one-way with no conflicting writes on the subscriber.<p>We help folks with migrations to <a href=https://www.crunchydata.com/products/crunchy-bridge>Crunchy Bridge</a> every day. With Postgres you have a lot of choices for no-downtime or low-downtime platform changes. Contact us to find out more about the right plan for your project. ]]></content:encoded>
<author><![CDATA[ Jesse.Soyland@crunchydata.com (Jesse Soyland) ]]></author>
<dc:creator><![CDATA[ Jesse Soyland ]]></dc:creator>
<guid isPermaLink="false">c484718ed2d8926bd5974d9d3a4ccdcf1ed84865b374e476b0971a6a8f813618</guid>
<pubDate>Fri, 03 Oct 2025 11:00:00 EDT</pubDate>
<dc:date>2025-10-03T15:00:00.000Z</dc:date>
<atom:updated>2025-10-03T15:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres Troubleshooting - DiskFull ERROR could not resize shared memory segment ]]></title>
<link>https://www.crunchydata.com/blog/postgres-troubleshooting-diskfull-error-could-not-resize-shared-memory-segment</link>
<description><![CDATA[ Jesse has some tips if you see the dreaded full disk memory segment error. He goes through the most likely causes and fixes. ]]></description>
<content:encoded><![CDATA[ <p>There are a couple of super common Postgres errors you’re likely to encounter while using this database, especially with an application or ORM. One is the <strong>PG::DiskFull: ERROR:</strong> <strong>could not resize shared memory segment.</strong> It will look something like this:<pre><code class=language-sql>"PG::DiskFull: ERROR: could not resize shared memory segment "/PostgreSQL.938232807" to 55334241 bytes: No space left on device"
</code></pre><h3 id=dont-panic><a href=#dont-panic>Don’t panic</a></h3><p>We see a good number of support tickets from customers on this topic. If you see this error pass by in your logs, don’t worry. Seriously. There’s no immediate reason to panic from a single one of these errors.<p>If you’re seeing them regularly or all the time, or you’re curious about how these errors are generated, let’s continue through some troubleshooting.<h2 id=you-arent-really-out-of-disk><a href=#you-arent-really-out-of-disk>You aren’t really out of disk</a></h2><p>When the error states "no space left on device," it's not talking about the entire disk, but rather the shared memory device at that exact moment. Segments are created there when a backend allocates shared memory for things like hashes, sorts, etc. <a href=https://www.crunchydata.com/blog/parallel-queries-in-postgres>Parallel workers</a> will also allocate shared memory. When there is not sufficient shared memory remaining, the statement terminates with this sort of error.<p>The ‘disk full’ part of this error message is a bit of a red herring. This is an error that you'll see when your Postgres instance fails to allocate more memory in support of a query. It is not a real disk-full message. Sometimes modest memory-consuming queries that execute very slowly end up tipping you past the available memory. Other times a single memory-intensive query grabs a huge chunk of memory all at once and causes the issue.<p>Why don’t these spill out to temp, like normal large queries do? You probably just went over the total memory allocation. <code>work_mem</code> is allocated for each query node that needs it, rather than once per query or session, meaning that a session can potentially consume many multiples of <code>work_mem</code>. For example, if <code>max_parallel_workers</code> is 8 and <code>work_mem</code> is 384MB, it's possible to use up to 3,072MB of memory even with a single parallel hash join.
If your query plan has 5 query nodes that each allocate <code>work_mem</code> (i.e. sorts / hash operations) and four parallel workers, you could be using (384MB x 5 query nodes x 4 workers) = 7,680MB, or roughly 7.7GB, of memory. If you only have 7.5GB available, that’s not going to work.<h2 id=to-the-logs-we-go><a href=#to-the-logs-we-go>To the logs we go</a></h2><p>To see what’s going on with these errors, let’s get into the logs and see how often we’re seeing them. Search your logs for the resize memory errors:<pre><code>$ grep -iR "could not resize shared memory" * | sed 's/.log.*//' | uniq -c
  1597 postgresql-Fri
   587 postgresql-Mon
   325 postgresql-Sat
  1223 postgresql-Sun
  1395 postgresql-Thu
</code></pre><p>You can also look for the specific process ID mentioned in the OOM error. For this one, it’s <code>5883275</code>.<pre><code class=language-sql>Aug 08 16:34:31 4qd4kp2ot5bwlmdnp7566v4owy postgres[5883275]: [36-1] [5883275][client backend][17/20137143][0] [user=application,db=postgres,app=/rails/bin/rails] ERROR:  could not resize shared memory segment "/PostgreSQL.2449246800" to 33554432 bytes: No space left on device
</code></pre><p>To track the error back to its origin, search your logs for that process ID. You might see very long queries broken up into smaller sequence numbers, like 42-1, 42-2, and 42-3 in this example:<pre><code>Aug 08 16:34:31 4qd4kp2ot5bwlmdnp7566v4owy postgres[5883275]: [42-1] [5883275][client backend][17/20137143][0] [user=application,db=postgres,app=/rails/bin/rails] ERROR:  could not resize shared memory segment "/PostgreSQL.2551246800" to 5883275 bytes: No space left on device

Aug 08 16:34:31 4qd4kp2ot5bwlmdnp7566v4owy postgres[5883275]: [42-2] [5883275][client backend][17/20137143][0] [user=application,db=postgres,app=/rails/bin/rails] STATEMENT: SELECT COUNT(*)
FROM trucks t
JOIN truck_locations tl ON t.truck_id = tl.truck_id
JOIN jobs j ON tl.location_id = j.location_id
JOIN job_hiring_locations_trucks_join jhltj ON j.job_id = jhltj.job_id AND t.truck_id = jhltj.truck_id
JOIN drivers d ON j.driver_id = d.driver_id
JOIN driver_certifications dc ON d.driver_id = dc.driver_id
JOIN certifications c ON dc.certification_id = c.certification_id

Aug 08 16:34:31 4qd4kp2ot5bwlmdnp7566v4owy postgres[5883275]: [42-3] "JOIN maintenance_records mr ON t.truck_id = mr.truck_id
JOIN maintenance_types mt ON mr.maintenance_type_id = mt.maintenance_type_id
JOIN job_status js ON j.status_id = js.status_id
JOIN locations l ON tl.location_id = l.location_id
JOIN job_types jt ON j.job_type_id = jt.job_type_id
JOIN job_priorities jp ON j.priority_id = jp.priority_id
JOIN fuel_records fr ON t.truck_id = fr.truck_id
JOIN fuel_stations fs ON fr.fuel_station_id = fs.fuel_station_id
</code></pre><p>Look for patterns in the logs: start with individual error examples and look at the events right before the OOM errors. Are you seeing the same query? Maybe large sorts, or large <code>JOIN</code> operations? Are you seeing background job processes, e.g. Sidekiq, cron, etc.? Large analytics-type queries? Those could be large or misconfigured.<h2 id=common-fixes-for-could-not-resize-shared-memory-segment><a href=#common-fixes-for-could-not-resize-shared-memory-segment>Common Fixes for <code>Could Not Resize Shared Memory Segment</code></a></h2><h3 id=decrease-reliance-on-hash-tables-and-add-indexes><a href=#decrease-reliance-on-hash-tables-and-add-indexes><strong>Decrease reliance on hash tables and add indexes</strong></a></h3><p>From what I’ve seen in the wild, hash tables seem to be the main culprit for these types of errors, so that’s a good place to start. Hash joins are used for very large joins across tables, and Postgres will create an in-memory hash table to store some of the data. Systems with large amounts of memory or larger <code>work_mem</code> settings can favor hash joins over other join methods like nested loops or merges if the data being joined is small enough to fit into <code>work_mem</code> but large enough that (or indexed so that) a nested loop is inefficient.<p>You can see which strategy the query planner is using by looking at the query’s <code>EXPLAIN</code> plan, e.g.:<pre><code class=language-sql>EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM trucks t

Finalize Aggregate  (cost=238.12..238.13 rows=1 width=8) (actual time=5.276..5.276 rows=1 loops=1)
  Buffers: shared hit=29
  ->  Gather  (cost=238.01..238.12 rows=2 width=8) (actual time=5.236..5.272 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=29
        ->  Partial Aggregate  (cost=238.01..238.02 rows=1 width=8) (actual time=5.226..5.227 rows=1 loops=3)
              Buffers: shared hit=29
              ->  HashAggregate  (cost=238.00..238.01 rows=1 width=4) (actual time=5.213..5.217 rows=3 loops=3)
                    Group Key: trucks.id
                    Buffers: shared hit=29
                    ->  Hash Join  (cost=37.75..236.75 rows=500 width=4) (actual time=0.605..4.879 rows=70 loops=3)
                          Hash Cond: (truck_locations.job_id = trucks.id)
                          Buffers: shared hit=29
                          ->  Seq Scan on truck_locations  (cost=0.00..18.20 rows=820 width=8) (actual time=0.010..0.054 rows=10 loops=3)
                                Buffers: shared hit=3
                          ->  Hash  (cost=27.25..27.25 rows=820 width=4) (actual time=0.575..0.576 rows=10 loops=3)
                                Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                Buffers: shared hit=26
Planning Time: 0.256 ms
Execution Time: 36.562 ms
</code></pre><p>Since the datasets being joined are fairly large, it may be possible to nudge the planner toward merge joins versus hash joins by adding indexes on the join keys of both of the tables. The join keys themselves are already indexed but since there are additional criteria in the queries for filters and other uses, including those columns in indexes can be beneficial.<p>A good rule of thumb is that if a query has a WHERE filter on column A and joins to another table via column B, a <a href=https://www.crunchydata.com/blog/postgres-indexes-for-newbies#multicolumn-b-tree-indexes>multicolumn index</a> on (A, B) will help by reducing the amount of data being joined.<h3 id=decreasing-work_mem><a href=#decreasing-work_mem>Decreasing work_mem</a></h3><p>It is possible that your work_mem is set too generously and you’re allowing too much memory per worker.<h3 id=decreasing-max_parallel_workers><a href=#decreasing-max_parallel_workers>Decreasing max_parallel_workers</a></h3><p>You may want to peek at the settings you have for <a href=https://www.crunchydata.com/blog/parallel-queries-in-postgres#tuning-postgres-parallel-queries>parallel workers</a>. 
If you have a high work_mem setting, lots of parallel workers, and hash joins, you may be over-allocating resources.<h3 id=dig-into-the-queries><a href=#dig-into-the-queries>Dig into the queries</a></h3><p>In a lot of cases, working through a specific query to make it more performant might be the place to go for fixing your OOM issues.<ul><li>Adding <code>WHERE</code> clauses or <code>LIMIT</code>s to <code>SELECT *</code> queries can be a good starting place.<li>Creating <a href=https://www.crunchydata.com/blog/postgres-subquery-powertools-subqueries-ctes-materialized-views-window-functions-and-lateral#what-is-a-materialized-view>views or materialized views</a> to store table join data could help your database as well.</ul><h3 id=add-more-memory><a href=#add-more-memory>Add more memory</a></h3><p>After you’ve added indexes and done what you can with individual queries, if you continue to see these errors, you might need to add more memory to your machine.<h2 id=quick-summary><a href=#quick-summary>Quick summary</a></h2><ul><li><strong>ERROR: could not resize shared memory segment</strong> is probably just a single query or operation that took up all your memory.<li>If you just have one of these, it’s no big deal. If you have a lot of them, it’s still no big deal. This is Postgres; everything’s fixable. There are some simple things you can do to optimize and add indexes before you upgrade your instance to larger memory.<li>Look in your logs for the queries or processes giving the OOM errors.<li>Your #1 place to look is for queries joining large tables where the data being processed fits within <code>work_mem</code>. Adding indexes to strategically limit the amount of data being processed can help.</ul> ]]></content:encoded>
<category><![CDATA[ Production Postgres ]]></category>
<author><![CDATA[ Jesse.Soyland@crunchydata.com (Jesse Soyland) ]]></author>
<dc:creator><![CDATA[ Jesse Soyland ]]></dc:creator>
<guid isPermaLink="false">22dedaec5a29ef58d3ddaa05e2d92e8491e5d14559343ebeecb6655d94c1be74</guid>
<pubDate>Fri, 09 Aug 2024 08:00:00 EDT</pubDate>
<dc:date>2024-08-09T12:00:00.000Z</dc:date>
<atom:updated>2024-08-09T12:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ One PID to Lock Them All: Finding the Source of the Lock in Postgres ]]></title>
<link>https://www.crunchydata.com/blog/one-pid-to-lock-them-all-finding-the-source-of-the-lock-in-postgres</link>
<description><![CDATA[ One process can lock your Postgres database, dominating all will, blocking other processes and queries. Jesse shows you how to find that one process that’s ruling them all. Once you’ve grabbed this lock and held it close to your chest, he’ll help you on your quest to cast it into the depths of Mt Doom. ]]></description>
<content:encoded><![CDATA[ <p>On the Customer Success Engineering team at <a href=https://www.crunchydata.com/products/crunchy-bridge>Crunchy Bridge</a>, we run across customers with lock issues on their Postgres database from time to time. Locks can have a cascading effect on queries. If one process is locking a table, then a query can be waiting on the process before it, and the process before that one. Major lock issues can quickly take down an entire production Postgres instance or application.<p>In this post let’s look at why locks happen, and more importantly how to get to the bottom of a lock issue and the one process blocking everything else. That one process that blocks them all! Once you find the source of the lock, I’ll give you the options for terminating the process that created all your troubles in the first place.<h2 id=finding-the-source-of-the-lock><a href=#finding-the-source-of-the-lock>Finding the source of the lock</a></h2><p>Often you won’t immediately know that you have a lock issue. If something is off, queries aren’t returning, or your application is slow, finding statements blocked by locks is a great place to start.<h3 id=1-find-processes-that-are-waiting><a href=#1-find-processes-that-are-waiting>1. Find processes that are waiting</a></h3><p>Take a look at the <code>pg_stat_activity</code> view for processes that are <code>active</code> but have a <code>wait_event</code> or <code>wait_event_type</code> that are non-NULL:<pre><code class=language-sql>SELECT
  pid,
  datname,
  usename,
  application_name,
  client_addr,
  client_port,
  to_char (now (), 'YYYY-MM-DD HH24:MI:SS') as now,
  to_char (now () - xact_start, 'DD HH24:MI:SS MS') as xact_time,
  to_char (now () - query_start, 'DD HH24:MI:SS MS') as query_time,
  state,
  to_char (now () - state_change, 'DD HH24:MI:SS MS') as state_time,
  wait_event,
  wait_event_type,
  left (query, 40)
FROM
  pg_stat_activity
WHERE
  state != 'idle'
  and pid != pg_backend_pid ()
ORDER BY
  query_time desc;
</code></pre><p>If a connection is active and waiting on a lock, then the <code>wait_event</code> and <code>wait_event_type</code> columns will be non-NULL. If that's the case (and it stays that way after a couple of runs of the query to ensure that you didn't just catch a short lock wait), record that affected PID. Here is a very simple example where I ran an update in a transaction, then in a different session added a column to the same table. The <code>ALTER TABLE</code> in this case will not proceed until the transaction from the prior thread has been committed or rolled back. Here are the results - note the PID 295998 that is "active" but has wait_event=relation and wait_event_type=Lock<pre><code class=language-text>  pid   | datname  | usename  | application_name |   client_addr   | client_port |         now         |    xact_time    |   query_time    |        state        |   state_time    | wait_event | wait_event_type |                   left
--------+----------+----------+------------------+-----------------+-------------+---------------------+-----------------+-----------------+---------------------+-----------------+------------+-----------------+------------------------------------------
 295995 | postgres | postgres | psql             | 149.42.105.253 |       49327 | 2023-11-09 20:41:10 | 00 00:02:11 535 | 00 00:02:01 755 | idle in transaction | 00 00:02:01 755 | ClientRead | Client          | RELEASE pg_psql_temporary_savepoint
 295998 | postgres | postgres | psql             | 149.42.105.253 |       49344 | 2023-11-09 20:41:10 | 00 00:01:55 550 | 00 00:01:01 138 | active              | 00 00:01:01 138 | relation   | Lock            | alter table sampledata add column data02
(2 rows)
</code></pre><h3 id=2-find-which-pid-is-locking-the-table><a href=#2-find-which-pid-is-locking-the-table>2. Find which PID is locking the table</a></h3><p>Now we know that the PID (295998) is awaiting a lock on a relation (table), but we don’t know what process currently holds the lock on which it is waiting. To find it, we start by querying <code>pg_locks</code> using the ID of the awaiting process:<pre><code class=language-sql>SELECT
  *
FROM
  pg_locks
WHERE
  pid = 295998
  AND granted IS NOT true;
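
-- Shortcut: pg_blocking_pids() returns the PIDs blocking a given backend,
-- which can save walking pg_locks by hand:
SELECT pg_blocking_pids(295998);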
</code></pre><p>Here’s the result of that query:<pre><code class=language-text>locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid   |        mode         | granted | fastpath |          waitstart
----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+--------+---------------------+---------+----------+------------------------------
 relation |        5 |    16501 |      |       |            |               |         |       |          | 6/6743             | 295998 | AccessExclusiveLock | f       | f        | 2023-11-09 20:40:08.98843+00
(1 row)
</code></pre><p>The <code>locktype</code> column shows which of the other columns describe what Postgres is waiting on. In this example, <code>locktype</code> is <code>relation</code>, so we look to the <code>relation</code> column to see the OID of the relation (16501) where the blocking process has an active lock.<h3 id=3-find-the-process-with-the-existing-lock><a href=#3-find-the-process-with-the-existing-lock>3. Find the process with the existing lock</a></h3><p>Now that we know which object is locked, we can once again query <code>pg_locks</code> using the relation OID to see what is holding the current lock(s):<pre><code class=language-sql>SELECT
  *
FROM
  pg_locks
WHERE
  relation = 16501
  AND granted IS true;
</code></pre><p>Here is the result:<pre><code class=language-text>locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid   |       mode       | granted | fastpath | waitstart
----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+--------+------------------+---------+----------+-----------
 relation |        5 |    16501 |      |       |            |               |         |       |          | 3/243227           | 295995 | RowExclusiveLock | t       | f        |
(1 row)
</code></pre><p>This shows that PID 295995 is the process holding the lock.<h3 id=4-find-what-that-blocking-process-is-doing><a href=#4-find-what-that-blocking-process-is-doing>4. Find what that blocking process is doing</a></h3><p>Now that we know which process has been granted the lock, we can go back to <code>pg_stat_activity</code> to see what that PID is doing:<pre><code class=language-sql>SELECT
  pid,
  state,
  wait_event,
  wait_event_type,
  left (query, 40)
FROM
  pg_stat_activity
WHERE
  pid = 295995;
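
-- Once you've confirmed the blocker, you can end it with Postgres built-ins:
-- pg_cancel_backend() cancels its current query, while pg_terminate_backend()
-- ends the whole session. Uncomment to use:
-- SELECT pg_cancel_backend(295995);
-- SELECT pg_terminate_backend(295995);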
</code></pre><p>Here is the result:<pre><code class=language-sql>pid   |        state        | wait_event | wait_event_type |                left
--------+---------------------+------------+-----------------+-------------------------------------
 295995 | idle in transaction | ClientRead | Client          | RELEASE pg_psql_temporary_savepoint
</code></pre><p>The last column shows the last statement executed by that session, which in this case was the <a href=https://www.postgresql.org/docs/current/sql-release-savepoint.html>savepoint release</a> after an update, but in most cases it will show an active transaction.<h2 id=one-lock-to-rule-them-all><a href=#one-lock-to-rule-them-all>One lock to rule them all</a></h2><p>The above statements are pretty straightforward once you know what you are looking for, but they can also be combined into a single statement for a general blocking / blocked query. The <a href=https://wiki.postgresql.org/wiki/Lock_Monitoring>Postgres wiki</a> has some good combined versions.<p>Oftentimes you might find that the blocked statement is blocked by another (and another, and another still…). In those cases, it is still possible to trace all the way up to the One PID that blocks all the rest, but that can be an arduous, unexpected journey. For those cases, a colleague here at Crunchy Data, Brian Pace, wrote a query that helps to show locks waiting on other locks, rolling up to the PID holding the initial lock:<pre><code class=language-sql>WITH sos AS (
	SELECT array_cat(array_agg(pid),
           array_agg((pg_blocking_pids(pid))[array_length(pg_blocking_pids(pid),1)])) pids
	FROM pg_locks
	WHERE NOT granted
)
SELECT a.pid, a.usename, a.datname, a.state,
	   a.wait_event_type || ': ' || a.wait_event AS wait_event,
       current_timestamp-a.state_change time_in_state,
       current_timestamp-a.xact_start time_in_xact,
       l.relation::regclass relname,
       l.locktype, l.mode, l.page, l.tuple,
       pg_blocking_pids(l.pid) blocking_pids,
       (pg_blocking_pids(l.pid))[array_length(pg_blocking_pids(l.pid),1)] last_session,
       coalesce((pg_blocking_pids(l.pid))[1]||'.'||coalesce(case when locktype='transactionid' then 1 else array_length(pg_blocking_pids(l.pid),1)+1 end,0),a.pid||'.0') lock_depth,
       a.query
FROM pg_stat_activity a
     JOIN sos s on (a.pid = any(s.pids))
     LEFT OUTER JOIN pg_locks l on (a.pid = l.pid and not l.granted)
ORDER BY lock_depth;
</code></pre><p>Example output from that statement:<pre><code class=language-text>pid   |   usename   | datname  |        state        |     wait_event      |  time_in_state  |  time_in_xact   |  relname   |   locktype    |        mode         | page | tuple |     blocking_pids      | last_session | lock_depth |                       query
--------+-------------+----------+---------------------+---------------------+-----------------+-----------------+------------+---------------+---------------------+------+-------+------------------------+--------------+------------+----------------------------------------------------
 879401 | application | postgres | idle in transaction | Client: ClientRead  | 00:29:53.512147 | 00:30:01.31748  |            |               |                     |      |       |                        |              | 879401.0   | select * from sampledata where id=101 for update;
 880275 | application | postgres | active              | Lock: transactionid | 00:01:00.342763 | 00:01:00.459375 |            | transactionid | ShareLock           |      |       | {879401}               |       879401 | 879401.1   | update sampledata set data = 'abc' where id = 101;
 880204 | application | postgres | active              | Lock: relation      | 00:00:29.722705 | 00:00:29.722707 | sampledata | relation      | AccessExclusiveLock |      |       | {879401,880275,879488} |       879488 | 879401.4   | alter table sampledata add column data03 integer;
 880187 | application | postgres | active              | Lock: relation      | 00:00:03.580716 | 00:00:03.580718 | sampledata | relation      | RowExclusiveLock    |      |       | {880204}               |       880204 | 880204.2   | update sampledata set data = 'abc' where id = 103;
 879527 | application | postgres | active              | Lock: relation      | 00:00:14.974433 | 00:28:32.80346  | sampledata | relation      | RowExclusiveLock    |      |       | {880204}               |       880204 | 880204.2   | update sampledata set data = 'abc' where id = 102;
 879488 | application | postgres | active              | Lock: tuple         | 00:00:41.35361  | 00:00:41.47118  | sampledata | tuple         | ExclusiveLock       |    2 |    21 | {880275}               |       880275 | 880275.2   | update sampledata set data = 'def' where id = 101;
(6 rows)
</code></pre><p>In this manufactured example we have:<p>879401 - the “idle in transaction” PID - This is a <code>SELECT... FOR UPDATE</code> within a transaction. Its <code>blocking_pids</code> field is blank because it’s not blocked by any other process. This is the process in this example that is blocking everything else.<p>880275 - Attempting to update the same <code>id=101</code> - It’s blocked until the <code>FOR UPDATE</code> is completed.<p>879488 - Again attempting to update the same <code>id=101</code> - It can’t execute until the process blocking <em>it</em> completes. It’s waiting on 880275 since it came in afterwards. If 880275 is canceled, it will just roll up to the next blocker, 879401.<p>880204 - Here we added an <code>ALTER TABLE</code> - since it takes an access exclusive lock, note that its <code>blocking_pids</code> shows all three of the prior statements - it won’t be able to execute until each of those is out of the way.<p>879527 - Blocked by the <code>ALTER TABLE</code>, since that statement requires an AccessExclusiveLock. Note that it’s still blocked, even though it’s a different row (<code>id=102</code>).<p>880187 - Blocked also by the <code>ALTER TABLE</code>. They are at the same <code>lock_depth</code> since they are both blocked by the same thing, but not by each other.<h2 id=ending-the-process-holding-the-lock><a href=#ending-the-process-holding-the-lock>Ending the process holding the lock</a></h2><p>Ok, now we’ve found the PID at the top of the tree, that one lock holding the key to the rest of our locks. Fortunately, as Postgres wizards, we do possess the craft to unmake the lock.<h3 id=commit><a href=#commit>Commit</a></h3><p>If the statement is showing as <code>idle in transaction</code>, it is possible that you have a non-committed transaction open that started with a <code>BEGIN</code> statement. In that case you can commit with:<pre><code class=language-sql>COMMIT;
</code></pre><h3 id=rollback><a href=#rollback>Rollback</a></h3><p>You may have performed some unintended updates, or run into an error. In that case you can abort the transaction and roll back any changes already made with:<pre><code class=language-sql>ROLLBACK;
</code></pre><h3 id=cancel-the-pid><a href=#cancel-the-pid>Cancel the PID</a></h3><p>If this wasn't a transaction you initiated, in most cases you can cancel the running query with:<pre><code class=language-sql>SELECT pg_cancel_backend(PID);
</code></pre><h3 id=terminate-the-backend-connection-and-process><a href=#terminate-the-backend-connection-and-process>Terminate the backend connection and process</a></h3><p>If the cancel statement above doesn’t work, you can cast the lock back into the fiery chasm from whence it came by executing a terminate back end statement. This will end the process and its associated database connection.<pre><code class=language-sql>SELECT pg_terminate_backend(PID);
</code></pre><h2 id=why-did-postgres-lock><a href=#why-did-postgres-lock>Why did Postgres lock?</a></h2><p>Postgres’ multi-version concurrency control (MVCC) system is incredibly advanced and by and large lets you query, update, and insert rows without locking tables. There are two main kinds of locks:<ul><li><em>Shared locks</em> - the resource can be accessed by more than one backend/session at the same time<li><em>Exclusive locks</em> - the resource can only be accessed by a single backend/session at a time</ul><p>The lock type that generally gets us into trouble and blocks other queries and processes is the exclusive lock. If you want an overview, see David’s post, <a href=https://www.crunchydata.com/blog/postgres-locking-when-is-it-concerning>Postgres Locking: When Is It Concerning</a>? There are probably hundreds of ways to put an exclusive lock on a table, but these are the most common ones we see with our customers.<p><strong>Alter Table</strong><p>By far the most common cause I see of an exclusive and detrimental lock is an <code>ALTER TABLE</code> command, which can be issued to the database directly or in some cases via the application’s ORM while running migrations. The <code>ALTER TABLE</code> itself takes an <code>ACCESS EXCLUSIVE</code> lock (see <a href=https://www.postgresql.org/docs/current/sql-altertable.html>ALTER TABLE docs</a>) which pretty much blocks every other process on that table.<p><strong>ORM framework</strong><p>ORM frameworks can hide circular dependencies that produce deadlocks. 
An error on the application side, where other operations fail while executing within the same transaction scope, can leave locks held and result in future transactions taking a long time to complete.<p><strong>Create index</strong><p>Creating indexes can lock tables if you’re not using <code>CREATE INDEX CONCURRENTLY</code>.<p><strong>Vacuum</strong><p><code>VACUUM FULL</code> will take out an <code>ACCESS EXCLUSIVE</code> lock against a table, so it should be used only in rare cases.<p><strong>Other</strong><p>The Postgres documentation has a table showing the <a href=https://www.postgresql.org/docs/current/explicit-locking.html#TABLE-LOCK-COMPATIBILITY>different lock modes</a>, how they might block each other, and some examples of statement types that result in those locks.<h2 id=getting-proactive-about-locks><a href=#getting-proactive-about-locks>Getting proactive about locks</a></h2><p>Let’s look at a few tips for managing locking in the future.<h3 id=logging-lock_waits><a href=#logging-lock_waits>Logging lock_waits</a></h3><p>You can log any time a query waits on a lock longer than <code>deadlock_timeout</code> (one second by default) by turning on <code>log_lock_waits</code>. Lock waits in your logs can be a good indicator that processes are being contentious. There is virtually no overhead in enabling this and it’s very safe for production databases. This is set to “on” by default on Crunchy Bridge clusters:<pre><code class=language-sql>log_lock_waits = on
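-- This is a postgresql.conf setting; to change it at runtime instead:
--   ALTER SYSTEM SET log_lock_waits = 'on';
--   SELECT pg_reload_conf();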
</code></pre><h3 id=set-a-lock-timeout><a href=#set-a-lock-timeout>Set a lock timeout</a></h3><p>We generally recommend clients set a <code>lock_timeout</code> within a session so that it will cancel the transaction and relinquish any locks it was holding after a certain period of time. This helps to prevent other processes from getting caught up behind them in a chain.<pre><code class=language-sql>ALTER SYSTEM SET lock_timeout = '10s';
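-- ALTER SYSTEM applies server-wide (after a SELECT pg_reload_conf();).
-- To set it for only the current session, as suggested above:
SET lock_timeout = '10s';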
</code></pre><h2 id=summary><a href=#summary>Summary</a></h2><ul><li>Find processes waiting on locks in <code>pg_stat_activity</code> by looking for processes that are <code>active</code> but have a <code>wait_event</code> or <code>wait_event_type</code> that is non-NULL.<li>Use this query to find the source of the lock (<strong><em>seriously save this query somewhere, you might need it someday</em></strong>).</ul><pre><code class=language-sql>WITH sos AS (
	SELECT array_cat(array_agg(pid),
           array_agg((pg_blocking_pids(pid))[array_length(pg_blocking_pids(pid),1)])) pids
	FROM pg_locks
	WHERE NOT granted
)
SELECT a.pid, a.usename, a.datname, a.state,
	   a.wait_event_type || ': ' || a.wait_event AS wait_event,
       current_timestamp-a.state_change time_in_state,
       current_timestamp-a.xact_start time_in_xact,
       l.relation::regclass relname,
       l.locktype, l.mode, l.page, l.tuple,
       pg_blocking_pids(l.pid) blocking_pids,
       (pg_blocking_pids(l.pid))[array_length(pg_blocking_pids(l.pid),1)] last_session,
       coalesce((pg_blocking_pids(l.pid))[1]||'.'||coalesce(case when locktype='transactionid' then 1 else array_length(pg_blocking_pids(l.pid),1)+1 end,0),a.pid||'.0') lock_depth,
       a.query
FROM pg_stat_activity a
     JOIN sos s on (a.pid = any(s.pids))
     LEFT OUTER JOIN pg_locks l on (a.pid = l.pid and not l.granted)
ORDER BY lock_depth;
</code></pre><ul><li>End the lock by canceling the pid or issuing a <code>COMMIT</code> or <code>ROLLBACK</code> of the process that’s holding the lock and blocking the other processes<li>Be careful with <code>ALTER TABLE</code> commands, <code>CREATE INDEX</code> without the <code>CONCURRENTLY</code> option, or runaway processes from your ORM that may be holding exclusive locks and blocking general database processing.<li>Setting a <code>lock_timeout</code> can be a good idea, and logging lock waits is a good proactive way to keep track of ongoing problems.</ul><p>Thanks to my colleague <a href=https://www.crunchydata.com/blog/author/brian-pace>Brian Pace</a> for the great cascading locks query. ]]></content:encoded>
<category><![CDATA[ Production Postgres ]]></category>
<author><![CDATA[ Jesse.Soyland@crunchydata.com (Jesse Soyland) ]]></author>
<dc:creator><![CDATA[ Jesse Soyland ]]></dc:creator>
<guid isPermalink="false">b08a19715438b0cdf8443bb6c79ba718c9db5a1dd09d38e8b1ad5bc5bd5962c0</guid>
<pubDate>Thu, 18 Jan 2024 14:00:00 EST</pubDate>
<dc:date>2024-01-18T19:00:00.000Z</dc:date>
<atom:updated>2024-01-18T19:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ The Integer at the End of the Universe: Integer Overflow in Postgres ]]></title>
<link>https://www.crunchydata.com/blog/the-integer-at-the-end-of-the-universe-integer-overflow-in-postgres</link>
<description><![CDATA[ Integer overflow can happen when a sequence exceeds the limits of its integer data type. Jesse has a query to help you spot it and recommendations for a short term and a long term fix. ]]></description>
<content:encoded><![CDATA[ <p>Integer overflow occurs when a computer program tries to store an integer but the value being stored exceeds the maximum value that can be represented by the data type being used to store it. We have helped a few <a href=https://www.crunchydata.com/>Crunchy Data</a> clients navigate this recently and wanted to write up some notes.<p>In Postgres, there are three integer types:<ul><li><code>smallint</code> - A 2-byte integer, -32768 to 32767<li><code>integer</code> - A 4-byte integer, -2147483648 to 2147483647<li><code>bigint</code> - An 8-byte integer, -9223372036854775808 to +9223372036854775807</ul><p>It is not uncommon to use a 4-byte integer as a primary key when defining a new table. This can cause problems if the value to be represented is more than 4 bytes can hold. If a sequence’s limit is reached, you might see an error in your logs that looks like this:<pre><code class=language-text>ERROR:  nextval: reached maximum value of sequence "test_id_seq" (2147483647)
</code></pre><p><strong>Don’t Panic!</strong> We have some helpful and intelligible PostgreSQL solutions.<h3 id=how-do-you-know-if-you-are-close-to-overflowing-an-integer><a href=#how-do-you-know-if-you-are-close-to-overflowing-an-integer>How do you know if you are close to overflowing an integer?</a></h3><p>The following query will identify any auto-incrementing columns, the SEQUENCE object each one owns, the data types of the column and the SEQUENCE object, and what percent of the sequence or column data type’s range the sequence value has consumed:<pre><code class=language-pgsql>SELECT
    seqs.relname AS sequence,
    format_type(s.seqtypid, NULL) sequence_datatype,
    CONCAT(tbls.relname, '.', attrs.attname) AS owned_by,
    format_type(attrs.atttypid, atttypmod) AS column_datatype,
    pg_sequence_last_value(seqs.oid::regclass) AS last_sequence_value,
    TO_CHAR((
        CASE WHEN format_type(s.seqtypid, NULL) = 'smallint' THEN
            (pg_sequence_last_value(seqs.relname::regclass) / 32767::float)
        WHEN format_type(s.seqtypid, NULL) = 'integer' THEN
            (pg_sequence_last_value(seqs.relname::regclass) / 2147483647::float)
        WHEN format_type(s.seqtypid, NULL) = 'bigint' THEN
            (pg_sequence_last_value(seqs.relname::regclass) / 9223372036854775807::float)
        END) * 100, 'fm9999999999999999999990D00%') AS sequence_percent,
    TO_CHAR((
        CASE WHEN format_type(attrs.atttypid, NULL) = 'smallint' THEN
            (pg_sequence_last_value(seqs.relname::regclass) / 32767::float)
        WHEN format_type(attrs.atttypid, NULL) = 'integer' THEN
            (pg_sequence_last_value(seqs.relname::regclass) / 2147483647::float)
        WHEN format_type(attrs.atttypid, NULL) = 'bigint' THEN
            (pg_sequence_last_value(seqs.relname::regclass) / 9223372036854775807::float)
        END) * 100, 'fm9999999999999999999990D00%') AS column_percent
FROM
    pg_depend d
    JOIN pg_class AS seqs ON seqs.relkind = 'S'
        AND seqs.oid = d.objid
    JOIN pg_class AS tbls ON tbls.relkind = 'r'
        AND tbls.oid = d.refobjid
    JOIN pg_attribute AS attrs ON attrs.attrelid = d.refobjid
        AND attrs.attnum = d.refobjsubid
    JOIN pg_sequence s ON s.seqrelid = seqs.oid
WHERE
    d.deptype = 'a'
    AND d.classid = 1259;
</code></pre><p>To show this query in action, let me set up a test table with an <code>integer</code> primary key, where the sequence has been artificially advanced to 2 Billion:<pre><code class=language-pgsql>postgres=# create table test(id serial primary key, value integer);
CREATE TABLE
postgres=# select setval('test_id_seq', 2000000000);
   setval
------------
 2000000000
(1 row)

postgres=# \d test
                            Table "public.test"
 Column |  Type   | Collation | Nullable |             Default
--------+---------+-----------+----------+----------------------------------
 id     | integer |           | not null | nextval('test_id_seq'::regclass)
 value  | integer |           |          |
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)
</code></pre><p>Now when running the query above to find the integer overflow percent, I can see that the data types for both the column and the sequence are <code>integer</code>, and since the sequence’s next value is 2 Billion, it is 93% through the acceptable range:<pre><code class=language-pgsql>sequence   | sequence_datatype | owned_by | column_datatype | last_sequence_value | sequence_percent | column_percent
-------------+-------------------+----------+-----------------+---------------------+------------------+----------------
 test_id_seq | integer           | test.id  | integer         |          2000000001 | 93.13%           | 93.13%
(1 row)
</code></pre><h3 id=changing-to-negative-number-sequencing><a href=#changing-to-negative-number-sequencing>Changing to negative number sequencing</a></h3><p>Since the <code>integer</code> types in Postgres include negative numbers, a simple way to deal with integer overflow is to flip to sequencing with negative numbers. This can be done by giving the sequence a new start value of <code>-1</code> and converting to a descending sequence by giving it a negative <code>INCREMENT</code> value:<pre><code class=language-pgsql>alter sequence test_id_seq no minvalue start with -1 increment -1 restart;
</code></pre><p>If the purpose of the generated key is purely to create uniqueness, negative values are perfectly acceptable, but in some application frameworks or other use cases negative numbers may be undesirable or not work at all. In those cases we can change the field type entirely.<p>Keep in mind that the data type will need to be changed for any fields that reference this ID as well, or else they will also be out of bounds. Also any foreign key constraints will need to be dropped and reapplied after the both fields’ types have been updated.<p><strong>Benefits of the negative number approach:</strong><ul><li>No change to the column structure<li>Very fast: just change the sequence start number</ul><p><strong>Drawbacks:</strong><ul><li>Negative numbers might not work with your application framework<li>You only buy yourself double the amount of IDs. You could be in this situation again soon</ul><p>In general, this is a buy you some time approach and seen as a short term fix.<h3 id=changing-to-bigint><a href=#changing-to-bigint>Changing to <code>bigint</code></a></h3><p>The more complete fix to your sequence exhaustion is changing to the <code>bigint</code> data type.<p>In order to change the field type of the above <code>test</code> table, we will first create a new ID of type <code>bigint</code> that will eventually replace the current <code>id</code>, and create a unique constraint on it:<pre><code class=language-pgsql>alter table test add column id_new bigint;
CREATE UNIQUE INDEX CONCURRENTLY test_id_new ON test (id_new);
</code></pre><p>The new column will also need a new sequence of type <code>bigint</code>. The sequence needs to start at some point after the latest value that had been recorded.<pre><code class=language-pgsql>CREATE SEQUENCE test_id_new_seq START 2147483776 AS bigint;
ALTER TABLE test ALTER COLUMN id_new SET DEFAULT nextval ('test_id_new_seq');
alter sequence test_id_new_seq owned by test.id_new;
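-- Note: the start value 2147483776 was chosen to be safely above the
-- old integer sequence's maximum possible value (2147483647)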
</code></pre><p>Now new values can be added to the table, but there are two different sequences being incremented - the old and the new:<pre><code class=language-pgsql>postgres=# select * from test;
     id     | value |   id_new
------------+-------+------------
 2000000007 |       |
 2000000008 |       |
 2000000009 |       |
 2000000010 |       |
 2000000011 |       | 2147483776
 2000000012 |       | 2147483777
 2000000013 |       | 2147483778
 2000000014 |       | 2147483779

</code></pre><p>In a single transaction, we will drop the old ID constraint and default, rename columns, and add an invalid “not null” constraint on the new ID column:<pre><code class=language-pgsql>BEGIN;
ALTER TABLE test DROP CONSTRAINT test_pkey;
ALTER TABLE test ALTER COLUMN id DROP DEFAULT;
ALTER TABLE test RENAME COLUMN id TO id_old;
ALTER TABLE test RENAME COLUMN id_new TO id;
ALTER TABLE test ALTER COLUMN id_old DROP NOT NULL;
ALTER TABLE test ADD CONSTRAINT id_not_null CHECK (id IS NOT NULL) NOT VALID;
COMMIT;
</code></pre><p>Now new IDs are being added to the table. Thanks to the <code>NOT NULL</code> constraint on <code>id</code>, new NULL values cannot be added, but since it is also <code>NOT VALID</code> the existing NULL values are permitted. In order to make <code>id</code> back into a primary key, the <code>id_old</code> data must be backfilled into <code>id</code> so that the constraint can be made valid. This can be done in batches, e.g.:<pre><code class=language-pgsql>WITH unset_values AS (
    SELECT
        id_old
    FROM
        test
    WHERE
        id IS NULL
    LIMIT 1000)
UPDATE
    test
SET
    id = unset_values.id_old
FROM
    unset_values
WHERE
    unset_values.id_old = test.id_old;
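-- Run this batch repeatedly until it reports UPDATE 0, meaning every
-- existing row has been backfilled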
</code></pre><p>Once all rows have been backfilled, the <code>NOT NULL</code> constraint can be validated, the UNIQUE index on <code>id</code> can be converted to a primary key, and finally the standalone <code>NOT NULL</code> constraint can be dropped:<pre><code class=language-pgsql>ALTER TABLE test VALIDATE CONSTRAINT id_not_null;
ALTER TABLE test ADD CONSTRAINT test_pkey PRIMARY KEY USING INDEX test_id_new;
ALTER TABLE test DROP CONSTRAINT id_not_null;
</code></pre><p>At any point now the 4-byte <code>id_old</code> column can be dropped, as the bigint has taken its place:<pre><code class=language-pgsql>postgres=# ALTER table test drop column id_old;
ALTER TABLE
postgres=# \d test
                              Table "public.test"
 Column |  Type   | Collation | Nullable |               Default
--------+---------+-----------+----------+--------------------------------------
 value  | integer |           |          |
 id     | bigint  |           | not null | nextval('test_id_new_seq'::regclass)
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)
</code></pre><p>The new 8-byte bigint id should be sufficient for a very, very, <strong>very</strong> long time:<pre><code class=language-pgsql>sequence     | sequence_datatype | owned_by | column_datatype | last_sequence_value | sequence_percent | column_percent
-----------------+-------------------+----------+-----------------+---------------------+------------------+----------------
 test_id_new_seq | bigint            | test.id  | bigint          |          2147483788 | 0.00%            | 0.00%
</code></pre><p><strong>Benefits of the <code>bigint</code> approach:</strong><ul><li>This is a long term fix and you won't have to worry about running out of sequence numbers for a very long time.</ul><p><strong>Drawbacks:</strong><ul><li>You probably need to update a lot of other things to larger integers<li>Takes coordination with the entire database. In our experience, this is a large project.</ul><h3 id=serial-types><a href=#serial-types><code>SERIAL</code> types</a></h3><p>In Postgres, the <code>SERIAL</code> data types (<code>smallserial</code>, <code>serial</code>, and <code>bigserial</code>) are shortcuts for creating auto-incrementing identifier columns whose values are assigned the next value from a Postgres SEQUENCE object.<p>Creating a column of type <code>SERIAL</code> will default it to type <code>integer</code>, simultaneously creating an integer sequence object owned by the specified table column and making its nextval() the default value for the column.<p>For new tables, consider using <code>BIGSERIAL</code>.<h3 id=summary><a href=#summary>Summary</a></h3><ul><li>You can check with a query if you’re running out of sequence numbers.<li>Changing to negative numbers can be a short term fix.<li>Changing to <code>bigint</code> is the recommended long term fix.<li>When you are setting up a new database that’s likely to have a lot of data in it, look at <code>BIGSERIAL</code> instead of <code>SERIAL</code>.</ul><p>Integer overflow may appear at a glance to be insanely complicated. I have written this to keep Postgres DBAs and intergalactic travelers from panicking. ]]></content:encoded>
<category><![CDATA[ Production Postgres ]]></category>
<author><![CDATA[ Jesse.Soyland@crunchydata.com (Jesse Soyland) ]]></author>
<dc:creator><![CDATA[ Jesse Soyland ]]></dc:creator>
<guid isPermalink="false">8dba0794d227f4924cc226b508ef99a9fd78a460cba2e32468454bcb7c3b4b49</guid>
<pubDate>Fri, 03 Mar 2023 11:00:00 EST</pubDate>
<dc:date>2023-03-03T16:00:00.000Z</dc:date>
<atom:updated>2023-03-03T16:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres Migration Pitstop: Collations ]]></title>
<link>https://www.crunchydata.com/blog/postgres-migration-pitstop-collations</link>
<description><![CDATA[ Checking on your collations is a must have stop on your migration path. You might just run a quick check and be on your way or you might need to add a few more steps to your cutover plans. ]]></description>
<content:encoded><![CDATA[ <p>At Crunchy Data we spend a lot of time helping customers migrate their databases. Migrating Postgres tends to be a very straightforward process. Yet there can still be a few gotchas that can catch you off-guard if you are not prepared to deal with them. From some recent experiences with customers migrating to <a href=https://www.crunchydata.com/products/crunchy-bridge>Crunchy Bridge</a> we found most customers had not considered the underlying collations. These customers ran a risk of data corruption by not handling collation review and updates as part of their migration. A mismatched glibc is one of those details that could actually be a big gotcha and quite the headache if you are unaware of it - so we wanted to cover a few quick details.<h2 id=why-should-i-care-about-mismatched-glibc><a href=#why-should-i-care-about-mismatched-glibc>Why should I care about mismatched <code>glibc</code>?</a></h2><p><strong>Using mismatched <code>glibc</code> versions can have a risk of:</strong><ul><li>Missing data when you query it<li>Inconsistent sorting between versions<li>Undetected unique constraint violations</ul><p>These can all result in data corruption issues. For example, if you have a unique constraint on email addresses and the sort order differs across versions, you may end up with two accounts for the same user. You may get empty results when you query. 
Reconciling data corruption may be simple if it is a single record, but the longer it lives, the bigger the cleanup becomes, and it can result in weeks of pain.<p><strong>We’ve seen differences between glibc versions when:</strong><ul><li>Using <a href=https://docs.crunchybridge.com/how-to/migrate/#replica-using-postgres-tooling-wal-e-wal-g-pgbackrest>physical replication to migrate databases</a> (e.g., <code>wal-e</code>, <code>wal-g</code>, <code>pgbackrest</code>) from one host to a new one.<li>Restoring a binary backup (e.g., <code>pg_basebackup</code>) on a system with different OS configuration.<li>Upgrading the Linux distribution to a new major release while keeping the PostgreSQL data directory. In this case, the glibc version may have changed but your underlying data did not.</ul><p>Not all types of migrations or replication are affected by this inconsistency. Situations where the data is transported in a logical (not binary) way are quite safe, including:<ul><li>Backups and restore processes using <code>pg_dump</code>, since these use logical data only<li>Logical replication, which uses only a data copy and not the physical copy</ul><h2 id=how-the-sorting-works><a href=#how-the-sorting-works>How the sorting works</a></h2><p>For a very simple but practical example, on glibc versions older than 2.28 we can run this query and see how data sorts.<pre><code class=language-pgsql>old-glibc::DATABASE=> SELECT * FROM (values ('a'), ('$a'), ('a$'), ('b'), ('$b'), ('b$'), ('A'), ('B')) AS l(x) ORDER BY x;
 x
----
 a
 $a
 a$
 A
 b
 $b
 b$
 B
(8 rows)
</code></pre><p>Then run the same on a newer version:<pre><code class=language-pgsql>new-glibc::DATABASE=> SELECT * FROM (values ('a'), ('$a'), ('a$'), ('b'), ('$b'), ('b$'), ('A'), ('B')) AS l(x) ORDER BY x;
 x
----
 $a
 $b
 a
 A
 a$
 b
 B
 b$
(8 rows)
</code></pre><p>Thanks to <a href=https://www.twitter.com/DanielVerite>@DanielVerite</a> for a great example in his write-up on <a href=https://postgresql.verite.pro/blog/2018/08/27/glibc-upgrade.html>glibc and Postgres</a>. Let’s dig in a bit more though.<h2 id=what-is-glibc><a href=#what-is-glibc>What is glibc?</a></h2><p>Libc is the main C library used by the Linux system. Many Linux programs, including Postgres, use the glibc implementation. It is used to provide many fundamental software operations and is used inside Postgres to do things like sorting text or comparing data when creating indexes.<p>A <a href=https://sourceware.org/legacy-ml/libc-alpha/2018-08/msg00003.html>major update</a> released with <code>glibc 2.28</code> in 2018 brought localization and collation information into compliance with the 2016 Edition 4 ISO 14651 standards. With the update, indexes that were created with a prior version of the collations potentially exhibit corruption when being read by a system using the updated collations. If there is a mis-match the indexes must be rebuilt to avoid issues.<h2 id=what-collations-are-you-using><a href=#what-collations-are-you-using>What collations are you using?</a></h2><p>You can find the data collation your databases are using via the <code>datcollate</code> field of <code>pg_database</code>.<pre><code class=language-pgsql>SELECT datname, datcollate FROM pg_database;
  datname  | datcollate
-----------+-------------
 postgres  | en_US.UTF-8
 demo      | en_US.UTF-8
 template1 | en_US.UTF-8
 template0 | en_US.UTF-8
(4 rows)
</code></pre><p>And to check the collation versions recorded for your <code>glibc</code>-provided collations (this query is environment dependent):<pre><code class=language-pgsql>select collname, collversion from pg_collation where collprovider = 'c';
     collname     | collversion
------------------+-------------
 C                |
 POSIX            |
 ucs_basic        |
 C.utf8           |
 en_AG            | 2.28
 en_AU            | 2.28
 en_AU.utf8       | 2.28
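</code></pre><p>To narrow this down to collations whose recorded version no longer matches what the operating system provides, you can compare <code>collversion</code> against <code>pg_collation_actual_version()</code> (available since Postgres 10). A sketch; any rows returned point at collations whose dependent indexes deserve a closer look:<pre><code class=language-pgsql>SELECT collname,
       collversion AS recorded_version,
       pg_collation_actual_version(oid) AS os_version
FROM pg_collation
WHERE collprovider = 'c'
  AND collversion IS DISTINCT FROM pg_collation_actual_version(oid);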
</code></pre><h2 id=how-do-i-fix-it><a href=#how-do-i-fix-it>How do I fix it?</a></h2><h3 id=fix-during-a-migration><a href=#fix-during-a-migration>Fix during a migration</a></h3><p>Since this issue arises when binary data moves between operating systems with different <code>glibc</code> versions, it generally surfaces during a migration. Migrating via logical replication or a logical backup (i.e., <code>pg_dump</code>) eliminates the issue, since any affected indexes are recreated at restore time. Switching to a logical restore may therefore be worth considering.<p>For large databases, in excess of 100GB, logical backup migrations can take longer than desirable. In those cases, a WAL-based migration followed by rebuilding the affected indexes is generally the method we prefer to minimize downtime and engineering effort.<h3 id=on-a-live-database><a href=#on-a-live-database>On a live database</a></h3><p>If you think you might have a collation issue post-migration, the <code>amcheck</code> extension helps identify any index inconsistencies:<pre><code class=language-pgsql>SELECT bt_index_check(index => c.oid, heapallindexed => true),
               c.relname,
               c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages;

bt_index_check |                    relname                    | relpages
----------------+-----------------------------------------------+----------
                | pg_publication_pubname_index                  |        1
                | pg_largeobject_loid_pn_index                  |        1
                | pg_largeobject_metadata_oid_index             |        1
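</code></pre><p>When the check does raise an error for an index, rebuilding that index brings it in line with the current collations. A sketch, using a hypothetical index name:<pre><code class=language-pgsql>-- Rebuild the affected index without blocking writes (Postgres 12+)
REINDEX INDEX CONCURRENTLY some_schema.some_index;

-- Optionally record the new collation version so mismatch warnings stop
ALTER COLLATION "en_US.utf8" REFRESH VERSION;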
</code></pre><p>If <code>bt_index_check</code> completes without raising errors, the checked indexes are consistent and no <code>REINDEX</code> is required for the new collations. If it reports problems for an index, you’ll likely need to <code>REINDEX</code> it.<p>Side note: running <code>amcheck</code> can be somewhat resource intensive, both in terms of I/O and time. Since any corruption is preserved in a binary copy of the database, consider running the check on a physical replica for a large or critical database so you don’t disrupt production workloads.<h3 id=reindex><a href=#reindex>Reindex</a></h3><p>If you’ve found an issue with the above steps, you’ll need to run <code>REINDEX</code> or <code>REINDEX CONCURRENTLY</code>. (Note: if you are using Postgres 14, we recommend <a href=https://www.postgresql.org/docs/current/release-14-4.html>14.4 or higher</a>, which fixes a <code>REINDEX CONCURRENTLY</code> bug that could itself introduce corruption.)<h2 id=have-questions><a href=#have-questions>Have questions?</a></h2><p>Data migrations can often be straightforward, but you want to ask the right questions rather than assume things will just work. We hope you’ve found this helpful as you consider a migration; if you have additional questions, please <a href=https://www.crunchydata.com/contact>reach out</a>, as we may be able to help.<p>Co-authored with <a href=https://www.crunchydata.com/blog/author/elizabeth-christensen>Elizabeth Christensen</a>, <a href=https://www.crunchydata.com/blog/author/david-christensen>David Christensen</a>, and <a href=https://www.crunchydata.com/blog/author/craig-kerstiens>Craig Kerstiens</a> ]]></content:encoded>
<category><![CDATA[ Production Postgres ]]></category>
<author><![CDATA[ Jesse.Soyland@crunchydata.com (Jesse Soyland) ]]></author>
<dc:creator><![CDATA[ Jesse Soyland ]]></dc:creator>
<guid isPermalink="false">9d6460e76806e6785fced6e1943a8f04593d3e94af5c21b34772d19c8b3dfddd</guid>
<pubDate>Fri, 02 Sep 2022 11:00:00 EDT</pubDate>
<dc:date>2022-09-02T15:00:00.000Z</dc:date>
<atom:updated>2022-09-02T15:00:00.000Z</atom:updated></item></channel></rss>