<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>Elizabeth Christensen | CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/author/elizabeth-christensen/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog/author/elizabeth-christensen</link>
<image><url>https://www.crunchydata.com/build/_assets/elizabeth-christensen.png-W6WDTQFB.webp</url>
<title>Elizabeth Christensen | CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog/author/elizabeth-christensen</link>
<width>3016</width>
<height>3287</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Thu, 04 Dec 2025 08:00:00 EST</pubDate>
<dc:date>2025-12-04T13:00:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ Postgres Scan Types in EXPLAIN Plans ]]></title>
<link>https://www.crunchydata.com/blog/postgres-scan-types-in-explain-plans</link>
<description><![CDATA[ What is a sequential scan vs index scan vs parallel scan .... and what is a bitmap heap scan? Postgres scan types explained and diagrammed. ]]></description>
<content:encoded><![CDATA[ <p>The secret to unlocking performance gains often lies not just in <em>what</em> you ask in a query, but in <em>how</em> Postgres finds the answer. The Postgres <code>EXPLAIN</code> system is great for understanding how data is being queried. One of the secrets to reading EXPLAIN plans is understanding the <strong>type of scan</strong> done to retrieve the data. The scan type can be the difference between a lightning-fast response and a slow query.<p><img alt="postgres explain plan" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/fb4c4eb8-e74c-4f68-8981-76ffbc6be300/public><p>Today I’ll break down the most common scan types, how they work, and when you’ll see them in your queries.<h2 id=sequential-scan><a href=#sequential-scan>Sequential scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/46e6aee0-8119-4a09-14fc-90890cf30e00/public" alt="postgres sequential scan, seq scan" style="float: left; margin: 0 15px 10px 0;"><p>This type of scan reads the entire table, row by row, checking to see what matches the query conditions. If you have a WHERE or FILTER, Postgres just scans each row looking for matches.<p>Sequential scans are the foundation of how data scans are done, and for many searches this is what Postgres will use. For very large data sets, or those queried often, sequential scans are not ideal and an index scan may be faster. For that reason - knowing how to spot a seq scan vs an index scan is one of the most important parts of reading an <code>EXPLAIN</code> plan.<pre><code class=language-sql>EXPLAIN select * from accounts;

QUERY PLAN
-------------------------------------------------------------
Seq Scan on accounts  (cost=0.00..22.70 rows=1270 width=36)
(1 row)
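
-- A sketch of a common next step (standard EXPLAIN option, shown for context):
-- adding ANALYZE makes EXPLAIN actually run the query and report real timings.
EXPLAIN (ANALYZE) SELECT * FROM accounts;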
</code></pre><p><br><br><br><br><br><h2 id=index-scan><a href=#index-scan>Index Scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/ba98105a-7268-4cf4-e2cc-0b12eec5ee00/public" alt="postgres index scan" style="float: left; margin: 0 15px 10px 0;"><p>When you create an index in Postgres, you’re creating a column or multi-column reference that is stored on disk. Postgres is able to use this index as a map to the data stored in the table. A basic index scan uses a B-tree to quickly find the exact location of the data using a two-step process: first Postgres finds the entry in the index, follows the reference, and then fetches the rest of the row data from the table.<pre><code class=language-sql>EXPLAIN select * from accounts where id = '5';

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using accounts_pkey on accounts  (cost=0.15..2.37 rows=1 width=36)
   Index Cond: (id = 5)
(2 rows)
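
-- accounts_pkey above is the B-tree index Postgres created automatically for
-- the primary key. An explicit index works the same way ("owner" is a
-- hypothetical column used here only for illustration):
CREATE INDEX accounts_owner_idx ON accounts (owner);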
</code></pre><p>Note that primary keys are automatically indexed with a B-tree index, so queries that involve a primary key may use an index scan.<p>An index scan is typically faster than a sequential scan in Postgres when a query needs to retrieve only a very small fraction of rows from a large table. Using the index is faster than scanning the whole table.<p>However, index scans are <strong>not</strong> always faster. In many situations, Postgres’ query planner will correctly choose a sequential scan. This is typically the case when the table being scanned is small or the percentage of rows returned outweighs the benefit of using an index. If a query returns ~10% of a table’s rows, a sequential scan is probably faster. <br><br><br><h2 id=bitmap-index-scan><a href=#bitmap-index-scan>Bitmap Index Scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/b2489008-9acf-4c48-2ad5-d570e3848800/public" alt="bitmap index scan" style="float: left; margin: 0 15px 10px 0;"><p>If neither an index scan nor a seq scan is the perfect option, Postgres can use the bitmap index scan as a kind of hybrid approach. It is typically chosen when a query matches too many rows for a regular index scan, but not so many that a sequential scan would be the best option.<p>This shows up in an EXPLAIN plan as a two-phased approach.<ol><li><strong>Bitmap Index Scan:</strong> First, Postgres scans one or more indexes to create an in-memory "bitmap", a simple map of all the table pages that <em>might</em> contain rows you need.<li><strong>Bitmap Heap Scan:</strong> The bitmap is used to visit the main table. The key here is that it reads the required pages from the disk sequentially, which can be much faster than the random jumping of a standard index scan.</ol><p>Bitmap index scans are common when a query has multiple filter conditions that each have a separate index. The bitmap scan allows the database to use separate indexes on different columns simultaneously. 
You’ll see this scan come up with <code>WHERE</code> conditions joined by <code>AND</code> or <code>OR</code> operators.<pre><code class=language-sql>EXPLAIN SELECT customer_id, registration_date
FROM customer_records
WHERE gender = 'F'
  AND state_code = 'NY';
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on customer_records  (cost=835.78..8669.29 rows=49226 width=12) (actual time=5.717..38.642 rows=50184.00 loops=1)
   Recheck Cond: (state_code = 'NY'::bpchar)
   Filter: (gender = 'F'::bpchar)
   Rows Removed by Filter: 49682
   Heap Blocks: exact=6370
   Buffers: shared hit=6370 read=87
   ->  Bitmap Index Scan on idx_customer_state  (cost=0.00..823.48 rows=97567 width=0) (actual time=4.377..4.378 rows=99866.00 loops=1)
         Index Cond: (state_code = 'NY'::bpchar)
         Index Searches: 1
         Buffers: shared read=87
 Planning:
   Buffers: shared hit=27 read=2
 Planning Time: 0.774 ms
 Execution Time: 40.572 ms
(14 rows)
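
-- For experimentation only (session-local setting, not for production):
-- turning bitmap scans off lets you compare the plan the planner picks instead.
SET enable_bitmapscan = off;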
</code></pre><p><br><br><br><h2 id=parallel-sequential-scan><a href=#parallel-sequential-scan>Parallel Sequential Scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/7375434b-f942-4cd9-bf9a-07a775df9600/public" alt="parallel seq scan" style="float: left; margin: 0 15px 10px 0;"><p>You will see a parallel sequential scan when Postgres uses multiple background workers to sequentially scan a single large table <em>at the same time</em>. The table is broken into chunks, each worker scans its chunk, and the results are combined at the end in a gather step. Depending on your query - you may also have an aggregate or sort after the parallel scans and before the final gather. This is part of <a href=https://www.crunchydata.com/blog/parallel-queries-in-postgres>Postgres’ parallel query function</a>.<pre><code class=language-sql>EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT id, data_value
FROM parallel_test
WHERE data_value &#60 100000
ORDER BY data_value DESC
LIMIT 1000;

                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=161310.11..161431.04 rows=1000 width=16) (actual time=130.300..140.555 rows=1000.00 loops=1)
   Output: id, data_value
   Buffers: shared hit=142685
   ->  Gather Merge  (cost=161310.11..220311.14 rows=487915 width=16) (actual time=130.299..140.468 rows=1000.00 loops=1)
         Output: id, data_value
         Workers Planned: 5
         Workers Launched: 5
         Buffers: shared hit=142685
         ->  Sort  (cost=160310.04..160553.99 rows=97583 width=16) (actual time=112.942..112.973 rows=861.17 loops=6)
               Output: id, data_value
               Sort Key: parallel_test.data_value DESC
               Sort Method: top-N heapsort  Memory: 163kB
               Buffers: shared hit=142685
               Worker 0:  actual time=112.535..112.571 rows=1000.00 loops=1
                 Sort Method: top-N heapsort  Memory: 164kB
                 Buffers: shared hit=21729
               Worker 1:  actual time=112.271..112.308 rows=1000.00 loops=1
                 Sort Method: top-N heapsort  Memory: 164kB
                 Buffers: shared hit=21573
               Worker 2:  actual time=112.465..112.500 rows=1000.00 loops=1
                 Sort Method: top-N heapsort  Memory: 164kB
                 Buffers: shared hit=20549
               Worker 3:  actual time=99.099..99.133 rows=1000.00 loops=1
                 Sort Method: top-N heapsort  Memory: 163kB
                 Buffers: shared hit=17033
               Worker 4:  actual time=112.333..112.368 rows=1000.00 loops=1
                 Sort Method: top-N heapsort  Memory: 163kB
                 Buffers: shared hit=19964
               ->  Parallel Seq Scan on public.parallel_test  (cost=0.00..154959.67 rows=97583 width=16) (actual time=19.238..99.868 rows=83250.83 loops=6)
                     Output: id, data_value
                     Filter: (parallel_test.data_value &#60 '100000'::numeric)
                     Rows Removed by Filter: 750082
                     Buffers: shared hit=142500
                     Worker 0:  actual time=18.837..99.169 rows=83026.00 loops=1
                       Buffers: shared hit=21692
                     Worker 1:  actual time=18.594..99.301 rows=84378.00 loops=1
                       Buffers: shared hit=21536
                     Worker 2:  actual time=18.706..99.551 rows=79196.00 loops=1
                       Buffers: shared hit=20512
                     Worker 3:  actual time=5.308..86.023 rows=81187.00 loops=1
                       Buffers: shared hit=16996
                     Worker 4:  actual time=18.694..99.497 rows=83574.00 loops=1
                       Buffers: shared hit=19927
 Planning:
   Buffers: shared hit=15
 Planning Time: 0.315 ms
 Execution Time: 140.635 ms
(47 rows)
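
-- The planner's worker count (Workers Planned above) is capped by settings like:
SHOW max_parallel_workers_per_gather;
SHOW max_parallel_workers;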
</code></pre><p><br><br><br><h2 id=parallel-index-scan><a href=#parallel-index-scan>Parallel index scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/a406e8b5-b14f-47d3-4fcf-32f9a236c300/public" alt="parallel index scan" style="float: left; margin: 0 15px 10px 0;"><p>A parallel index scan uses the same parallel workers to scan through an index concurrently. This uses the same methodology as the index scan - except that multiple workers are doing it simultaneously. Each process reads a different part of the index and returns results. Like the other parallel scans, this ends in a gather.<p>You will see a parallel index scan when the indexes and tables involved are very large - and the overall operation of splitting things up and gathering them at the end is faster than handing the job to a single worker.<pre><code class=language-sql>EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT data_id, filler_text
FROM parallel_index_test
WHERE data_id BETWEEN 1000000 AND 2000000;

                                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.43..34560.34 rows=995971 width=109) (actual time=1.014..145.796 rows=1000001.00 loops=1)
   Output: data_id, filler_text
   Workers Planned: 4
   Workers Launched: 4
   Buffers: shared hit=23385
   ->  Parallel Index Scan using idx_data_id on public.parallel_index_test  (cost=0.43..33564.37 rows=248993 width=109) (actual time=0.941..38.211 rows=200000.20 loops=5)
         Output: data_id, filler_text
         Index Cond: ((parallel_index_test.data_id >= 1000000) AND (parallel_index_test.data_id &#60= 2000000))
         Index Searches: 1
         Buffers: shared hit=23385
         Worker 0:  actual time=2.104..45.540 rows=240638.00 loops=1
           Buffers: shared hit=5640
         Worker 1:  actual time=2.174..45.169 rows=240096.00 loops=1
           Buffers: shared hit=5638
         Worker 2:  actual time=0.067..45.380 rows=242658.00 loops=1
           Buffers: shared hit=5693
         Worker 3:  actual time=0.306..45.122 rows=242292.00 loops=1
           Buffers: shared hit=5686
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.526 ms
 Execution Time: 180.660 ms
(22 rows)
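
-- Parallel scans are only considered once a relation passes size thresholds:
SHOW min_parallel_table_scan_size;
SHOW min_parallel_index_scan_size;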
</code></pre><p><br><br><br><h2 id=index-only-scan><a href=#index-only-scan>Index-Only Scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/d561c4a4-5a31-4982-c7c5-9f322a327100/public" alt="postgres index only scan" style="float: left; margin: 0 15px 10px 0;"><p>An Index-Only Scan is the superstar of scans and answers the entire query using <em>only</em> the information stored within the index itself. Index-only scans rely on what’s called a “covering index”, meaning the index itself covers all the data the query needs. It never even has to touch the main table. Index only scans are a huge performance win because they’re very fast - no information needs to be retrieved from the heap table. They also typically use fewer I/O resources because indexes are very cache friendly and often in shared buffers - meaning no data needs to be read from the underlying disk.<p>Queries benefit from a covering index in these situations:<ul><li>The query is very frequently executed.<li>The current query is performing a standard index scan followed by many slow disk reads (heap fetches), consuming I/O.<li>The query only requires a small subset of the table's columns, for example selecting only three columns from a table of twenty.<li>The columns have a low write frequency. Any column that is indexed must be written to both the table and the index, so if you start adding covering indexes for all your columns - you’re essentially creating write amplification.<li>The new index, which must cover all needed columns, won't be excessively large. Indexes are stored on disk so you don’t want to cause storage issues.</ul><pre><code class=language-sql>EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT code, status
FROM index_only_test
WHERE code > 'CODE_050000'
ORDER BY code
LIMIT 100;
                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..2.60 rows=100 width=13) (actual time=0.346..0.362 rows=100.00 loops=1)
   Output: code, status
   Buffers: shared hit=1 read=3
   ->  Index Only Scan using idx_code_status on public.index_only_test  (cost=0.42..1068.02 rows=49000 width=13) (actual time=0.345..0.352 rows=100.00 loops=1)
         Output: code, status
         Index Cond: (index_only_test.code > 'CODE_050000'::text)
         Heap Fetches: 0
         Index Searches: 1
         Buffers: shared hit=1 read=3
 Planning:
   Buffers: shared hit=19
 Planning Time: 1.838 ms
 Execution Time: 0.385 ms
(13 rows)
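
-- One way idx_code_status might have been defined (an assumed definition),
-- using INCLUDE to store the non-key status column directly in the index:
CREATE INDEX idx_code_status ON index_only_test (code) INCLUDE (status);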
</code></pre><p><br><br><br><h2 id=summary><a href=#summary>Summary</a></h2><p>We’ve covered all the major scan types, so now reading your <code>EXPLAIN</code> plans will be a little easier.<ul><li>Seq scan - Postgres looks through the whole table in sequential order to find the query data<li>Index scan - Postgres first looks at the index and then fetches the row data the index pointed to<li>Bitmap index scan - Postgres first reads the index and creates a <strong>bitmap</strong> of matching rows, then reads the data heap using the bitmap, more efficiently than a full sequential scan<li>Parallel scan - Postgres uses multiple parallel workers to scan the table and gathers the data at the end<li>Parallel index scan - Postgres uses multiple workers to do an index scan and gathers the data at the end<li>Index only scan - All data for the query is in the index</ul><p>And here’s everything all in one graphic:<p><img alt="postgres scan types summary" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/e886cea5-4785-4136-ba99-ff46a3b03000/original> ]]></content:encoded>
<category><![CDATA[ Production Postgres ]]></category>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">08ee92c4d2dfc4ad6be90a57493965c9cdb5a8e3c06cedc4fd8eddfb425c08c9</guid>
<pubDate>Thu, 04 Dec 2025 08:00:00 EST</pubDate>
<dc:date>2025-12-04T13:00:00.000Z</dc:date>
<atom:updated>2025-12-04T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres Internals Hiding in Plain Sight ]]></title>
<link>https://www.crunchydata.com/blog/postgres-internals-hiding-in-plain-sight</link>
<description><![CDATA[ Get under the hood of Postgres by looking at psql, system views, and internal tables. ]]></description>
<content:encoded><![CDATA[ <p>Postgres has an awesome amount of data collected in its own internal tables. Postgres hackers know all about this - but software developers and folks working with day to day Postgres tasks often miss out on the good stuff.<p>The Postgres catalog is how Postgres keeps track of itself. Of course, Postgres would do this in a relational database with its own schema. Over the years, several nice features have been added - like psql tools and views - that make navigating Postgres’ internal tables even easier.<p>Today I want to walk through some of the most important Postgres internal data catalog details. What they are, what is in them, and how they might help you understand more about what is happening inside your database.<h2 id=psqls-catalog-information><a href=#psqls-catalog-information>psql’s catalog information</a></h2><p>The easiest way to get at some of Postgres’ internal catalogs is to use the built-in <a href=https://www.crunchydata.com/developers/playground/psql-basics>psql commands</a> that generally begin with \d. Here are some common ones users should be comfortable using:<p><code>\d {tablename}</code>: describes a specific table. 
\d will do a lot of things if you qualify \d with a table or view name.<p><code>\di</code>: list all your indexes<p><code>\dx</code>: list installed extensions<p><code>\dp</code>: show access privileges<p><code>\dp+</code>: tables and views with the roles and access details<p><code>\dconfig</code>: your current configuration settings<p><code>\dt {tablename}</code>: describe a table<p><code>\dti+</code>: tables and indexes with sizes<p><code>\dg+</code>: show role names<p><code>\df</code>: show your functions<p><code>\dv {view name}</code>: describe a view<p><code>\l</code>: lists all your databases<h2 id=important-postgres-catalog-views><a href=#important-postgres-catalog-views>Important Postgres catalog views</a></h2><p>Postgres exposes many of the complex internals of the database system in easy-to-query views. These host a wealth of information about what is going on inside your database and offer direct SQL access to answer in-the-moment emergency questions like “what is taking up all my CPU?” and longer-term questions like “what are my 10 slowest queries?”.<h3 id=pg_stat_activity><a href=#pg_stat_activity>pg_stat_activity</a></h3><p>Shows current database activity, including running queries, state, and client information. Essential for troubleshooting and getting process ids (pids) for bad actors.<pre><code class=language-sql>SELECT pid, usename, datname, client_addr, application_name, state, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY state, query_start DESC;
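
-- Once you have a pid, you can cancel its current query or terminate the
-- backend entirely (12345 is a placeholder pid):
SELECT pg_cancel_backend(12345);
SELECT pg_terminate_backend(12345);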
</code></pre><h3 id=pg_stat_statements><a href=#pg_stat_statements>pg_stat_statements</a></h3><p>This requires the pg_stat_statements extension - but it is part of the contrib library and ships with Postgres, so it doesn't require a separate installation.<p>This view tracks execution statistics for all queries executed across all databases. It's incredibly powerful for identifying slow or frequently executed queries.<pre><code class=language-sql>-- pg_stat_statements 10 longest running queries
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
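
-- Statistics accumulate until reset; to start a fresh measurement window:
SELECT pg_stat_statements_reset();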
</code></pre><h3 id=pg_stat_database><a href=#pg_stat_database>pg_stat_database</a></h3><p>This view provides database-wide statistics, such as the number of connections, transactions, and I/O. It's useful for a high-level overview of database activity and health.<pre><code class=language-sql>-- high level db stats for the postgres db
SELECT datname,numbackends, xact_commit, xact_rollback, blks_read, blks_hit
FROM pg_stat_database
WHERE datname = 'postgres';
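
-- blks_hit vs blks_read also gives a rough cache hit ratio:
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname = 'postgres';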
</code></pre><h3 id=pg_locks><a href=#pg_locks>pg_locks</a></h3><p>This view displays information about locks held by active processes. This is the go-to place for troubleshooting locking issues, deadlocks, and contention within the database. We have a great blog on locking and <a href=https://www.crunchydata.com/blog/one-pid-to-lock-them-all-finding-the-source-of-the-lock-in-postgres>how to find the source of the lock in Postgres</a>.<pre><code class=language-sql>-- locks joined with the activity table, showing locks that have not been granted (typically because they are blocked by other locks)
SELECT a.datname, l.pid, l.locktype, l.relation::regclass, l.mode, l.granted
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT l.granted;
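
-- pg_blocking_pids() shows exactly which sessions block a given pid:
SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;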
</code></pre><h3 id=pg_stat_user_tables><a href=#pg_stat_user_tables>pg_stat_user_tables</a></h3><p>This view offers statistics on tables, including sequential scans, index scans, and row-level operations (inserts, updates, deletes). It's great for identifying tables with heavy activity or those that need vacuuming.<pre><code class=language-sql>-- see sequential scans and index scans by table
SELECT relname AS table_name, seq_scan, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 0 OR idx_scan > 0 ORDER BY seq_scan DESC;
</code></pre><h3 id=pg_stat_user_indexes><a href=#pg_stat_user_indexes>pg_stat_user_indexes</a></h3><p>This view provides statistics on user indexes, such as how often they're used and how many tuples are read. This is particularly helpful for finding unused or underutilized indexes.<pre><code class=language-sql>-- Never used indexes in Postgres sorted by size
SELECT s.schemaname, s.relname AS table_name, s.indexrelname AS index_name, pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size, s.idx_scan
FROM pg_stat_user_indexes AS s
JOIN pg_index AS i ON s.indexrelid = i.indexrelid
WHERE s.idx_scan = 0 AND i.indisunique IS FALSE
ORDER BY pg_relation_size(s.indexrelid) DESC;
</code></pre><h3 id=pg_settings><a href=#pg_settings>pg_settings</a></h3><p>This is a prebuilt view that is super useful for viewing configuration parameters, their current values, and their descriptions. Filter with <code>LIKE</code> or <code>ILIKE</code> to find the exact parameters you’re looking for.<pre><code class=language-sql>-- find shared_buffers or work_mem settings
SELECT name, setting, unit, short_desc
FROM pg_settings
WHERE name LIKE '%shared_buffers%' OR name LIKE '%work_mem%';
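
-- current_setting() returns a single parameter's value directly:
SELECT current_setting('work_mem');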
</code></pre><h3 id=pg_roles><a href=#pg_roles>pg_roles</a></h3><p>This view describes all system roles, which include users and groups. It's useful for checking permissions, login capabilities, and role memberships.<pre><code class=language-sql>-- This query lists all roles, showing their names, whether they can log in, and their password expiration date.
SELECT rolname, rolcanlogin, rolvaliduntil
FROM pg_roles
ORDER BY rolname;
</code></pre><h3 id=pg_database><a href=#pg_database>pg_database</a></h3><p>This view contains all databases in the cluster. It provides key metadata for each database, including its owner, character encoding, and access privileges. We have a lot of folks now who create dozens and sometimes hundreds of databases for development, so this is a good high level view.<pre><code class=language-sql>-- This query lists all Postgres databases, their sizes, and owners.
SELECT d.datname AS database_name, pg_size_pretty(pg_database_size(d.datname)) AS database_size, pg_get_userbyid(d.datdba) AS owner
FROM pg_database AS d
WHERE d.datistemplate = false;
</code></pre><h2 id=postgres-catalog-tables><a href=#postgres-catalog-tables>Postgres catalog tables</a></h2><p>Behind the psql meta-commands and views, there are several core catalog tables. Many of the psql commands match up with the catalog tables. Something roughly like this:<table><thead><tr><th><strong>psql command</strong><th><strong>what data</strong><th><strong>catalog tables</strong><tbody><tr><td>\d<td>tables and table objects<td>pg_class<tr><td>\di<td>indexes<td>pg_class, pg_index<tr><td>\dx<td>installed extensions<td>pg_extension<tr><td>\dp<td>tables and privileges<td>pg_class, pg_roles, pg_attribute<tr><td>\l<td>databases<td>pg_database<tr><td>\df<td>available functions<td>pg_proc</table><p>Let’s look at these and how you might want to use them.<h3 id=pg_stats><a href=#pg_stats>pg_stats</a></h3><p>The pg_stats view collects all the details about your columns - things like cardinality: are there many distinct values in this column or just a few? Postgres uses the details in pg_stats to make query planner decisions efficiently. In some cases, giving <a href=https://www.crunchydata.com/blog/hacking-the-postgres-statistics-tables-for-faster-queries>pg_stats more information can make your queries faster.</a><pre><code class=language-sql>-- table column data like cardinality
SELECT * FROM pg_stats
WHERE tablename = 'table_name'
AND attname = 'column_name';
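
-- pg_stats is populated by ANALYZE (or autovacuum's auto-analyze), so
-- refresh a table's statistics manually with:
ANALYZE table_name;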
</code></pre><h3 id=pg_class><a href=#pg_class>pg_class</a></h3><p>pg_class contains a row for every table, index, sequence, view, materialized view, and other "relation-like" objects in the database. Sometimes this is a nice high level view of an entire table’s accoutrements.<pre><code class=language-sql>SELECT c.relname, pg_get_userbyid(c.relowner) AS owner
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public' AND c.relkind = 'r'
ORDER BY c.relname;
</code></pre><h3 id=pg_type><a href=#pg_type>pg_type</a></h3><p>This table stores all data types that exist. It's confusing though - in Postgres, every table has an associated composite type that defines the structure of its rows. So if you do a <code>select *</code> you’ll see all the table names here and all the data types. If you filter a bit, you can see all your custom data types, domains, and enums.<pre><code class=language-sql>-- see your custom data types in Postgres
SELECT
    t.typname AS type_name,
    n.nspname AS schema_name,
    t.typtype AS type_class
FROM
    pg_type AS t
JOIN
    pg_namespace AS n ON t.typnamespace = n.oid
LEFT JOIN pg_class c ON typrelid = c.oid
WHERE
    t.typtype IN ('e', 'd', 'c') -- 'e' for enum, 'd' for domain, 'c' for composite types.
    AND n.nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
    AND (t.typtype &#60> 'c' OR c.relkind = 'c')
ORDER BY
    schema_name, type_name;
</code></pre><h3 id=pg_proc><a href=#pg_proc>pg_proc</a></h3><p>This is the catalog of all functions and stored procedures that Postgres can use. It contains metadata about each routine. Made a function last week but can’t find it now? Just scan through all of them.<pre><code class=language-sql>-- This query lists all functions and stored procedures.
SELECT proname AS function_name, proargnames AS argument_names, pg_catalog.format_type(prorettype, NULL) AS return_type
FROM pg_proc
ORDER BY proname;
</code></pre><h3 id=pg_attribute><a href=#pg_attribute>pg_attribute</a></h3><p>This table stores information about table columns: there is one row in <code>pg_attribute</code> for every column in every table, as well as for indexes and other objects that have an entry in <code>pg_class</code>.<p>Query columns and data types for any table with a query like this:<pre><code class=language-sql>SELECT
    a.attname AS column_name,
    pg_catalog.format_type(a.atttypid, a.atttypmod) AS data_type
FROM
    pg_catalog.pg_attribute a
WHERE
    a.attrelid = 'orders'::regclass
    AND a.attnum > 0
    AND NOT a.attisdropped
ORDER BY
    a.attnum;
</code></pre><h3 id=pg_catalog-schema><a href=#pg_catalog-schema>pg_catalog schema</a></h3><p>The pg_catalog is the schema holding the system tables, so you will either need to include <code>pg_catalog</code> in your <code>search_path</code> (the default), or any query you issue will need to be qualified with <code>pg_catalog</code>.<p>Here’s a summary of the internal catalog tables:<table><thead><tr><th>pg_catalog<th>schema holding all the catalog tables<tbody><tr><td>pg_stats<td>table and column statistics, like cardinality<tr><td>pg_attribute<td>row for every table column<tr><td>pg_class<td>every table, index, view, materialized view, foreign table<tr><td>pg_type<td>data types, built in and custom</table><h2 id=exploring-system-tables-with-echo_hidden-or--e><a href=#exploring-system-tables-with-echo_hidden-or--e>Exploring system tables with <code>ECHO_HIDDEN</code> or <code>-E</code></a></h2><p>Sometimes navigating these tables and views can be confusing and require browsing through a mix of docs and source code. If you want to have some fun exploring how the catalog is connected, you can connect to your database with the <code>-E</code> argument to psql (or run <code>\set ECHO_HIDDEN on</code> if you’re already connected). psql will echo the SQL behind each command it runs, so you can grab the underlying SQL and edit from there.<p>For example, echoing <code>\dt+</code> will show me a query and the results.<pre><code class=language-sql>SELECT n.nspname as "Schema",
  c.relname as "Name",
  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 't' THEN 'TOAST table' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'partitioned table' WHEN 'I' THEN 'partitioned index' END as "Type",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
  CASE c.relpersistence WHEN 'p' THEN 'permanent' WHEN 't' THEN 'temporary' WHEN 'u' THEN 'unlogged' END as "Persistence",
  am.amname as "Access method",
  pg_catalog.pg_size_pretty(pg_catalog.pg_table_size(c.oid)) as "Size",
  pg_catalog.obj_description(c.oid, 'pg_class') as "Description"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
     LEFT JOIN pg_catalog.pg_am am ON am.oid = c.relam
WHERE c.relkind IN ('r','p','')
      AND n.nspname &#60> 'pg_catalog'
      AND n.nspname !~ '^pg_toast'
      AND n.nspname &#60> 'information_schema'
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;

List of relations
-[ RECORD 1 ]-+--------------
Schema        | public
Name          | articles
Type          | table
Owner         | dba
Persistence   | permanent
Access method | heap
Size          | 16 kB
Description   |
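
-- Once you have the underlying SQL psql ran, you can trim it down and
-- edit it yourself; a hypothetical simplification of the query above:
SELECT c.relname AS "Name",
       pg_catalog.pg_size_pretty(pg_catalog.pg_table_size(c.oid)) AS "Size"
FROM pg_catalog.pg_class c
WHERE c.relkind = 'r'
  AND pg_catalog.pg_table_is_visible(c.oid);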
</code></pre><h2 id=getting-to-postgres-internals><a href=#getting-to-postgres-internals>Getting to Postgres internals</a></h2><ol><li>The easiest way to see internals is to start with the psql <code>\d</code> commands<li>The prebuilt views like <code>pg_stat_activity</code>, <code>pg_stat_statements</code>, <code>pg_locks</code>, and <code>pg_stat_user_indexes</code> are ready to go for easy querying and searching.<li>Going a step deeper, you can access the underlying internal Postgres tables, housed in the pg_catalog schema. The <code>-E</code> flag or <code>ECHO_HIDDEN</code> setting can help you see the tables involved by echoing the SQL behind psql commands.</ol> ]]></content:encoded>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">2d460799803e47f9fd2ff2a2f2306f8836e6b53ce0d7f4ba8bebfce8072fab07</guid>
<pubDate>Fri, 07 Nov 2025 08:00:00 EST</pubDate>
<dc:date>2025-11-07T13:00:00.000Z</dc:date>
<atom:updated>2025-11-07T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres’ Original Project Goals: The Creators Totally Nailed It ]]></title>
<link>https://www.crunchydata.com/blog/the-postgres-project-original-goals-and-how-the-creators-totally-nailed-it</link>
<description><![CDATA[ Dig in to the original goals of the Postgres academic project at UC Berkeley and how they shaped the Postgres we use today. ]]></description>
<content:encoded><![CDATA[ <p>I had a chance last week to sit down and read the <a href=https://dsf.berkeley.edu/papers/ERL-M85-95.pdf>original academic paper announcing Postgres</a> as a platform and the original design goals from 1986. I was just awestruck at the forethought - and how the original project goals laid the foundation for the database that seems to be taking over the world right now.<p>The PostgreSQL creators totally nailed it. They laid out a flexible framework for a variety of business use cases that would eventually make Postgres the most popular database 30 years later.<p>The paper outlines 6 project goals:<ol><li><p>better support for complex objects for a growing world of business and engineering use cases<li><p>provide user extendibility for data types, operators and access methods<li><p>provide facilities for active databases like alerters and triggers<li><p>simplify process for crash recovery<li><p>take advantage of upgraded hardware<li><p>utilize Codd’s relational model</ol><p>Let's look at all of them in reference to modern features of Postgres.<h2 id=1-objects-and-data-types-for-a-growing-world-of-business-and-engineering-use-cases><a href=#1-objects-and-data-types-for-a-growing-world-of-business-and-engineering-use-cases>1) Objects and data types for a growing world of business and engineering use cases</a></h2><p>Postgres has a rich and flexible set of native data types that are designed to meet a vast array of business use cases, from simple record-keeping to complex data analysis.<p>Numeric Types like <code>SMALLINT</code> and <code>INTEGER</code> are used for whole numbers while <code>BIGINT</code> might be for a user's unique ID or primary keys. Arbitrary-precision types like <code>NUMERIC</code> and <code>DECIMAL</code> are used where exact precision is critical, especially for <a href=https://www.crunchydata.com/blog/working-with-money-in-postgres>money in Postgres</a>.
Floating-Point Types like <code>REAL</code> or <code>DOUBLE PRECISION</code> can be used for scientific or engineering calculations where absolute precision isn't as important as the range of values. You also have your <code>UUID</code> (<a href=https://www.crunchydata.com/blog/get-excited-about-postgres-18>indexable UUIDs</a> in Postgres 18) for distributed systems and secure URLs.<p>Character Types like <code>VARCHAR(n)</code> or <code>CHAR(n)</code> store variable-length text up to a specified maximum length (n) and only use as much storage as needed for the actual text.<p>Date/Time Types like <code>DATE</code> store only the date (year, month, day). <a href=https://www.crunchydata.com/blog/working-with-time-in-postgres><code>TIMESTAMPTZ</code></a> is the time and date GOAT and is easily implemented into global systems.<p>But wait, that’s not all: Postgres also has the ability to easily create <strong>custom data types</strong> and constrain data to the specifics of each use case.<p><a href=https://www.crunchydata.com/blog/intro-to-postgres-custom-data-types#using-create-domain>Using CREATE DOMAIN</a> you can create specific value checks, like confirming a date-of-birth range or email format validity.<pre><code class=language-sql>-- Postgres create domain
CREATE DOMAIN date_of_birth AS date
CHECK (value > '1930-01-01'::date);

CREATE DOMAIN valid_email AS text
NOT NULL
CHECK (value ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$');
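
-- hypothetical usage: columns declared with these domains enforce the
-- checks automatically on insert or update
CREATE TABLE people (
  email valid_email,
  birthday date_of_birth
);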
</code></pre><p>Or using a direct <code>CREATE TYPE</code> you can make a new type as a composite. For example, a new custom data type allowing for storage of height, width, and weight in a single field.<pre><code class=language-sql>-- Postgres create type with composite
CREATE TYPE physical_package AS (
height numeric,
width numeric,
weight numeric);
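
-- hypothetical usage: store all three dimensions in a single column
CREATE TABLE packages (
  id bigint GENERATED ALWAYS AS IDENTITY,
  dims physical_package
);
INSERT INTO packages (dims) VALUES (ROW(10, 20, 2.5));
SELECT (dims).height FROM packages;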
</code></pre><p><a href=https://www.crunchydata.com/blog/enums-vs-check-constraints-in-postgres><code>Enums</code></a> let you create a custom type with a set of predefined values.<pre><code class=language-sql>-- Postgres enum
CREATE TYPE order_status AS ENUM (
'pending',
'shipped',
'cancelled');
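
-- hypothetical usage: the column only accepts the values defined above
CREATE TABLE orders (
  id bigint,
  status order_status DEFAULT 'pending'
);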
</code></pre><p>Constraints take the enumerated type a bit further and let you specify rules and restrictions for data. You can add a <code>CHECK</code> constraint against a list of values or even refer to other fields, like ensuring a room reservation's start time comes before its end time.<pre><code class=language-sql>-- Postgres check constraint
ALTER TABLE public.reservations
ADD CONSTRAINT start_before_end
CHECK (start_time &#60 end_time);
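
-- with the constraint in place, a reservation that ends before it
-- starts is rejected (times here are hypothetical):
-- INSERT INTO reservations (start_time, end_time)
-- VALUES ('2025-01-01 12:00', '2025-01-01 10:00');  -- fails the check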
</code></pre><p>While most applications will constrain data in their own way, Postgres’ strict and flexible typing allows both rigid validity and flexibility.<h2 id=2-extensibility-for-data-types-operators-and-access-methods><a href=#2-extensibility-for-data-types-operators-and-access-methods>2) Extensibility for data types, operators and access methods</a></h2><p>The authors knew that just data types wouldn’t be enough - the system would actually need to be extensible. In my estimation - this is actually the killer feature of Postgres. Sure, the database is solid - but the ingenuity and enthusiasm of the extension ecosystem is incredibly special.<p>Let’s take PostGIS for example. This extension adds several key data types to the mix - point, line, and polygon - to store geospatial data. PostGIS also has hundreds of functions with it. There’s now an entire ecosystem of its own around this project that includes open-source mapping and fully open source web servers that rival paid GIS systems like ESRI.<p>The <code>pgvector</code> extension is another good example of Postgres extensibility. Now <a href=https://www.crunchydata.com/blog/whats-postgres-got-to-do-with-ai>Postgres can store embedding data</a> right alongside application data. You can have LLMs create embeddings based on your data and you can query your data to find relatedness. You can also build your own <a href=https://www.crunchydata.com/blog/smarter-postgres-llm-with-retrieval-augmented-generation>Postgres RAG</a> system right inside your database.<pre><code class=language-sql>-- find distance between two embedding values
recipe_1.embedding &#60=> recipe_2.embedding
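
-- a hypothetical nearest-neighbor search, assuming a recipes table
-- with a pgvector column named embedding:
SELECT id FROM recipes
ORDER BY embedding &#60=> '[0.1, 0.2, 0.3]'
LIMIT 5;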
</code></pre><p>Data types and extensions aren’t the only thing that came out of this idea though - the indexes themselves in Postgres are incredibly advanced. Generalized Inverted Index (GIN) and Generalized Search Tree (GiST) are themselves extensible indexing frameworks that support many of the complex data types mentioned above.<h2 id=3-features-for-active-databases-like-alerters-and-triggers><a href=#3-features-for-active-databases-like-alerters-and-triggers>3) Features for active databases like alerters and triggers</a></h2><p>Modern Postgres users have a suite of tools available to them to have the database do necessary work. The trigger system easily updates fields once another field changes.<pre><code class=language-sql>-- Postgres sample function to update fields
CREATE OR REPLACE FUNCTION update_inventory_on_sale()
RETURNS TRIGGER AS $$
BEGIN
UPDATE products
SET quantity_on_hand = quantity_on_hand - NEW.quantity_sold
WHERE id = NEW.product_id;
IF NOT FOUND THEN
RAISE EXCEPTION 'No product found with ID %', NEW.product_id;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
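
-- attach the function with CREATE TRIGGER so it fires for each new
-- sale row (the sales table name here is hypothetical):
CREATE TRIGGER trg_update_inventory
AFTER INSERT ON sales
FOR EACH ROW
EXECUTE FUNCTION update_inventory_on_sale();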
</code></pre><p>For events outside the database, Postgres has a handy little <code>NOTIFY/LISTEN</code> mechanism for sending notifications to the outside so your application or dashboard will know when a new order was placed or a specific action happened. There’s now an extension to surface the <a href=https://www.crunchydata.com/blog/real-time-database-events-with-pg_eventserv>listen/notify system events as WebSockets</a>.<p>Postgres’ <a href=https://www.crunchydata.com/blog/data-to-go-postgres-logical-replication>logical replication</a> also makes use of the ‘active database’ idea. It streams individual data changes rather than physical block-level copies, allowing you to replicate data between different major Postgres versions or even different platforms. This flexibility enables powerful use cases like creating specialized read replicas, consolidating multiple databases into a central one, and performing zero-downtime major version upgrades.<pre><code class=language-sql>-- Postgres create logical replication
CREATE PUBLICATION user_pub FOR TABLE users, forum_posts;
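
-- a sketch of the subscriber side; the connection string here is
-- hypothetical:
CREATE SUBSCRIPTION user_sub
CONNECTION 'host=standby.example.com dbname=app user=replicator'
PUBLICATION user_pub;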
</code></pre><h2 id=4-simplify-process-for-crash-recovery><a href=#4-simplify-process-for-crash-recovery>4) Simplify process for crash recovery</a></h2><p>The original method of Postgres data recovery relied on writing all data modifications to the files on disk before each commit, which was called "force-to-disk". Unfortunately this original implementation had major performance issues and a potential for corruption. The Write Ahead Log (WAL), released with version 7.1, changed this to a system that first writes changes to a log file and then applies those changes to the main data files.<p>WAL is the foundation of all of Postgres’ amazing backup and disaster recovery story. WAL is used to create incremental backups, complete with the <a href=https://www.crunchydata.com/blog/database-terminology-explained-postgres-high-availability-and-disaster-recovery#disaster-recovery-is-about-more-than-just-availability>Point-in-Time disaster recovery</a> system that many rely on today.<p>WAL is also foundational to Postgres streaming replication, which makes high availability possible. A primary writes all database changes (inserts, updates, deletes) into its Write-Ahead Log and then "streams" these WAL records over the network to the standby (replica) nodes. The standby nodes receive these WAL records and apply them to their own copy of the database, keeping them in sync with the primary. In the event of an emergency, automated failover tooling like <a href=https://github.com/patroni/patroni>Patroni</a> can promote a standby to be the new primary.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/bc74acb6-3405-43f1-cee3-153c8375be00/public><h2 id=5-take-advantage-of-upgraded-hardware><a href=#5-take-advantage-of-upgraded-hardware>5) Take advantage of upgraded hardware</a></h2><p>PostgreSQL was engineered for the hardware realities of its time: single-core CPUs, severely limited RAM often measured in megabytes, and slow, spinning hard drives.
The primary design focus was on correctness and data durability over raw speed. PostgreSQL built its legendary reputation for stability and ACID compliance, ensuring that data remained safe even when running on less reliable hardware.<p>Fast forward to today, where PostgreSQL runs on hardware with dozens of CPU cores, terabytes of ultra-fast NVMe storage and vast amounts of RAM (we even have half a TB of RAM available now). PostgreSQL introduced <a href=https://www.crunchydata.com/blog/parallel-queries-in-postgres>parallel query execution</a>, which breaks up complex queries, runs the pieces simultaneously, and gathers the results at the end. Modern PostgreSQL has also vastly improved its locking mechanisms, connection pooling solutions, and replication capabilities, evolving from a robust single-server database into a high-performance powerhouse that can scale horizontally and handle the massive, concurrent workloads of the modern internet.<p>While Postgres today does not yet have modern CPU <a href=https://wiki.postgresql.org/wiki/Multithreading>multi-threading</a>, this is on the horizon, and Postgres 18 just added <a href=https://www.crunchydata.com/blog/get-excited-about-postgres-18>asynchronous i/o</a>.<h2 id=6-utilize-codds-relational-model><a href=#6-utilize-codds-relational-model>6) Utilize Codd’s relational model</a></h2><p>At the height of the NoSQL movement in the late 2000s and early 2010s, a common story was told that relational databases were a relic of the past. With the rise of big and unstructured data, the story went, this old model would soon be cast out.<p>Postgres continued to do what it always has done and embraced its core strength - flexibility of data typing - and adopted some of NoSQL’s own ideas. Postgres introduced the JSON data type and then later the binary, <a href=https://www.crunchydata.com/blog/indexing-jsonb-in-postgres>indexable JSONB</a> type.
With this update, applications can now store schema-less, API-driven JSON data directly in a relational database and query it efficiently using a rich set of operators and functions. With features like <a href=https://www.crunchydata.com/blog/easily-convert-json-into-columns-and-rows-with-json_table><code>json_table</code></a>, you can convert between JSON arrays and traditional rows and columns.<p>The newest revolution in the Postgres world seems to be the adoption of technologies to tie Postgres directly to unstructured flat files. Projects like pg_duckdb, pg_mooncake, and <a href=https://www.crunchydata.com/products/warehouse>Crunchy Data Warehouse</a> use custom extensions to work with CSV, Parquet, and Iceberg files directly in the data lake remote object stores where they reside. Even with the data abstracted to another location, Postgres’ relational model is still relevant, efficient, and trusted.<h2 id=summary><a href=#summary>Summary</a></h2><p>With Postgres’ flexibility - you can have a fully normalized, relational schema with foreign keys and JOINs, while also having an indexed JSONB document and full spatial geometry. We’re at a point in history where AI, science, and research are backed by a database that had no idea what the world would be like when it was built. Postgres is still here.<p>These original goals have had a profound impact on the project. Allowing for complexity and flexibility in a growing business landscape, while being easy to alter for individual use cases. And being ready for hardware (and cloud) technology that makes Postgres’ distribution even easier. ]]></content:encoded>
<category><![CDATA[ Postgres 18 ]]></category>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">1e8a25ee9384198a8eb616a19c72c4d852e451563c2209f890ca1e7d8545a4ac</guid>
<pubDate>Tue, 23 Sep 2025 09:00:00 EDT</pubDate>
<dc:date>2025-09-23T13:00:00.000Z</dc:date>
<atom:updated>2025-09-23T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Get Excited About Postgres 18 ]]></title>
<link>https://www.crunchydata.com/blog/get-excited-about-postgres-18</link>
<description><![CDATA[ New to Postgres 18, features like asynchronous i/o, uuid v7, b-tree skip scans, and virtual generated columns. ]]></description>
<content:encoded><![CDATA[ <p>Postgres 18 will be released in just a couple weeks! Here are some details on the most important and exciting features.<h2 id=asynchronous-io><a href=#asynchronous-io>Asynchronous i/o</a></h2><p>Postgres 18 is adding asynchronous i/o. This means faster reads for many use cases. This is also part of a bigger series of performance improvements planned for future Postgres, part of which may be multi-threading. Expect to see more on this in coming versions.<p><strong>What is async I/O?</strong><p>When <a href=https://www.crunchydata.com/blog/postgres-data-flow>data</a> isn’t in the shared memory buffers already, Postgres reads from disk, and <a href=https://www.crunchydata.com/blog/understanding-postgres-iops>I/O is needed to retrieve data</a>. Synchronous I/O means that each individual request to the disk is waited on for completion before moving on to something else. For busy databases with a lot of activity, this can be a bottleneck.<p>Postgres 18 will introduce asynchronous I/O, allowing workers to optimize idle time and improve system throughput by batching reads. Currently, Postgres relies on the operating system for intelligent I/O handling, expecting OS or storage read-ahead for sequential scans and using features like Linux's posix_fadvise for other read types like Bitmap Index Scans. Moving this work into the database with asynchronous I/O will provide a more predictable and better-performing method for batching operations at the database level. Additionally, a new system view, pg_aios, will be available to provide data about the asynchronous I/O system.<p>Postgres writes will continue to be synchronous - since this is needed for ACID compliance.<p>If async i/o seems confusing, think of it like ordering food at a restaurant. In a synchronous model, you would place your order and stand at the counter, waiting, until your food is ready before you can do anything else.
In an asynchronous model, you place your order, receive a buzzer, and are free to go back to your table and chat with friends until the buzzer goes off, signaling that your food is ready to be picked up.<p>Async I/O will affect:<ul><li>sequential scans<li>bitmap heap scans (following the bitmap index scan)<li>some maintenance operations like VACUUM.</ul><p>By default, Postgres will turn on <strong>io_method = worker</strong> with 3 I/O workers; this can be adjusted up for systems with more CPU cores. I haven’t seen any reliable recommendations on this, so stay tuned for more on that from our team soon.<p>For Postgres running on Linux kernel 5.1+, the optional <strong>io_method = io_uring</strong> utilizes io_uring system calls, making the invocations from the actual backends rather than separate worker processes.<h2 id=uuid-v7><a href=#uuid-v7>UUID v7</a></h2><p>UUIDs are getting a bit of an overhaul in this version by moving to v7.<p>UUIDs are randomly generated identifiers which are globally unique and often used for primary keys. UUIDs are popular in modern applications for a couple reasons:<ul><li>They’re unique: You can use keys generated from more than one place.<li>Decoupled: Your application can generate a primary key <em>before</em> sending the data to the database.<li>URL obscurity: If your URLs use primary keys (e.g., .../users/5), other URLs are easy to guess (.../users/6, .../users/7). With a UUID (.../users/f47ac10b-58cc-4372-a567-0e02b2c3d479), it's practically impossible to guess other IDs.</ul><p>A new standard for UUID v7 came out in mid-2024 via a series of standards updates. UUIDv4 was the prior version of uuid with native Postgres support. But sorting and indexing in large tables had performance issues due to the relative randomness, leading to fragmented indexes and bad locality. UUIDv7 helps with the sort and indexing issues.
It is still random but the first 48 bits (12 characters) are a timestamp, and the remaining bits are random; this gives better locality for data inserted around the same time and thus better indexability.<p>The timestamp part is encoded as a hexadecimal value. So for example a uuid that begins with <code>01896d6e4a5d</code> (hex) would represent <code>1689758091869</code> (decimal), the number of milliseconds since 1970.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/2bf43dd0-9a3a-4535-55c0-5f18a9a9a200/public><p>This is how the DDL will look for uuid v7:<pre><code class=language-sql>CREATE TABLE user_actions (
action_id UUID PRIMARY KEY DEFAULT uuidv7(),
user_id BIGINT NOT NULL,
action_description TEXT,
action_time TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_action_id ON user_actions (action_id);
</code></pre><h2 id=b-tree-skip-scans><a href=#b-tree-skip-scans>B-tree skip scans</a></h2><p>There’s a nice performance bump coming in Postgres 18 for some multi-column B-tree indexes.<p>In Postgres, if you have an index on columns (<code>status</code>, <code>date</code>) in a table, this index can be used to match queries which query both <code>status</code> and <code>date</code> fields, or just <code>status</code>.<p>In Postgres 17 and below, this same index cannot be used to answer queries against just the <code>date</code> field; you would have to have that column indexed separately, or the database would resort to a sequential scan + filter approach if there were no appropriate indexes for that table.<p>In Postgres 18, in many cases it can automatically use this multi-column index for queries touching only the <code>date</code> field. Known as a skip scan, this lets the system "skip" over portions of the index.<p>This works when queries don’t use the leading columns in the conditions and the omitted column has a low cardinality, like a small number of distinct values. The optimization works by:<ol><li>Identifying all the distinct values in the omitted leading column(s).<li>Effectively transforming the query to add conditions matching each of those leading values.<li>Using existing infrastructure to optimize lookups across multiple leading columns, effectively skipping any pages in the index scan which do not match both conditions.</ol><p>For example, if we had a sales table with columns <code>status</code> and <code>date</code>, we might have a multi-column index:<pre><code class=language-sql>CREATE INDEX idx_status_date
ON sales (status, date);
</code></pre><p>An example query could have a where clause that doesn’t include status.<pre><code class=language-sql>SELECT * FROM sales
WHERE date = '2025-01-01';
</code></pre><p>Nothing in the query plan tells you this is a skip scan, so you’ll end up with a normal Index scan like this, showing you the index conditions.<pre><code class=language-sql>                                QUERY PLAN
-------------------------------------------------------------
 Index Only Scan using idx_status_date on sales  (cost=0.29..21.54 rows=4 width=8)
   Index Cond: (date = '2025-01-01'::date)
(2 rows)
</code></pre><p>Before Postgres 18, a full table scan would be done here, since the leading column of the index is not included in the query; with a skip scan, Postgres can use the same index for this query.<p>In Postgres 18, because <code>status</code> has a low cardinality with just a few distinct values, a compound index scan can be done. Note that this optimization only works for queries which use the <code>=</code> operator, so it will not work with inequalities or ranges.<p>This all happens behind-the-scenes in the Postgres planner so you don’t need to turn it on. The idea is that it will benefit analytics use cases where filters and conditions often change and aren’t necessarily related to existing indexes.<p>The query planner will decide if using a skip scan is worthwhile, based on the table's statistics and the number of distinct values in the columns being skipped.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/6d5ed16d-2a24-4ff4-4a6c-fd42773e4b00/public><h2 id=generated-columns-on-the-fly><a href=#generated-columns-on-the-fly>Generated columns on-the-fly</a></h2><p>PostgreSQL 18 introduces virtual generated columns. Previously, generated columns were always stored on disk, meaning values were computed at the time of an insert or update and written out, adding a bit of write overhead.<p>In PostgreSQL 18, virtual generated columns are now the default type for generated columns. If you define a generated column without explicitly specifying STORED, it will be created as a virtual generated column.<pre><code class=language-sql>CREATE TABLE user_profiles (
user_id SERIAL PRIMARY KEY,
settings JSONB,
username VARCHAR(100) GENERATED ALWAYS AS (settings ->> 'username') VIRTUAL
);
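
-- hypothetical usage: the username value is computed at read time
INSERT INTO user_profiles (settings) VALUES ('{"username": "ada"}');
SELECT username FROM user_profiles;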
</code></pre><p>This is a great update for folks using JSON data: queries can be simplified and data changes or normalization can be done on the fly as needed.<p>Note that virtual generated columns are not indexable - since they’re not stored on disk. For <a href=https://www.crunchydata.com/blog/indexing-jsonb-in-postgres>indexing of JSONB</a>, use a stored generated column or an expression index.<h2 id=oauth-20><a href=#oauth-20>OAuth 2.0</a></h2><p>Good news for folks that use Okta, Keycloak, and other managed authentication services: Postgres is now compatible with OAuth 2.0. This is specified in the main host-based authentication configuration (pg_hba.conf) file.<p>The OAuth system uses bearer tokens where the client application presents a token instead of a password to prove identity. The token is an opaque string and its format is determined by the authorization server. This feature removes the need to store passwords in the database. It also allows for more robust security measures like multi-factor authentication (MFA) and single sign-on (SSO) to be managed by external identity providers.<h2 id=postgres-versions-are-packed-with-other-improvements><a href=#postgres-versions-are-packed-with-other-improvements>Postgres versions are packed with other improvements</a></h2><p>Postgres 18 comes with a staggering 3,000 commits from more than 200 authors. While many of these are features, there are numerous additions and optimizations under the hood to the Postgres query planner and other parts of the system that are behind the scenes. Even if you don’t utilize optional features, there are still performance benefits (uh ... async i/o is a biggie), bug fixes, and security patches that make upgrading on a regular cadence a good idea. ]]></content:encoded>
<category><![CDATA[ Postgres 18 ]]></category>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">0fe99b43c2417b308d641253451cc38618f70b171a295266a2dd8108b823f133</guid>
<pubDate>Fri, 12 Sep 2025 08:00:00 EDT</pubDate>
<dc:date>2025-09-12T12:00:00.000Z</dc:date>
<atom:updated>2025-09-12T12:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres Logging for Performance Optimization ]]></title>
<link>https://www.crunchydata.com/blog/postgres-logging-for-performance-optimization</link>
<description><![CDATA[ Review key logging configurations in Postgres plus how to log key performance metrics. ]]></description>
<content:encoded><![CDATA[ <p>A modern-day Postgres instance creates robust and comprehensive logs for nearly every facet of database and query behavior. While Postgres logs are the go-to place for finding and debugging critical errors, they are also a key tool in application performance monitoring.<p>Today let’s get set up with logging for Postgres - starting with the basics of what to log, how to log what you want, and as a reward for your hard work - how to use these logs to monitor and improve performance. The <a href=https://www.postgresql.org/docs/current/runtime-config-logging.html>Postgres docs on logs</a> are excellent, so please consult those for the most up to date and comprehensive configurations. This blog reads between the lines a bit beyond the docs to offer some practical advice and settings. As always, your mileage may vary.<p>Let’s get into it and talk about:<ul><li><a href=/blog/postgres-logging-for-performance-optimization#initiating-logging-for-postgres>Setting up logging levels</a>, <a href=/blog/postgres-logging-for-performance-optimization#logging-sql-statements>logging SQL</a>, <a href=/blog/postgres-logging-for-performance-optimization#rotating-logs>rotating logs</a><li>Logging for <a href=/blog/postgres-logging-for-performance-optimization#logging-for-performance-bottlenecks-in-logs>performance monitoring</a><li><a href=/blog/postgres-logging-for-performance-optimization#extracting-and-parsing-logs>Extracting</a> and parsing logs into usable information</ul><p>WAL note: This post is limited to the server’s message and error logging - not the transaction Write Ahead Log (WAL). While that is a log, the purpose of WAL is to log all data and schema changes for backups, disaster recovery, and replication streaming.<h2 id=initiating-logging-for-postgres><a href=#initiating-logging-for-postgres>Initiating logging for Postgres</a></h2><p>First thing: out of the box, Postgres just sends logs to the terminal.
To send logs to files instead, turn on the logging collector.<pre><code class=language-sql>logging_collector = on
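
-- related settings controlling where the files land (defaults shown)
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'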
</code></pre><h3 id=what-file-format-do-you-want-for-logs><a href=#what-file-format-do-you-want-for-logs>What file format do you want for logs?</a></h3><p>The log message formatting is determined by the <code>log_destination</code> parameter, which can be set to one or more of: <code>stderr</code>, <code>csvlog</code>, <code>jsonlog</code>, and <code>syslog</code>. <code>stderr</code> is the default. Use commas to separate values when using more than one logging destination:<pre><code class=language-sql>-- setting multiple log destinations
log_destination = 'stderr,jsonlog'
</code></pre><p>If <code>logging_collector</code> is on, then <code>stderr</code>, <code>csvlog</code>, and <code>jsonlog</code> output will go to files in the directory specified by <code>log_directory</code>; note that the <code>csvlog</code> and <code>jsonlog</code> formats require the logging collector to be enabled.<p>There are many reasons why logs may be written in multiple file formats. Many hosted and fully managed systems will have logs available in different formats for use by different tools. For <a href=https://crunchybridge.com/login>Crunchy Bridge</a>, the live logs and log tail in our CLI come from the <code>syslog</code> destination. We often use <code>jsonlog</code> for all of our internal logs. <code>syslog</code> is what you’ll use on servers for shipping logs to an external logging host.<h3 id=what-level-of-logging-do-you-want><a href=#what-level-of-logging-do-you-want>What level of logging do you want?</a></h3><p>All log messages generated by the server will have one of these severity levels:<ul><li>PANIC: Severe errors — system must shut down and recover.<li>FATAL: Errors that cause the session to terminate.<li>LOG: Server events such as checkpoints.<li>ERROR: Errors that abort the current operation but not the session.<li>WARNING: Potential problems — deprecated features, possible data issues.<li>NOTICE: Significant messages — non-critical issues like “table doesn’t exist.”<li>INFO: Low-priority informational messages — autovacuum, config reloads.<li>DEBUG1-5: Basic debug to most verbose.</ul><p>Log messages will look like this:<pre><code>-- background worker crash
ERROR:  background worker "logical replication launcher" crashed

--disk i/o
ERROR:  could not fsync file "pg_wal/0000000100000000000000A3": Input/output error

--out of disk space for temp files
ERROR:  could not create temporary file: No space left on device

--vacuum warning
WARNING:  relation "public.large_table" contains more than "autovacuum_vacuum_threshold" dead tuples
</code></pre><p>The <code>log_min_messages</code> server setting is used to determine which log messages are actually logged to the configured log destination(s). All messages with the configured severity level or above will be sent. The default is <code>warning</code>, which is a generally good setting.<pre><code class=language-sql>log_min_messages = 'warning'
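
-- the levels nest: 'warning' keeps warning, error, log, fatal, and panic
-- messages while dropping notice, info, and the debug levels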
</code></pre><p>So <code>WARNING</code> includes all messages with warning, error, log, fatal, and panic severities. In general, the debug levels would only be used in development or for a specific purpose.<h3 id=logging-sql-statements><a href=#logging-sql-statements>Logging SQL statements</a></h3><p>In addition to the log severity selection covered above, SQL queries can be selected for logging based on the <strong><code>log_statement</code></strong> parameter. The values you can choose from are:<ul><li>none - Log no statements. SQL statements won’t be logged, but warnings and errors will still appear based on your <code>log_min_messages</code> configuration.<li>ddl - Log data definition changes only, so this will log any change to table definitions, columns, and indexes.<li>mod - Log modifications to data, including all DDL plus inserts, updates, and deletes.<li>all - Log every SQL statement, query, and all DDL (note: this is generally not recommended for production).</ul><p>DDL is a good choice for production.<pre><code class=language-sql>log_statement = 'ddl';
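
-- with 'ddl', schema changes are logged; an entry would look something like:
-- LOG:  statement: ALTER TABLE accounts ADD COLUMN note text;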
</code></pre><p>Statements with syntax errors or that fail during their parsing or planning stages are not covered by <code>log_statement</code>. These are covered by <code>log_min_error_statement</code>, which should be set to ERROR or lower to log them.<pre><code class=language-sql>log_min_error_statement = 'error'
</code></pre><p>SQL errors will look like this, with the HINT line appearing when relevant. If you’re logging the actual statement with <code>log_min_error_statement = 'error'</code>, that appears last.<pre><code class=language-sql>2025-05-09 14:02:37 UTC [28561] ERROR:  operator does not exist: integer == integer at character 33
2025-05-09 14:02:37 UTC [28561] HINT:  Perhaps you meant to use the standard operator "=".
2025-05-09 14:02:37 UTC [28561] STATEMENT:  SELECT * FROM users WHERE id == 42;
</code></pre><h3 id=logging-of-prepared-statements--sensitive-data><a href=#logging-of-prepared-statements--sensitive-data>Logging of prepared statements &amp; sensitive data</a></h3><p>One common concern for many is ensuring that sensitive data such as credit card numbers or PII is not included in logged query data. The <code>log_parameter_max_length</code> and <code>log_parameter_max_length_on_error</code> parameters allow you to limit the length of the bind parameter values logged with query and error log messages, respectively, to the specified number of bytes. This will apply to bind parameters of both explicit, named prepared statements run with <code>PREPARE</code> / <code>EXECUTE</code> and the “unnamed” prepared statements run by application database drivers that use the <a href=https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY>extended query protocol</a>.<p>The default values of -1 for these will log all bind parameters in full. Set them to 0 to fully disable bind parameter logging.<pre><code class=language-sql>log_parameter_max_length = 0
log_parameter_max_length_on_error = 0
</code></pre><p>If you only need this done for certain queries or transactions these can also be set on the fly with <code>SET SESSION</code> and <code>SET LOCAL</code>, or they can be set for all queries for a given user with <code>ALTER ROLE</code>, for a given database with <code>ALTER DATABASE</code>, and even for all queries for a given user on a specific database.<pre><code class=language-sql>-- set for an entire session
SET SESSION log_parameter_max_length = 0;
SET SESSION log_parameter_max_length_on_error = 0;

-- set for a transaction
BEGIN;
SET LOCAL log_parameter_max_length = 0;
SET LOCAL log_parameter_max_length_on_error = 0;
...
COMMIT;

-- set for all queries run by user bob
ALTER ROLE bob SET log_parameter_max_length = 0;
ALTER ROLE bob SET log_parameter_max_length_on_error = 0;

-- set for all traffic on database pii_db
ALTER DATABASE pii_db SET log_parameter_max_length = 0;
ALTER DATABASE pii_db SET log_parameter_max_length_on_error = 0;

-- set for all queries run by bob on the pii_db
ALTER ROLE bob IN DATABASE pii_db SET log_parameter_max_length = 0;
ALTER ROLE bob IN DATABASE pii_db SET log_parameter_max_length_on_error = 0;
</code></pre><h3 id=formatting-the-log-entries><a href=#formatting-the-log-entries>Formatting the log entries</a></h3><p>Out of the box, Postgres log entries look like this:<p>&lt;timestamp&gt; &lt;process ID&gt; &lt;message&gt;<pre><code class=language-sql>2025-05-19 13:49:04.908 EDT [3108283] ERROR: column "asdfklasdf" does not exist at character 8
</code></pre><p>The timestamp and process ID portion come from the default <code>log_line_prefix</code>:<pre><code class=language-jsx>log_line_prefix = '%m [%p] '
</code></pre><p>We often recommend folks use a better prefix to give more context for what generated the log message.<pre><code class=language-sql>log_line_prefix = '%m [%p]%q %u@%d '
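
-- with this prefix, a hypothetical error from user alice on database appdb
-- would look something like:
-- 2025-05-19 13:49:04.908 EDT [3108283] alice@appdb ERROR:  division by zero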
</code></pre><p>If you set a prefix, make sure to keep the process ID (<code>%p</code>) as that is a huge help when troubleshooting a specific process to find/stop. <code>%u</code> will add the user and <code>%d</code> the database, which is helpful if you’re running more than one database in a single instance.<p>See the <a href=https://www.postgresql.org/docs/current/runtime-config-logging.html#GUC-LOG-LINE-PREFIX>log_line_prefix documentation</a> for a full list of valid printf-style <code>%</code> escape sequences.<p>The <code>log_error_verbosity</code> setting is used to determine how verbose the log message itself is.<ul><li><code>terse</code> excludes the detail, hint, query, and context parts of the message<li><code>default</code> includes the error detail and hint messages<li><code>verbose</code> adds the SQL state error code plus the source file, function, and line number. This is not recommended for production but can be a useful setting for development.</ul><pre><code class=language-sql>log_error_verbosity = 'default'
</code></pre><h2 id=audit-logging><a href=#audit-logging>Audit logging</a></h2><p>Beyond the server and query logs, you can also audit user behavior with the <code>PGAudit</code> extension. <code>PGAudit</code> is not a core extension that ships with Postgres itself, but there are packages for it in the repositories for all major OS distributions.<p>Use of the <code>PGAudit</code> extension requires that <code>pgaudit</code> is in <code>shared_preload_libraries</code>, the extension is created in each database where auditing is needed, and <code>pgaudit.log</code> is set to a value other than <code>none</code>.<pre><code class=language-sql>-- add to preloaded libraries
shared_preload_libraries = 'pgaudit'

-- add extension
CREATE EXTENSION pgaudit;

-- enable the pgaudit.log
pgaudit.log = 'ddl'
</code></pre><p>Audit logging records more granular data, such as who performed the action, when it happened, and what the exact changes were. This allows for tracking specific user actions, including inserts, updates, deletes, and administrative commands. Possible values for <code>pgaudit.log</code> include: <code>read</code>, <code>write</code>, <code>role</code>, <code>ddl</code>, and <code>misc</code>.<pre><code class=language-sql>ALTER ROLE audited_user SET pgaudit.log = 'read, write, ddl';
</code></pre><p>Audit log entries are comma separated (CSV) and look like this:<pre><code class=language-sql>2025-05-09 12:34:56.789 UTC [12345] myuser@mydb LOG:  AUDIT: SESSION,1,SELECT,pg_catalog.pg_stat_activity,SELECT * FROM pg_stat_activity;
</code></pre><p>If you noticed that the regular logs and audit logs overlap each other, you’re right. <code>pgAudit</code> provides detailed audit logging (including session-level information, roles, and changes) in addition to Postgres’s built-in logging. If you only need to log DDL statements and you're not interested in the additional auditing features provided by <code>pgAudit</code>, then <code>log_statement = 'ddl'</code> may be sufficient.<p>For Crunchy Bridge users, we audit everything except the application user role. So by default your main Postgres role and generated user roles are fully audited.<h2 id=log-file-naming-and-locations><a href=#log-file-naming-and-locations>Log file naming and locations</a></h2><p>The <code>log_filename</code> setting is used to specify the format of the log filenames using <a href=https://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html>strftime</a> escape patterns. By default the log file name is <code>postgresql</code> followed by a timestamp.<pre><code class=language-jsx>log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
</code></pre><p>The default goes down to hour, minute, and second, which is probably unnecessary, so this is a good simple alternative:<pre><code class=language-sql>log_filename = 'postgresql-%Y-%m-%d'
</code></pre><p><code>stderr</code> log files will have a <code>.log</code> suffix, <code>csvlog</code> files will have a <code>.csv</code> suffix, and <code>jsonlog</code> files will have a <code>.json</code> suffix.<p>Log files written by Postgres for <code>stderr</code>, <code>csvlog</code>, and <code>jsonlog</code> logging are located in the directory specified by <code>log_directory</code>. This can be either a full absolute path or a relative path that will be relative to the <code>data_directory</code> path. The location of log files written to <code>syslog</code> will be up to the system’s syslog configuration.<pre><code class=language-sql>-- where on the host is the data directory
SHOW data_directory;
-- where on the host is the log directory
SHOW log_directory;
-- what do the log file names look like
SHOW log_filename;
-- exact location of the current log file
SELECT pg_current_logfile();
</code></pre><h2 id=rotating-logs><a href=#rotating-logs>Rotating logs</a></h2><p>Now we have set up some logs …. BUT if you don’t set up rotation you will just fill up your disk with logs.<p>This sets the rotation at 1 day.<pre><code class=language-sql>log_rotation_age = '1d'
</code></pre><p>This sets a file size for rotation, in case the file gets bigger than 10MB before that 1 day is up.<pre><code class=language-sql>log_rotation_size = '10MB'
</code></pre><p>If a <code>log_filename</code> format specification would result in the re-use of a log filename, e.g. <code>postgresql-Mon.log</code> will be used every Monday, <code>log_truncate_on_rotation</code> will cause the given log file to be truncated before each subsequent use. If <code>log_truncate_on_rotation</code> is not enabled then the existing log will be appended to instead of truncated.<pre><code class=language-sql>log_truncate_on_rotation = 'on'
</code></pre><p>If a <code>log_filename</code> format is used that does not result in automatic file name re-use, e.g. <code>postgresql-%Y-%m-%d.log</code>, it is recommended to use an external log rotation tool such as Linux’s <code>logrotate</code> to remove old logs as needed, perhaps after they have been archived to a separate long-term storage location, to avoid excessive disk space usage.<h2 id=troubleshooting-with-logs><a href=#troubleshooting-with-logs>Troubleshooting with logs</a></h2><p>Now that you have some proper logging set up, you can use this to pinpoint particular system problems. Usually working with Postgres logs looks something like this:<ul><li>Someone notices a big problem - either things are slow, down, alarm bells are sounding, etc.<li>Check the metrics - is there a high CPU spike? I/O spike? Pinpoint that to a time window, the more specific the better.<li>Search the logs for that time window, looking for errors, locks, or any sign of what’s wrong.<li>Assuming it’s a specific process, find the PID (process id) of the issue, and work from there. If it’s a query or a lock, try to kill it. If it’s a large job slowing everything down, start working through those issues.</ul><h2 id=logging-for-performance-bottlenecks-in-logs><a href=#logging-for-performance-bottlenecks-in-logs>Logging for performance bottlenecks in logs</a></h2><p>Ok, if you’re still with me here, you’re up to the point where we have some logs, we know what they look like and what they’re used for, and we can use them to troubleshoot critical errors. But we’re good little DBA boys and girls, so we never have errors. We just want logs to make our queries faster.
So let’s do that.<h3 id=logging-long-running-queries><a href=#logging-long-running-queries>Logging long running queries</a></h3><p>If you would like to capture information about queries that run longer than some time period, you can configure that using the <strong><code>log_min_duration_statement</code></strong> parameter. This is PostgreSQL’s slow query logging threshold so it is helpful in particular for debugging long-running queries.<p>As you start working on query performance, logging the slowest queries is a great way to see what isn’t efficient.<pre><code class=language-sql>log_min_duration_statement = '1s'
</code></pre><pre><code class=language-sql>-- example log for a query that ran longer than the 1s threshold
LOG:  duration: 2001.342 ms  statement: SELECT count(*) from orders;
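
-- a simple way to test the threshold: pg_sleep(2) runs longer than 1s,
-- so it should produce a duration entry like the one above
-- SELECT pg_sleep(2);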
</code></pre><h3 id=logging-locks-and-lock-waits><a href=#logging-locks-and-lock-waits>Logging locks and lock waits</a></h3><p>You can log any time your query is waiting on a lock by turning on <strong><code>log_lock_waits</code></strong>. Lock waits in your logs can be a good indicator that processes are contending for the same resources. There is virtually no overhead to enabling this and it’s very safe for production databases. This is set to “on” by default on Crunchy Bridge clusters:<pre><code class=language-sql>log_lock_waits = 'on'
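
-- the deadlock_timeout setting doubles as the lock wait logging threshold;
-- its default of 1s means waits of 1s or longer get logged
deadlock_timeout = '1s'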
</code></pre><p>With lock wait logging enabled the <code>deadlock_timeout</code> setting is used as the threshold for logging lock waits, e.g. with <code>deadlock_timeout = '1s'</code> any query that waits for a lock for 1s or longer will have that wait logged.<p>Logged lock waits will look like this:<pre><code class=language-sql>2024-05-16 14:45:12.345 UTC [45678] user@database LOG:  process 45678 still waiting for ShareLock on transaction 123456 after 1000.001 ms
2024-05-16 14:45:12.345 UTC [45678] user@database DETAIL:  Process holding the lock: 12345. Wait queue: 45678, 45670.
2024-05-16 14:45:12.345 UTC [45678] user@database STATEMENT:  UPDATE orders SET status = 'shipped' WHERE id = 42;
</code></pre><p>There we can see:<ul><li>The pid of the process this lock wait entry is for<li>The pid of the process that is holding the lock<li>A list of pids for all processes that are waiting for the lock, in the order they requested it<li>The query the waiting process is running that needs the lock</ul><h3 id=logging-temp-files><a href=#logging-temp-files>Logging temp files</a></h3><p>Efficient Postgres memory usage is a huge part of fast and snappy database operations. If your queries are having to perform reads or sorts on disk instead of in the memory buffer, it might mean you need to increase <code>work_mem</code> or expand your memory capacity in some way. Adding indexes or rewriting queries can also reduce the amount of data given queries need to process to return their results.<p>A very common way to log memory performance is to log when Postgres creates temp files. By default this is turned off with a value of <code>-1</code>. A setting of <code>0</code> will record a log entry whenever a temporary file is created - generally not a good idea.<p>Ideally you set <code>log_temp_files</code> to be the same size as your working memory, <code>work_mem</code>. <code>work_mem</code> is the memory limit per operation before Postgres needs to spill to disk. If an operation fits within <code>work_mem</code>, the system won’t create temp files, so no logging is needed. If the operation spills to disk, it creates a temp file at least as big as <code>work_mem</code>. So here you’re logging whenever a temp file exceeds the memory allowed for one operation.<pre><code class=language-sql>-- log temp files >4MB (value in kB), set to current work_mem setting
log_temp_files = '4096'
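
-- a sketch for matching this threshold to your memory setting:
-- SHOW work_mem;  -- e.g. 4MB
-- log_temp_files is in kilobytes when no unit is given, so 4MB = '4096'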
</code></pre><p>The actual logs for temp files will look like this:<pre><code class=language-sql>2024-05-16 14:23:05.123 UTC [12345] user@database LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp1234.0", size 245760
2024-05-16 14:23:05.123 UTC [12345] user@database DETAIL:  Sort operation used temporary file because work_mem was exceeded
</code></pre><h3 id=query-logging-with-auto_explain><a href=#query-logging-with-auto_explain>Query logging with auto_explain</a></h3><p><code>auto_explain</code> is a Postgres module that automatically logs EXPLAIN plans of queries, which is useful for debugging and performance tuning. Auto_explain ships with Postgres but it has to be turned on explicitly. Note that it is a loadable module rather than a <code>CREATE EXTENSION</code> extension.<pre><code class=language-sql>-- add to preloaded libraries
shared_preload_libraries = 'auto_explain'

-- restart Postgres after this
</code></pre><p>You can set up auto_explain to log queries of various durations<pre><code class=language-sql>-- Log plans for queries that run longer than 1000ms
auto_explain.log_min_duration = '1000ms';
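
-- alternatively, load it for just the current session; these are standard
-- auto_explain settings, shown here as a sketch
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;  -- log a plan for every query this session
SET auto_explain.log_analyze = true;    -- include actual rows and timing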
</code></pre><p>There are other settings for auto_explain to show buffers and a couple of other things, so see the <a href=https://www.postgresql.org/docs/current/auto-explain.html>auto_explain docs</a>. Auto_explain generates a lot of logs, so proceed with caution here. For really big queries or queries on partitioned tables, these plans can be really long. An alternative to running auto_explain all the time is enabling it for a single session.<p>Auto_explain logs will look something like this:<pre><code class=language-sql>LOG:  duration: 1008.035 ms  plan:
May 17 02:42:06 z7j4asvir5dufokh5hpzoy postgres[43712]: [29-2]
Query Text: select count(*) from page_hits limit 1000;
</code></pre><h3 id=autovacuum-logging><a href=#autovacuum-logging>Autovacuum logging</a></h3><p>Whether or not a given autovacuum job is logged is controlled by the <code>log_autovacuum_min_duration</code> parameter, which defaults to 10 minutes since PG15 and was disabled by default in previous versions with a value of -1. As autovacuum’s log entries contain all of the same information seen in the output of manual <code>VACUUM VERBOSE</code> commands, many folks will lower this to a second or two for a detailed record of the work done by the autovacuum daemon, or often all the way down to 0 seconds to log all of its work.<pre><code class=language-sql>log_autovacuum_min_duration = '1s'
</code></pre><p>Here is an example of what this logs for an autovacuum <code>VACUUM</code> operation since PostgreSQL major version 15:<pre><code class=language-sql>[3506673][autovacuum worker][501/2614][0] LOG:  automatic vacuum of table &quot;testdb.public.pgbench_accounts&quot;: index scans: 1
        pages: 0 removed, 327869 remain, 81969 scanned (25.00% of total)
        tuples: 0 removed, 14769015 remain, 2000000 are dead but not yet removable
        removable cutoff: 929, which was 3 XIDs old when operation ended
        new relfrozenxid: 929, which is 11 XIDs ahead of previous value
        frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
        index scan needed: 49181 pages from table (15.00% of total) had 2999999 dead item identifiers removed
        index &quot;pgbench_accounts_pkey&quot;: pages: 54840 in total, 8224 newly deleted, 8224 currently deleted, 0 reusable
        I/O timings: read: 174.219 ms, write: 0.000 ms
        avg read rate: 26.491 MB/s, avg write rate: 22.489 MB/s
        buffer usage: 276192 hits, 41175 misses, 34955 dirtied
        WAL usage: 123002 records, 57432 full page images, 75538789 bytes
        system usage: CPU: user: 0.64 s, system: 0.27 s, elapsed: 12.14 s
</code></pre><p>And here is what they look like in prior major versions:<pre><code class=language-sql>[17656][autovacuum worker][5/463][0] LOG:  automatic vacuum of table &quot;testdb.public.pgbench_accounts&quot;: index scans: 1
        pages: 0 removed, 327869 remain, 0 skipped due to pins, 0 skipped frozen
        tuples: 0 removed, 14740860 remain, 2000000 are dead but not yet removable, oldest xmin: 760
        index scan needed: 49181 pages from table (15.00% of total) had 2999999 dead item identifiers removed
        index &quot;pgbench_accounts_pkey&quot;: pages: 54840 in total, 8224 newly deleted, 8224 currently deleted, 0 reusable
        I/O timings: read: 488.030 ms, write: 238.542 ms
        avg read rate: 55.609 MB/s, avg write rate: 21.009 MB/s
        buffer usage: 192958 hits, 124428 misses, 47008 dirtied
        WAL usage: 122981 records, 0 full page images, 19019531 bytes
        system usage: CPU: user: 1.14 s, system: 0.80 s, elapsed: 17.48 s
</code></pre><h2 id=extracting-and-parsing-logs><a href=#extracting-and-parsing-logs>Extracting and parsing logs</a></h2><p>For most folks with large applications, it is recommended to do something with your logs rather than just toss them aside. Having your logs accessible and searchable is critical for bug fixing and, as you’ve seen above, really helpful for performance work. Logs are a little like insurance. You may not need them every day, but when you have a problem, you’re glad they’re there.<h3 id=pgbadger><a href=#pgbadger>pgBadger</a></h3><p>There’s an open source project for analyzing Postgres logs called pgBadger. If you have spent any time looking through logs by hand, this project is going to seem like magic. It will turn Postgres logging output into an HTML page with zoomable charts.<p>Most folks run it periodically or on demand to analyze PostgreSQL logs and generate HTML reports.<p><img alt="pgbadger postgres" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/722e6984-27f6-471d-e040-b3937ef9b800/public><p>If you’re using a managed cloud Postgres service and you can get logs out in a file, you can use pgBadger. On Crunchy Bridge you can tail logs to pgBadger like this:<pre><code class=language-sql>--send the CLI logs to a local text file
cb logs qzyqhjdg3focnta3zvleomq > pglogs.txt

-- pgBadger reads the text file and provides html output
pgbadger -f syslog pglogs.txt -o out.html
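
-- pgBadger also has an incremental mode for building cumulative reports,
-- e.g. from cron (paths here are hypothetical):
-- pgbadger -I -O /var/www/pgbadger /var/log/postgresql/*.log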
</code></pre><h3 id=3rd-party-log-drain-tools><a href=#3rd-party-log-drain-tools>3rd party log drain tools</a></h3><p>There are quite a few folks out there that are happy to host, parse, and let you search your logs. We work with customers that are quite happy with pganalyze, Datadog, Honeybadger, and many others. These may run as an agent, a small pod container, or another service to export the logs. For folks using a cloud host, these are a really good idea.<pre><code class=language-sql>--set up syslog drain
log_destination = 'syslog';
</code></pre><h3 id=data-warehousing-for-logs><a href=#data-warehousing-for-logs>Data warehousing for logs</a></h3><p>It is probably no surprise that the logs for a large scale application today require their own management. Some teams are opting to do their own hosting and querying of logs. Systems like Snowflake, Clickhouse, <a href=https://www.crunchydata.com/products/warehouse>Crunchy Data Warehouse</a>, and many others can offer a SQL based storage and query engine for high throughput logs. When stored as flat files in object storage, these logging systems can be very <a href=https://www.crunchydata.com/blog/reducing-cloud-spend-migrating-logs-from-cloudwatch-to-iceberg-with-postgres>cost effective</a>.<h2 id=summary-of-recommended-logging-configurations><a href=#summary-of-recommended-logging-configurations>Summary of recommended logging configurations</a></h2><p>Here’s a big summary of the settings for everything I covered today. But guys …. it depends …. it depends on a lot of things, so review the docs and your individual application needs. Don’t just copy-paste this in.<pre><code class=language-sql>-- Set up logging collector
ALTER SYSTEM SET logging_collector = 'on';

-- log system error messages
ALTER SYSTEM SET log_min_messages = 'error';

-- log all data definition changes
ALTER SYSTEM SET log_statement = 'ddl';

-- log the full statement for sql errors
ALTER SYSTEM SET log_min_error_statement = 'error';

-- set log file name
ALTER SYSTEM SET log_filename = 'postgres-%Y-%m-%d';

-- add database name and process id to log prefix
ALTER SYSTEM SET log_line_prefix = '%m [%p] %q%u@%d ';

-- rotate logs every day
ALTER SYSTEM SET log_rotation_age = '1d';

-- enable the pgaudit.log
ALTER SYSTEM SET pgaudit.log = 'ddl';

-- log queries longer than 1000ms
ALTER SYSTEM SET log_min_duration_statement = '1000';

-- Log lock waits
ALTER SYSTEM SET log_lock_waits = 'on';

-- log temp files, when postgres needs disk instead of cache, set to your work_mem
ALTER SYSTEM SET log_temp_files = '4096';

-- Log plans for queries that run longer than 1000ms
ALTER SYSTEM SET auto_explain.log_min_duration = '1000ms';

-- Set up some kind of log destination to a place to search logs
ALTER SYSTEM SET log_destination = 'syslog';
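
-- reload to apply settings changed with ALTER SYSTEM
SELECT pg_reload_conf();

-- note: changing logging_collector requires a full server restart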
</code></pre><h2 id=final-thoughts><a href=#final-thoughts>Final thoughts</a></h2><p>The first thing I hope you learned today is that Postgres has a lot of logs - and a lot of flexibility in how to use them. The second thing is that hopefully you’ll tune your logs so you’re keeping details that help you with your database. Archiving your logs is really helpful, and if you’re on a hosted system, set up a log drain. This ensures your logs are ready to search when you need them.<p>The flip side to my plea to set up logging is that logs are expensive. They can easily fill up your disk and waste quite a bit of your company’s hard earned profits if you aren’t careful. So it’s a tradeoff. Keep logs that are helpful. Have something set up to search them. But don’t keep logs if you’re not actively using them. Set up rotation so they’re thrown away after a few days or once they reach a certain size.<p>When you’re actively working on performance, pay attention to slow queries and log those full queries and query plans with auto_explain. Log temp files so you know when Postgres isn't able to use the cache.<p>Set up logging for your development and staging environments. This will give you a chance to find bugs and check on query plans without breaking the bank on production logging.<p>Moderation is the key to both after-work happy hour …. and Postgres logging. Cheers! ]]></content:encoded>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">d429438786e685af99b3db92988c4fb9fd8cb83ca3b1f0e978be1a704217a05b</guid>
<pubDate>Tue, 19 Aug 2025 08:00:00 EDT</pubDate>
<dc:date>2025-08-19T12:00:00.000Z</dc:date>
<atom:updated>2025-08-19T12:00:00.000Z</atom:updated></item></channel></rss>