<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/topic/partitioning/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog/topic/partitioning</link>
<image><url>https://www.crunchydata.com/card.png</url>
<title>CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog/topic/partitioning</link>
<width>800</width>
<height>419</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Wed, 21 May 2025 10:00:00 EDT</pubDate>
<dc:date>2025-05-21T14:00:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ Archive Postgres Partitions to Iceberg ]]></title>
<link>https://www.crunchydata.com/blog/archive-postgres-partitions-to-iceberg</link>
<description><![CDATA[ Create a clean and simple archive process from Postgres to Iceberg with partitioning and automatic replication. ]]></description>
<content:encoded><![CDATA[ <p>Postgres comes with <a href=https://www.crunchydata.com/blog/native-partitioning-with-postgres>built-in partitioning</a>, and you can also layer in <code>pg_partman</code> for additional help with partition maintenance. Partitioning works quite well when you want to retain a limited window of data and improve performance for a workload that primarily queries a small, time-focused subset of it. Oftentimes, when implementing partitioning, you keep only a portion of your data and drop older partitions as they age out for cost management.<p>But what if we could seamlessly move old partitions to Iceberg, retaining all our data forever while keeping only recent partitions within Postgres? Could we have the best of both worlds: a full long-term copy in Iceberg, easily queryable from a warehouse, with Postgres still functioning as the operational database holding the most recent 30 days of data?<p>With the <a href=https://www.crunchydata.com/blog/logical-replication-from-postgres-to-iceberg>latest replication support</a> for Crunchy Data Warehouse this works seamlessly. Let’s dig in.<h2 id=first-lets-setup-our-partitioning><a href=#first-lets-setup-our-partitioning>First, let’s set up our partitioning</a></h2><p>If you’d like to follow along at home, here’s some code to set up a sample set of partitioned data resembling a web analytics data set.<pre><code class=language-sql>CREATE TABLE page_hits (
    id SERIAL,
    site_id INT NOT NULL,
    ingest_time TIMESTAMPTZ NOT NULL,
    url TEXT NOT NULL,
    request_country TEXT,
    ip_address INET,
    status_code INT,
    response_time_msec INT,
    PRIMARY KEY (id, ingest_time)
) PARTITION BY RANGE (ingest_time);
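
-- Hypothetical alternative (not required for this walkthrough): if the
-- pg_partman extension is installed, it can create and maintain these
-- daily partitions for you, e.g.:
-- SELECT partman.create_parent('public.page_hits', 'ingest_time', '1 day');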
</code></pre><p>This <code>DO</code> block will create a set of partitions for us for the last 30 days.<pre><code class=language-sql>DO $$
DECLARE
  d DATE;
BEGIN
  FOR d IN SELECT generate_series(DATE '2025-04-20', DATE '2025-05-19', INTERVAL '1 day') LOOP
    EXECUTE format($f$
      CREATE TABLE IF NOT EXISTS page_hits_%s PARTITION OF page_hits
      FOR VALUES FROM ('%s') TO ('%s');
    $f$, to_char(d, 'YYYY_MM_DD'), d, d + INTERVAL '1 day');
  END LOOP;
END $$;
</code></pre><p>Your database should look something like this:<pre><code class=language-sql>                            List of relations
 Schema |          Name           |       Type        |       Owner
--------+-------------------------+-------------------+-------------------
 public | page_hits               | partitioned table | postgres
 public | page_hits_2025_04_20    | table             | postgres
 public | page_hits_2025_04_21    | table             | postgres
...
 public | page_hits_2025_05_18    | table             | postgres
 public | page_hits_2025_05_19    | table             | postgres
 public | page_hits_id_seq        | sequence          | postgres

</code></pre><p>Now we can generate some sample data. In this case we’re going to generate 1000 rows per day for each of our tables:<pre><code class=language-sql>DO $$
DECLARE
  d DATE;
BEGIN
  FOR d IN
    SELECT generate_series(DATE '2025-04-20', DATE '2025-05-19', '1 day'::INTERVAL)
  LOOP
    INSERT INTO page_hits (site_id, ingest_time, url, request_country, ip_address, status_code, response_time_msec)
    SELECT
        (RANDOM() * 30)::INT,
        d + (i || ' seconds')::INTERVAL,
        'http://example.com/' || substr(md5(random()::text), 1, 12),
        (ARRAY['China', 'India', 'Indonesia', 'USA', 'Brazil'])[1 + (random() * 4)::INT],
        inet '10.0.0.0' + (random() * 1000000)::INT,
        (ARRAY[200, 200, 200, 404, 500])[1 + (random() * 4)::INT],
        (random() * 300)::INT
    FROM generate_series(1, 1000) AS s(i);
  END LOOP;
END $$;

</code></pre><p>Now that we have some data in our Postgres setup, let’s connect to our Crunchy Data Warehouse and get it replicated over.<h2 id=set-up-replication-to-iceberg><a href=#set-up-replication-to-iceberg>Set up replication to Iceberg</a></h2><p>In the publication, specify publishing via the root partition with <code>publish_via_partition_root = true</code>. This keeps partitions in Postgres but does not partition the Iceberg table, since Iceberg has its own organization of data files.<pre><code class=language-sql>CREATE PUBLICATION hits_to_iceberg
FOR TABLE page_hits
WITH (publish_via_partition_root = true);
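
-- optionally, verify the publication publishes via the partition root:
-- SELECT pubname, pubviaroot FROM pg_publication;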
</code></pre><p>Next, set up the replication user:<pre><code class=language-sql>-- create a new user
CREATE USER replication_user WITH REPLICATION PASSWORD '****';

-- grant appropriate permissions
GRANT SELECT ON ALL TABLES IN SCHEMA public TO replication_user;
</code></pre><p>And on the warehouse end, subscribe to the originating data. Since we’ve specified <code>create_tables_using = 'iceberg'</code>, this data will be stored in Iceberg.<pre><code class=language-sql>CREATE SUBSCRIPTION http_to_iceberg
CONNECTION 'postgres://replication_user:****@p.qzyqhjdg3fhejocnta3zvleomq.db.postgresbridge.com:5432/postgres?sslmode=require'
PUBLICATION hits_to_iceberg
WITH (create_tables_using = 'iceberg', streaming, binary, failover);
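
-- optionally, check replication progress on the warehouse side:
-- SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;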
</code></pre><p>And here’s the Iceberg table.<pre><code class=language-sql>                          List of relations
 Schema |          Name           |     Type      |       Owner
--------+-------------------------+---------------+-------------------
 public | page_hits               | foreign table | postgres

</code></pre><h2 id=now-query-data-stored-in-iceberg-from-postgres><a href=#now-query-data-stored-in-iceberg-from-postgres>Now query data stored in Iceberg from Postgres</a></h2><p>Here we can see the daily traffic insights for each country, breaking down the number of hits, success rate, average response time, and most common status code:<pre><code class=language-sql>SELECT
  date_trunc('day', ingest_time) AS day,
  request_country,
  COUNT(*) AS total_hits,
  ROUND(100.0 * SUM(CASE WHEN status_code = 200 THEN 1 ELSE 0 END) / COUNT(*), 2) AS success_rate_percent,
  ROUND(AVG(response_time_msec), 2) AS avg_response_time_msec,
  MODE() WITHIN GROUP (ORDER BY status_code) AS most_common_status
FROM
  page_hits
GROUP BY
  day, request_country
ORDER BY
  day, request_country;
</code></pre><pre><code class=language-sql>          day           | request_country | total_hits | success_rate_percent | avg_response_time_msec | most_common_status
------------------------+-----------------+------------+----------------------+------------------------+--------------------
 2025-04-20 00:00:00+00 | Brazil          |        128 |                68.75 |                 146.83 |                200
 2025-04-20 00:00:00+00 | China           |        138 |                65.94 |                 145.67 |                200
 2025-04-20 00:00:00+00 | India           |        245 |    64.90000000000001 |                  153.8 |                200
 2025-04-20 00:00:00+00 | Indonesia       |        230 |    64.34999999999999 |                 151.43 |                200

</code></pre><h2 id=now-drop-the-older-postgres-partition><a href=#now-drop-the-older-postgres-partition>Now drop the older Postgres partition</a></h2><p>Since data is replicated and a copy is in Iceberg, we can drop partitions at a specific time to free up storage and memory on our main operational Postgres database.<pre><code class=language-sql>--drop partition
DROP TABLE page_hits_2025_04_20;
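
-- a gentler alternative (sketch): detach instead of drop, keeping the
-- table around as a standalone archive until you have verified the data
-- is safely in Iceberg
-- ALTER TABLE page_hits DETACH PARTITION page_hits_2025_04_20;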
</code></pre><pre><code class=language-sql>-- show missing partition in the table list
                            List of relations
 Schema |          Name           |       Type        |       Owner
--------+-------------------------+-------------------+-------------------
 public | page_hits               | partitioned table | postgres
 public | page_hits_2025_04_21    | table             | postgres
 public | page_hits_2025_04_22    | table             | postgres

</code></pre><pre><code class=language-sql>-- query iceberg, data is still there
          day           | request_country | total_hits | success_rate_percent | avg_response_time_msec | most_common_status
------------------------+-----------------+------------+----------------------+------------------------+--------------------
 2025-04-20 00:00:00+00 | Brazil          |        128 |                68.75 |                 146.83 |                200
 2025-04-20 00:00:00+00 | China           |        138 |                65.94 |                 145.67 |                200
 2025-04-20 00:00:00+00 | India           |        245 |    64.90000000000001 |                  153.8 |                200
 2025-04-20 00:00:00+00 | Indonesia       |        230 |    64.34999999999999 |                 151.43 |                200

</code></pre><h2 id=summary><a href=#summary>Summary</a></h2><p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/bb816cf6-4b05-4430-324a-9eb7c7623000/public><p>Here’s the recipe for simple Postgres archiving with long-term, cost-effective data retention:<p>1 - Partition your high-throughput data - this is ideal for performance and management anyway.<p>2 - Replicate your data to Iceberg for easy reporting and long-term archiving.<p>3 - Drop partitions at the ideal interval.<p>4 - Continue to query archived data from Postgres. ]]></content:encoded>
<category><![CDATA[ Partitioning ]]></category>
<category><![CDATA[ Analytics ]]></category>
<author><![CDATA[ Craig.Kerstiens@crunchydata.com (Craig Kerstiens) ]]></author>
<dc:creator><![CDATA[ Craig Kerstiens ]]></dc:creator>
<guid isPermalink="false">4596ecd47c785293f3b49cc133441a30a91280d7f163aea74c079786938c604c</guid>
<pubDate>Wed, 21 May 2025 10:00:00 EDT</pubDate>
<dc:date>2025-05-21T14:00:00.000Z</dc:date>
<atom:updated>2025-05-21T14:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres Partitioning with a Default Partition ]]></title>
<link>https://www.crunchydata.com/blog/postgres-partitioning-with-a-default-partition</link>
<description><![CDATA[ Keith discusses the importance of having a default partition, how to monitor the default, and how to move rows to new child tables.  ]]></description>
<content:encoded><![CDATA[ <p>Partitioning is an important database maintenance strategy for a growing application backed by PostgreSQL. As one of the main authors of <a href=https://github.com/pgpartman/pg_partman>pg_partman</a> and an engineer here at Crunchy Data, I spend a lot of my time helping folks implement partitioning. One of the nuances of PostgreSQL’s partitioning implementation is <strong>the default partition</strong>, which I’ll dig into in this post and discuss how to use it effectively.<h3 id=why-default-partitions-are-important><a href=#why-default-partitions-are-important>Why default partitions are important</a></h3><p>The default partition is pretty much what it sounds like: a special partition designated as the DEFAULT, which will capture any and all data that does not have an existing partition with matching boundary constraints.<p>If you’re new to partitioning, you might be making partitions a week in advance. But after monitoring, you may realize you need to make them more like two weeks in advance. Default partitions can help you learn when, and how far in advance, to create your child partitions.<p>Default partitions are also there to catch mistakes. Maybe there’s an issue in application code putting timestamps a hundred years into the future instead of one year. Maybe there’s just some bad data getting created. Your default partition can help you spot that.<p>While having a default partition is a good idea, you don’t actually want to leave data in there. I’ll show you some tips later on about how to monitor the default for the presence of any rows. When you find data in there, you’ll want to evaluate whether the data is valid, and if it is, create the relevant child partitions and move the data there.<h2 id=adding-a-default><a href=#adding-a-default>Adding a default</a></h2><p>PostgreSQL declarative partitioning does not create any child partitions automatically, including the default. 
pg_partman can help with that and we’ll discuss that later.<p>Here we have a daily partition set that has been created but does not yet have a default.<pre><code class=language-sql>                                      Partitioned table "partman_test.time_taptest_table"
 Column |           Type           | Collation | Nullable |    Default    | Storage  | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------------+----------+-------------+--------------+-------------
 col1   | integer                  |           |          |               | plain    |             |              |
 col2   | text                     |           |          | 'stuff'::text | extended |             |              |
 col3   | timestamp with time zone |           | not null | now()         | plain    |             |              |
Partition key: RANGE (col3)
Partitions: partman_test.time_taptest_table_p20241118 FOR VALUES FROM ('2024-11-18 00:00:00-05') TO ('2024-11-19 00:00:00-05'),
            partman_test.time_taptest_table_p20241119 FOR VALUES FROM ('2024-11-19 00:00:00-05') TO ('2024-11-20 00:00:00-05'),
            partman_test.time_taptest_table_p20241120 FOR VALUES FROM ('2024-11-20 00:00:00-05') TO ('2024-11-21 00:00:00-05'),
            partman_test.time_taptest_table_p20241121 FOR VALUES FROM ('2024-11-21 00:00:00-05') TO ('2024-11-22 00:00:00-05'),
            partman_test.time_taptest_table_p20241122 FOR VALUES FROM ('2024-11-22 00:00:00-05') TO ('2024-11-23 00:00:00-05'),
            partman_test.time_taptest_table_p20241123 FOR VALUES FROM ('2024-11-23 00:00:00-05') TO ('2024-11-24 00:00:00-05'),
            partman_test.time_taptest_table_p20241124 FOR VALUES FROM ('2024-11-24 00:00:00-05') TO ('2024-11-25 00:00:00-05'),
            partman_test.time_taptest_table_p20241125 FOR VALUES FROM ('2024-11-25 00:00:00-05') TO ('2024-11-26 00:00:00-05'),
            partman_test.time_taptest_table_p20241126 FOR VALUES FROM ('2024-11-26 00:00:00-05') TO ('2024-11-27 00:00:00-05'),
</code></pre><p>If you try to insert data for 2024-12-25, which is outside the existing child partition boundaries, you will get an error and the row is rejected.<pre><code class=language-sql>INSERT INTO partman_test.time_taptest_table (col3) VALUES ('2024-12-25'::date);
ERROR:  no partition of relation "time_taptest_table" found for row
DETAIL:  Partition key of the failing row contains (col3) = (2024-12-25 00:00:00-05).
</code></pre><p>Adding a <code>DEFAULT</code> partition is very easy:<pre><code class=language-sql>CREATE TABLE partman_test.time_taptest_table_default PARTITION OF partman_test.time_taptest_table DEFAULT;

\d+ partman_test.time_taptest_table
                                      Partitioned table "partman_test.time_taptest_table"
 Column |           Type           | Collation | Nullable |    Default    | Storage  | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------------+----------+-------------+--------------+-------------
 col1   | integer                  |           |          |               | plain    |             |              |
 col2   | text                     |           |          | 'stuff'::text | extended |             |              |
 col3   | timestamp with time zone |           | not null | now()         | plain    |             |              |
Partition key: RANGE (col3)
Partitions: partman_test.time_taptest_table_p20241118 FOR VALUES FROM ('2024-11-18 00:00:00-05') TO ('2024-11-19 00:00:00-05'),
            partman_test.time_taptest_table_p20241119 FOR VALUES FROM ('2024-11-19 00:00:00-05') TO ('2024-11-20 00:00:00-05'),
            partman_test.time_taptest_table_p20241120 FOR VALUES FROM ('2024-11-20 00:00:00-05') TO ('2024-11-21 00:00:00-05'),
            partman_test.time_taptest_table_p20241121 FOR VALUES FROM ('2024-11-21 00:00:00-05') TO ('2024-11-22 00:00:00-05'),
            partman_test.time_taptest_table_p20241122 FOR VALUES FROM ('2024-11-22 00:00:00-05') TO ('2024-11-23 00:00:00-05'),
            partman_test.time_taptest_table_p20241123 FOR VALUES FROM ('2024-11-23 00:00:00-05') TO ('2024-11-24 00:00:00-05'),
            partman_test.time_taptest_table_p20241124 FOR VALUES FROM ('2024-11-24 00:00:00-05') TO ('2024-11-25 00:00:00-05'),
            partman_test.time_taptest_table_p20241125 FOR VALUES FROM ('2024-11-25 00:00:00-05') TO ('2024-11-26 00:00:00-05'),
            partman_test.time_taptest_table_p20241126 FOR VALUES FROM ('2024-11-26 00:00:00-05') TO ('2024-11-27 00:00:00-05'),
            partman_test.time_taptest_table_default DEFAULT

</code></pre><p>Now when we try and insert the data that failed before, it succeeds and we can see it is in the default table.<pre><code class=language-sql>INSERT INTO partman_test.time_taptest_table (col3) VALUES ('2024-12-25'::date);
INSERT 0 1

SELECT * FROM partman_test.time_taptest_table_default;
  col1  | col2  |          col3
--------+-------+------------------------
 «NULL» | stuff | 2024-12-25 00:00:00-05
(1 row)
</code></pre><h2 id=constraints-with-partition-tables><a href=#constraints-with-partition-tables>Constraints with partition tables</a></h2><p>The constraint on a normal partition is as you’d expect it to be, showing the lower and upper bounds.<pre><code class=language-sql>keith@keith=# \d partman_test.time_taptest_table_p20241124
            Table "partman_test.time_taptest_table_p20241124"
 Column |           Type           | Collation | Nullable |    Default
--------+--------------------------+-----------+----------+---------------
 col1   | integer                  |           |          |
 col2   | text                     |           |          | 'stuff'::text
 col3   | timestamp with time zone |           | not null | now()
Partition of: partman_test.time_taptest_table FOR VALUES FROM ('2024-11-24 00:00:00-05') TO ('2024-11-25 00:00:00-05')

</code></pre><p>If we look at the default partition, we see that the constraint set up is not so simple.<pre><code class=language-sql>\d+ partman_test.time_taptest_table_default

Table "partman_test.time_taptest_table_default"
Column |           Type           | Collation | Nullable |    Default    | Storage  | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------------+----------+-------------+--------------+-------------
col1   | integer                  |           |          |               | plain    |             |              |
col2   | text                     |           |          | 'stuff'::text | extended |             |              |
col3   | timestamp with time zone |           | not null | now()         | plain    |             |              |
Partition of: partman_test.time_taptest_table DEFAULT
Partition constraint: (NOT ((col3 IS NOT NULL) AND (((col3 >= '2024-11-18 00:00:00-05'::timestamp with time zone) AND
(col3 &lt; '2024-11-19 00:00:00-05'::timestamp with time zone)) OR ((col3 >= '2024-11-19 00:00:00-05'::timestamp with time zone) AND
(col3 &lt; '2024-11-20 00:00:00-05'::timestamp with time zone)) OR ((col3 >= '2024-11-20 00:00:00-05'::timestamp with time zone) AND
(col3 &lt; '2024-11-21 00:00:00-05'::timestamp with time zone)) OR ((col3 >= '2024-11-21 00:00:00-05'::timestamp with time zone) AND
(col3 &lt; '2024-11-22 00:00:00-05'::timestamp with time zone)) OR ((col3 >= '2024-11-22 00:00:00-05'::timestamp with time zone) AND
(col3 &lt; '2024-11-23 00:00:00-05'::timestamp with time zone)) OR ((col3 >= '2024-11-23 00:00:00-05'::timestamp with time zone) AND
(col3 &lt; '2024-11-24 00:00:00-05'::timestamp with time zone)) OR ((col3 >= '2024-11-24 00:00:00-05'::timestamp with time zone) AND
(col3 &lt; '2024-11-25 00:00:00-05'::timestamp with time zone)) OR ((col3 >= '2024-11-25 00:00:00-05'::timestamp with time zone) AND
(col3 &lt; '2024-11-26 00:00:00-05'::timestamp with time zone)) OR ((col3 >= '2024-11-26 00:00:00-05'::timestamp with time zone) AND
(col3 &lt; '2024-11-27 00:00:00-05'::timestamp with time zone)))))
</code></pre><p>The constraint of a default partition in PostgreSQL can basically be thought of as an anti-constraint of all the other currently existing partitions. When a new partition is added, that anti-constraint is automatically updated to account for the new partition’s boundaries.<p>But what happens if we try to add a new partition that matches data in the default?<pre><code class=language-sql>CREATE TABLE partman_test.time_taptest_table_p20241225 PARTITION OF partman_test.time_taptest_table FOR VALUES FROM ('2024-12-25') TO ('2024-12-26');

ERROR:  updated partition constraint for default partition "time_taptest_table_default" would be violated by some row

</code></pre><p>We get a constraint violation because there is already data in the default partition that would match the new partition’s boundaries. PostgreSQL cannot allow there to be two possible partition routes for the same values.<h2 id=moving-default-data-to-a-new-child-table><a href=#moving-default-data-to-a-new-child-table>Moving default data to a new child table</a></h2><p>Because of these constraint violations, we must develop a process to keep our data and move it to the proper partition: remove the data from the default partition, add the new child partition, then reinsert the data via the parent so it routes to the new partition. Thanks to PostgreSQL’s transactional DDL, this can all be done in a single transaction, making it transparent to your users.<p>Here is an example of moving the data from the default to a new partition in a single transaction.<pre><code class=language-sql>BEGIN;

CREATE TEMP TABLE clean_default_temp (LIKE partman_test.time_taptest_table_default);

WITH partition_data AS (
    DELETE FROM partman_test.time_taptest_table_default RETURNING *
)
INSERT INTO clean_default_temp (col1, col2, col3) SELECT col1, col2, col3 FROM partition_data;

CREATE TABLE partman_test.time_taptest_table_p20241225 PARTITION OF partman_test.time_taptest_table FOR VALUES FROM ('2024-12-25') TO ('2024-12-26');

WITH partition_data AS (
    DELETE FROM clean_default_temp RETURNING *
)
INSERT INTO partman_test.time_taptest_table (col1, col2, col3) SELECT col1, col2, col3 FROM partition_data;

DROP TABLE clean_default_temp;

COMMIT;
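
-- afterwards, refresh planner statistics on the parent (a good-practice
-- step; pg_partman gives the same advice after moving data):
-- VACUUM ANALYZE partman_test.time_taptest_table;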

SELECT * FROM partman_test.time_taptest_table_default ;
 col1 | col2 | col3
------+------+------
(0 rows)

SELECT * FROM partman_test.time_taptest_table;
  col1  | col2  |          col3
--------+-------+------------------------
 «NULL» | stuff | 2024-12-25 00:00:00-05
(1 row)

SELECT * FROM partman_test.time_taptest_table_p20241225 ;
  col1  | col2  |          col3
--------+-------+------------------------
 «NULL» | stuff | 2024-12-25 00:00:00-05
(1 row)

</code></pre><h3 id=large-amounts-of-data-in-the-default><a href=#large-amounts-of-data-in-the-default>Large amounts of data in the default</a></h3><p>The above example was rather simple: just a single row. However, if you have a large amount of data in the default, moving it could cause a noticeable disturbance to your users, since the rows being moved are locked until the transaction commits. This can be done in smaller batches, but to stay completely transparent to your users, the smallest transactional batch you could use is the interval size of the partition set, in this case one day. You could use smaller transactional batches, but that would mean moving the data to a permanent staging table, where it would be inaccessible to your users via their normal means. This is because you cannot add the new child partition until ALL the data that would go into it has been removed from the default. However, this isn’t even the most serious problem with data going into the default.<p>PostgreSQL is able to tell you that you cannot add the new child partition because, at the time you try to attach one, it scans <em>the entire default partition</em> to see if the new child partition’s boundaries match any data there. Even if you have an index on the default, PostgreSQL has to scan the entire table, most likely using a costly sequential scan. This means the lock obtained on the parent table to add a partition is held for the duration of the attach command’s transaction. If you’ve got billions of rows, this could take minutes or even longer. 
This is why it is critical to keep an eye on any data going into any default partition table and move or remove it as soon as possible.<h2 id=pg_partman><a href=#pg_partman>pg_partman</a></h2><p>pg_partman is an open source extension for managing partitioning in PostgreSQL. It adds several features to PostgreSQL’s built-in declarative partitioning, including automatic creation of child partitions and of a default partition for every partition set.<h3 id=pg_partman-check-default><a href=#pg_partman-check-default>pg_partman check default</a></h3><p>The pg_partman extension has a utility to check the default table for rows with the <code>check_default()</code> function. Let’s say we have 4 rows in our default table. Passing no parameters to this function will do a full count on all default partitions of all partition sets managed by pg_partman and return how many rows it found in each partition set.<pre><code class=language-sql>SELECT * FROM partman.check_default();
              default_table              | count
-----------------------------------------+-------
 partman_test.time_taptest_table_default |     4
</code></pre><p>However, if you pass <code>false</code> to this function, it will not do a full count, simply returning a 1 if even a single row is encountered in any default partition (using a LIMIT 1 clause). This form of the function is well suited to regular monitoring of your partition sets.<pre><code class=language-sql>SELECT * FROM partman.check_default(false);
              default_table              | count
-----------------------------------------+-------
 partman_test.time_taptest_table_default |     1
</code></pre><h3 id=default-row-cleanup-with-pg_partman><a href=#default-row-cleanup-with-pg_partman>Default row cleanup with pg_partman</a></h3><p>The pg_partman procedure <code>partition_data_proc()</code> will automatically clean up your default partition. This procedure does the same steps shown above for native partitioning: moving the data to a temporary table, creating the necessary child tables based on the data found, then moving the data back.<p>Let’s look at our default partition with 4 rows of data.<pre><code class=language-sql>select * from partman_test.time_taptest_table_default;
  col1  | col2  |          col3
--------+-------+------------------------
 «NULL» | stuff | 2024-12-25 00:00:00-05
 «NULL» | stuff | 2024-12-26 00:00:00-05
 «NULL» | stuff | 2024-12-27 00:00:00-05
 «NULL» | stuff | 2024-12-28 00:00:00-05

</code></pre><p>When we call the procedure <code>partition_data_proc()</code>, it commits after each child partition is created. If you do not give this procedure a source table, it assumes you are moving data out of the default partition for the given partition set.<pre><code class=language-sql>CALL partman.partition_data_proc('partman_test.time_taptest_table');
NOTICE:  Loop: 1, Rows moved: 1
NOTICE:  Loop: 2, Rows moved: 1
NOTICE:  Loop: 3, Rows moved: 1
NOTICE:  Loop: 4, Rows moved: 1
NOTICE:  Total rows moved: 4
NOTICE:  Ensure to VACUUM ANALYZE the parent (and source table if used) after partitioning data
</code></pre><p>Notice that the last line of advice is <em>very important</em> to ensure the statistics for your partition set have been updated and old rows cleaned up properly.<pre><code class=language-sql>VACUUM ANALYZE partman_test.time_taptest_table;
</code></pre><h3 id=gaps-in-child-partitions><a href=#gaps-in-child-partitions>Gaps in child partitions</a></h3><p>Now we can see that the new child partitions have been made, the data has been moved to them, and the default partition is empty.<pre><code class=language-sql>\d+ partman_test.time_taptest_table
                                      Partitioned table "partman_test.time_taptest_table"
 Column |           Type           | Collation | Nullable |    Default    | Storage  | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------------+----------+-------------+--------------+-------------
 col1   | integer                  |           |          |               | plain    |             |              |
 col2   | text                     |           |          | 'stuff'::text | extended |             |              |
 col3   | timestamp with time zone |           | not null | now()         | plain    |             |              |
Partition key: RANGE (col3)
Foreign-key constraints:
    "fk_test" FOREIGN KEY (col2) REFERENCES partman_test.fk_test_reference(col2)
Partitions: partman_test.time_taptest_table_p20241118 FOR VALUES FROM ('2024-11-18 00:00:00-05') TO ('2024-11-19 00:00:00-05'),
            partman_test.time_taptest_table_p20241119 FOR VALUES FROM ('2024-11-19 00:00:00-05') TO ('2024-11-20 00:00:00-05'),
            partman_test.time_taptest_table_p20241120 FOR VALUES FROM ('2024-11-20 00:00:00-05') TO ('2024-11-21 00:00:00-05'),
            partman_test.time_taptest_table_p20241121 FOR VALUES FROM ('2024-11-21 00:00:00-05') TO ('2024-11-22 00:00:00-05'),
            partman_test.time_taptest_table_p20241122 FOR VALUES FROM ('2024-11-22 00:00:00-05') TO ('2024-11-23 00:00:00-05'),
            partman_test.time_taptest_table_p20241123 FOR VALUES FROM ('2024-11-23 00:00:00-05') TO ('2024-11-24 00:00:00-05'),
            partman_test.time_taptest_table_p20241124 FOR VALUES FROM ('2024-11-24 00:00:00-05') TO ('2024-11-25 00:00:00-05'),
            partman_test.time_taptest_table_p20241125 FOR VALUES FROM ('2024-11-25 00:00:00-05') TO ('2024-11-26 00:00:00-05'),
            partman_test.time_taptest_table_p20241126 FOR VALUES FROM ('2024-11-26 00:00:00-05') TO ('2024-11-27 00:00:00-05'),
            partman_test.time_taptest_table_p20241225 FOR VALUES FROM ('2024-12-25 00:00:00-05') TO ('2024-12-26 00:00:00-05'),
            partman_test.time_taptest_table_p20241226 FOR VALUES FROM ('2024-12-26 00:00:00-05') TO ('2024-12-27 00:00:00-05'),
            partman_test.time_taptest_table_p20241227 FOR VALUES FROM ('2024-12-27 00:00:00-05') TO ('2024-12-28 00:00:00-05'),
            partman_test.time_taptest_table_p20241228 FOR VALUES FROM ('2024-12-28 00:00:00-05') TO ('2024-12-29 00:00:00-05'),
            partman_test.time_taptest_table_default DEFAULT
</code></pre><p>You will notice that we now have a gap between Nov 26 and Dec 25.<p>pg_partman only makes new partitions based on the NEWEST partition and row data, in this case Dec 28. It will not automatically fill in gaps, to avoid potentially expensive automatic maintenance operations. However, in many cases you will be expecting data for these missing child partitions and will want to fill in the gaps.<p>pg_partman has a utility that you can run manually to do this: <code>partition_gap_fill</code>.<pre><code class=language-sql>SELECT * FROM partman.partition_gap_fill('partman_test.time_taptest_table');
 partition_gap_fill
--------------------
                 28

</code></pre><p>After running this, the function returns the number of partitions that were created, and as you can see below, we now have a daily partitioned set fully covering Nov 18, 2024 through Dec 28, 2024.<pre><code class=language-sql>\d+ partman_test.time_taptest_table
                                      Partitioned table "partman_test.time_taptest_table"
 Column |           Type           | Collation | Nullable |    Default    | Storage  | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------------+----------+-------------+--------------+-------------
 col1   | integer                  |           |          |               | plain    |             |              |
 col2   | text                     |           |          | 'stuff'::text | extended |             |              |
 col3   | timestamp with time zone |           | not null | now()         | plain    |             |              |
Partition key: RANGE (col3)
Foreign-key constraints:
    "fk_test" FOREIGN KEY (col2) REFERENCES partman_test.fk_test_reference(col2)
Partitions: partman_test.time_taptest_table_p20241118 FOR VALUES FROM ('2024-11-18 00:00:00-05') TO ('2024-11-19 00:00:00-05'),
            partman_test.time_taptest_table_p20241119 FOR VALUES FROM ('2024-11-19 00:00:00-05') TO ('2024-11-20 00:00:00-05'),
            partman_test.time_taptest_table_p20241120 FOR VALUES FROM ('2024-11-20 00:00:00-05') TO ('2024-11-21 00:00:00-05'),
            partman_test.time_taptest_table_p20241121 FOR VALUES FROM ('2024-11-21 00:00:00-05') TO ('2024-11-22 00:00:00-05'),
            partman_test.time_taptest_table_p20241122 FOR VALUES FROM ('2024-11-22 00:00:00-05') TO ('2024-11-23 00:00:00-05'),
            partman_test.time_taptest_table_p20241123 FOR VALUES FROM ('2024-11-23 00:00:00-05') TO ('2024-11-24 00:00:00-05'),
            partman_test.time_taptest_table_p20241124 FOR VALUES FROM ('2024-11-24 00:00:00-05') TO ('2024-11-25 00:00:00-05'),
            partman_test.time_taptest_table_p20241125 FOR VALUES FROM ('2024-11-25 00:00:00-05') TO ('2024-11-26 00:00:00-05'),
            partman_test.time_taptest_table_p20241126 FOR VALUES FROM ('2024-11-26 00:00:00-05') TO ('2024-11-27 00:00:00-05'),
            partman_test.time_taptest_table_p20241127 FOR VALUES FROM ('2024-11-27 00:00:00-05') TO ('2024-11-28 00:00:00-05'),
            partman_test.time_taptest_table_p20241128 FOR VALUES FROM ('2024-11-28 00:00:00-05') TO ('2024-11-29 00:00:00-05'),
            partman_test.time_taptest_table_p20241129 FOR VALUES FROM ('2024-11-29 00:00:00-05') TO ('2024-11-30 00:00:00-05'),
            partman_test.time_taptest_table_p20241130 FOR VALUES FROM ('2024-11-30 00:00:00-05') TO ('2024-12-01 00:00:00-05'),
            partman_test.time_taptest_table_p20241201 FOR VALUES FROM ('2024-12-01 00:00:00-05') TO ('2024-12-02 00:00:00-05'),
            partman_test.time_taptest_table_p20241202 FOR VALUES FROM ('2024-12-02 00:00:00-05') TO ('2024-12-03 00:00:00-05'),
            partman_test.time_taptest_table_p20241203 FOR VALUES FROM ('2024-12-03 00:00:00-05') TO ('2024-12-04 00:00:00-05'),
            partman_test.time_taptest_table_p20241204 FOR VALUES FROM ('2024-12-04 00:00:00-05') TO ('2024-12-05 00:00:00-05'),
            partman_test.time_taptest_table_p20241205 FOR VALUES FROM ('2024-12-05 00:00:00-05') TO ('2024-12-06 00:00:00-05'),
            partman_test.time_taptest_table_p20241206 FOR VALUES FROM ('2024-12-06 00:00:00-05') TO ('2024-12-07 00:00:00-05'),
            partman_test.time_taptest_table_p20241207 FOR VALUES FROM ('2024-12-07 00:00:00-05') TO ('2024-12-08 00:00:00-05'),
            partman_test.time_taptest_table_p20241208 FOR VALUES FROM ('2024-12-08 00:00:00-05') TO ('2024-12-09 00:00:00-05'),
            partman_test.time_taptest_table_p20241209 FOR VALUES FROM ('2024-12-09 00:00:00-05') TO ('2024-12-10 00:00:00-05'),
            partman_test.time_taptest_table_p20241210 FOR VALUES FROM ('2024-12-10 00:00:00-05') TO ('2024-12-11 00:00:00-05'),
            partman_test.time_taptest_table_p20241211 FOR VALUES FROM ('2024-12-11 00:00:00-05') TO ('2024-12-12 00:00:00-05'),
            partman_test.time_taptest_table_p20241212 FOR VALUES FROM ('2024-12-12 00:00:00-05') TO ('2024-12-13 00:00:00-05'),
            partman_test.time_taptest_table_p20241213 FOR VALUES FROM ('2024-12-13 00:00:00-05') TO ('2024-12-14 00:00:00-05'),
            partman_test.time_taptest_table_p20241214 FOR VALUES FROM ('2024-12-14 00:00:00-05') TO ('2024-12-15 00:00:00-05'),
            partman_test.time_taptest_table_p20241215 FOR VALUES FROM ('2024-12-15 00:00:00-05') TO ('2024-12-16 00:00:00-05'),
            partman_test.time_taptest_table_p20241216 FOR VALUES FROM ('2024-12-16 00:00:00-05') TO ('2024-12-17 00:00:00-05'),
            partman_test.time_taptest_table_p20241217 FOR VALUES FROM ('2024-12-17 00:00:00-05') TO ('2024-12-18 00:00:00-05'),
            partman_test.time_taptest_table_p20241218 FOR VALUES FROM ('2024-12-18 00:00:00-05') TO ('2024-12-19 00:00:00-05'),
            partman_test.time_taptest_table_p20241219 FOR VALUES FROM ('2024-12-19 00:00:00-05') TO ('2024-12-20 00:00:00-05'),
            partman_test.time_taptest_table_p20241220 FOR VALUES FROM ('2024-12-20 00:00:00-05') TO ('2024-12-21 00:00:00-05'),
            partman_test.time_taptest_table_p20241221 FOR VALUES FROM ('2024-12-21 00:00:00-05') TO ('2024-12-22 00:00:00-05'),
            partman_test.time_taptest_table_p20241222 FOR VALUES FROM ('2024-12-22 00:00:00-05') TO ('2024-12-23 00:00:00-05'),
            partman_test.time_taptest_table_p20241223 FOR VALUES FROM ('2024-12-23 00:00:00-05') TO ('2024-12-24 00:00:00-05'),
            partman_test.time_taptest_table_p20241224 FOR VALUES FROM ('2024-12-24 00:00:00-05') TO ('2024-12-25 00:00:00-05'),
            partman_test.time_taptest_table_p20241225 FOR VALUES FROM ('2024-12-25 00:00:00-05') TO ('2024-12-26 00:00:00-05'),
            partman_test.time_taptest_table_p20241226 FOR VALUES FROM ('2024-12-26 00:00:00-05') TO ('2024-12-27 00:00:00-05'),
            partman_test.time_taptest_table_p20241227 FOR VALUES FROM ('2024-12-27 00:00:00-05') TO ('2024-12-28 00:00:00-05'),
            partman_test.time_taptest_table_p20241228 FOR VALUES FROM ('2024-12-28 00:00:00-05') TO ('2024-12-29 00:00:00-05'),
            partman_test.time_taptest_table_default DEFAULT
</code></pre><h2 id=summary><a href=#summary>Summary</a></h2><ul><li>PostgreSQL does not make any child partitions automatically, not even a default partition. If you’re using partitioning, it is recommended to have a default partition to catch mistakes in application code or in child partition creation. However, it is very important to monitor the contents of those default partitions.<li>pg_partman manages automatically creating child partitions for you, including the default partition if desired. The <code>check_default</code> function can help you monitor the contents of default partitions.<li>If rows are found in the default partition, it is important to ensure they are reviewed as soon as possible. If the rows are invalid, they can simply be deleted. If they are important, devise a process to move them to the proper child partitions. pg_partman’s <code>partition_data_proc</code> can assist with this.</ul><p>The default partition is an incredibly useful tool to ensure you do not lose important data that is not covered by existing child partitions. If you see data frequently going into the default partition, I highly advise reviewing your partition maintenance to ensure it is keeping up with the window of data that is regularly being ingested into that partition set. If the necessary child partitions always exist, you will have the best performance with the least maintenance. ]]></content:encoded>
<category><![CDATA[ Partitioning ]]></category>
<author><![CDATA[ Keith.Fiske@crunchydata.com (Keith Fiske) ]]></author>
<dc:creator><![CDATA[ Keith Fiske ]]></dc:creator>
<guid isPermalink="false">315a9020b320411eb8023215d7492428204b85b43c4846f482b27b6a2dcdaa6c</guid>
<pubDate>Fri, 06 Dec 2024 08:30:00 EST</pubDate>
<dc:date>2024-12-06T13:30:00.000Z</dc:date>
<atom:updated>2024-12-06T13:30:00.000Z</atom:updated></item>
<item><title><![CDATA[ Time Partitioning and Custom Time Intervals in Postgres with pg_partman ]]></title>
<link>https://www.crunchydata.com/blog/time-partitioning-and-custom-time-intervals-in-postgres-with-pg_partman</link>
<description><![CDATA[ Keith shows off a variety of the time interval options when you create partitioned tables with pg_partman. ]]></description>
<content:encoded><![CDATA[ <p>Whether you are managing a large table or setting up <a href=https://www.crunchydata.com/blog/auto-archiving-and-data-retention-management-in-postgres-with-pg_partman>automatic archiving</a>, time-based partitioning in Postgres is incredibly powerful. <a href=https://github.com/pgpartman/pg_partman>pg_partman</a>’s newest versions support a huge variety of custom time intervals. Marco just published a post on <a href=https://www.crunchydata.com/blog/syncing-postgres-partitions-to-your-data-lake-in-bridge-for-analytics>using pg_partman</a> with our new database product for doing <a href=https://www.crunchydata.com/blog/crunchy-bridge-for-analytics-your-data-lake-in-postgresql>analytics with Postgres</a>, <a href=https://www.crunchydata.com/products/warehouse>Crunchy Data Warehouse</a>. So I thought this would be a great time to review both basic and advanced options for time-based partitioning.<h2 id=time-partitioning-intervals><a href=#time-partitioning-intervals>Time partitioning intervals</a></h2><p>When I first started designing pg_partman for time-based partitioning, it only had preset intervals that users could choose. Currently, pg_partman supports <a href=https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT>all of Postgres’ time-based interval values</a>. The partitioning interval is set during the initial parent creation, in the <code>p_interval</code> field.<pre><code class=language-sql>SELECT partman.create_parent(
    p_parent_table := 'partman_test.time_taptest_table'
    , p_control := 'col3'
    , p_interval := '1 day'
    , p_template_table := 'partman_test.time_taptest_table_template'
);
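
-- (Added note, not in the original post) After create_parent() runs, the
-- configured interval can be verified in pg_partman's configuration table:
SELECT parent_table, partition_interval
FROM partman.part_config
WHERE parent_table = 'partman_test.time_taptest_table';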
</code></pre><p>Additional examples like:<pre><code class=language-sql>p_interval := '1 month'
p_interval := '1 year'
</code></pre><p>Even with allowing these intervals, some common intervals used in business, like weekly and quarterly, can be a little tricky. But thankfully pg_partman still has options to make these intervals easy and now even more flexible. So let’s dig into these examples.<h2 id=weekly-partitioning><a href=#weekly-partitioning>Weekly partitioning</a></h2><p>Weekly partitioning was and still is a fairly popular partitioning interval. When I started working on it, I’d thankfully found the <a href=https://en.wikipedia.org/wiki/ISO_week_date>ISO week date standard</a> to allow me to tackle the more difficult issues of handling weeks (leap years, starting days, 53-week years) when I’d wanted to label the children with the week number. However, with declarative partitioning, I found an opportunity to allow this to be more flexible when redesigning things for version 5 of partman. While the result did get rid of the nice weekly numbering pattern I had liked for this interval (<a href=https://www.postgresql.org/docs/16/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE>IYYYwIW</a>, which came out to something like “2024w15”), the new method lets people start their week on whichever day they desire. However, with flexibility always comes a little more complexity.<p>When you set your partitioning interval to <em>1 week</em> in pg_partman, the day that starts that weekly pattern will be whatever day of the week it is when you run <code>create_parent()</code>. Since today is a Wednesday as I’m writing this blog post, my partition naming pattern AND constraints for the child tables are as follows:<pre><code class=language-sql>CREATE TABLE time_stuff(id int GENERATED ALWAYS AS IDENTITY, created_at timestamptz NOT NULL) PARTITION BY RANGE (created_at);

SELECT partman.create_parent('public.time_stuff', 'created_at', '1 week');
 create_parent
---------------
 t
(1 row)

\d+ time_stuff
                                                      Partitioned table "public.time_stuff"
   Column   |           Type           | Collation | Nullable |           Default            | Storage | Compression | Stats target | Description
------------+--------------------------+-----------+----------+------------------------------+---------+-------------+--------------+-------------
 id         | integer                  |           | not null | generated always as identity | plain   |             |              |
 created_at | timestamp with time zone |           | not null |                              | plain   |             |              |
Partition key: RANGE (created_at)
Partitions: time_stuff_p20240327 FOR VALUES FROM ('2024-03-27 00:00:00-04') TO ('2024-04-03 00:00:00-04'),
            time_stuff_p20240403 FOR VALUES FROM ('2024-04-03 00:00:00-04') TO ('2024-04-10 00:00:00-04'),
            time_stuff_p20240410 FOR VALUES FROM ('2024-04-10 00:00:00-04') TO ('2024-04-17 00:00:00-04'),
            time_stuff_p20240417 FOR VALUES FROM ('2024-04-17 00:00:00-04') TO ('2024-04-24 00:00:00-04'),
            time_stuff_p20240424 FOR VALUES FROM ('2024-04-24 00:00:00-04') TO ('2024-05-01 00:00:00-04'),
            time_stuff_p20240501 FOR VALUES FROM ('2024-05-01 00:00:00-04') TO ('2024-05-08 00:00:00-04'),
            time_stuff_p20240508 FOR VALUES FROM ('2024-05-08 00:00:00-04') TO ('2024-05-15 00:00:00-04'),
            time_stuff_p20240515 FOR VALUES FROM ('2024-05-15 00:00:00-04') TO ('2024-05-22 00:00:00-04'),
            time_stuff_p20240522 FOR VALUES FROM ('2024-05-22 00:00:00-04') TO ('2024-05-29 00:00:00-04'),
            time_stuff_default DEFAULT
</code></pre><p>I ran these statements on Wednesday, April 24, 2024, so you can see the partition <code>time_stuff_p20240424</code> with the minimum value of that same day. Each subsequent child table is 7 days later, starting on every Wednesday. So while we’ve accomplished our weekly partitioning goal, this is not a common day to start the week. Thankfully there is a very easy solution with pg_partman: we tell it the date to start making partitions. Say we wanted our weeks to start on Sunday. Just pick a starting date that would work for the child tables we’d like to have initially created:<pre><code class=language-sql>SELECT partman.create_parent('public.time_stuff', 'created_at', '1 week', p_start_partition => '2024-04-17');
 create_parent
---------------
 t
(1 row)

\d+ time_stuff
                                                      Partitioned table "public.time_stuff"
   Column   |           Type           | Collation | Nullable |           Default            | Storage | Compression | Stats target | Description
------------+--------------------------+-----------+----------+------------------------------+---------+-------------+--------------+-------------
 id         | integer                  |           | not null | generated always as identity | plain   |             |              |
 created_at | timestamp with time zone |           | not null |                              | plain   |             |              |
Partition key: RANGE (created_at)
Partitions: time_stuff_p20240417 FOR VALUES FROM ('2024-04-17 00:00:00-04') TO ('2024-04-24 00:00:00-04'),
            time_stuff_p20240424 FOR VALUES FROM ('2024-04-24 00:00:00-04') TO ('2024-05-01 00:00:00-04'),
            time_stuff_p20240501 FOR VALUES FROM ('2024-05-01 00:00:00-04') TO ('2024-05-08 00:00:00-04'),
            time_stuff_p20240508 FOR VALUES FROM ('2024-05-08 00:00:00-04') TO ('2024-05-15 00:00:00-04'),
            time_stuff_p20240515 FOR VALUES FROM ('2024-05-15 00:00:00-04') TO ('2024-05-22 00:00:00-04'),
            time_stuff_p20240522 FOR VALUES FROM ('2024-05-22 00:00:00-04') TO ('2024-05-29 00:00:00-04'),
            time_stuff_default DEFAULT
</code></pre><p>We don’t have the partitions prior to our starting date created, but you can just pick an even earlier date if you need more historical partitions to start with. Every subsequent child table’s lower boundary falls exactly 7 days after our chosen start date, so using this method you can start the week on any day you desire. And simply using the day of the lower boundary for the suffix name got rid of the complexity of using week numbers, which previously required ISO weeks to solve.<h2 id=quarterly-partitioning><a href=#quarterly-partitioning>Quarterly partitioning</a></h2><p>I’d always liked the idea of quarterly partitioning since it seemed to be a nice balance between larger and smaller partitioning intervals. PostgreSQL does have some limited quarterly timestamp formatting options, but if you go back and look at the partman source code for older versions, you’ll see it was far more complex than I’d expected it to be. And it pretty much locked the quarters into 4 pre-defined month blocks. With version 5.x of pg_partman, I decided to do the same as I did with weekly partitioning and simply allow any arbitrary 3-month interval people may want. So while it lost the nicer quarterly suffix pattern (YYYYq#, e.g. 2024q2), it’s now much more flexible.<p>The problem and solution for quarterly partitioning are the same as for weekly. It’s not quite as bad a problem, since the child lower boundaries are always rounded to the first of the month, but the quarter will default to starting in the month in which <code>create_parent()</code> runs. So running in April 2024 results in:<pre><code class=language-sql>keith=# SELECT partman.create_parent('public.time_stuff', 'created_at', '3 months');
 create_parent
---------------
 t
(1 row)

keith=# \d+ time_stuff
                                                      Partitioned table "public.time_stuff"
   Column   |           Type           | Collation | Nullable |           Default            | Storage | Compression | Stats target | Description
------------+--------------------------+-----------+----------+------------------------------+---------+-------------+--------------+-------------
 id         | integer                  |           | not null | generated always as identity | plain   |             |              |
 created_at | timestamp with time zone |           | not null |                              | plain   |             |              |
Partition key: RANGE (created_at)
Partitions: time_stuff_p20230401 FOR VALUES FROM ('2023-04-01 00:00:00-04') TO ('2023-07-01 00:00:00-04'),
            time_stuff_p20230701 FOR VALUES FROM ('2023-07-01 00:00:00-04') TO ('2023-10-01 00:00:00-04'),
            time_stuff_p20231001 FOR VALUES FROM ('2023-10-01 00:00:00-04') TO ('2024-01-01 00:00:00-05'),
            time_stuff_p20240101 FOR VALUES FROM ('2024-01-01 00:00:00-05') TO ('2024-04-01 00:00:00-04'),
            time_stuff_p20240401 FOR VALUES FROM ('2024-04-01 00:00:00-04') TO ('2024-07-01 00:00:00-04'),
            time_stuff_p20240701 FOR VALUES FROM ('2024-07-01 00:00:00-04') TO ('2024-10-01 00:00:00-04'),
            time_stuff_p20241001 FOR VALUES FROM ('2024-10-01 00:00:00-04') TO ('2025-01-01 00:00:00-05'),
            time_stuff_p20250101 FOR VALUES FROM ('2025-01-01 00:00:00-05') TO ('2025-04-01 00:00:00-04'),
            time_stuff_p20250401 FOR VALUES FROM ('2025-04-01 00:00:00-04') TO ('2025-07-01 00:00:00-04'),
            time_stuff_default DEFAULT
</code></pre><p>This does start on the quarterly months people typically expect, but only by coincidence. To start your quarters in whichever month you’d like, simply set the starting partition as we did with weekly partitioning. The day doesn’t really matter, just the month.<pre><code class=language-sql>SELECT partman.create_parent('public.time_stuff', 'created_at', '3 months', p_start_partition => '2024-03-15');
 create_parent
---------------
 t
(1 row)

\d+ time_stuff
                                                      Partitioned table "public.time_stuff"
   Column   |           Type           | Collation | Nullable |           Default            | Storage | Compression | Stats target | Description
------------+--------------------------+-----------+----------+------------------------------+---------+-------------+--------------+-------------
 id         | integer                  |           | not null | generated always as identity | plain   |             |              |
 created_at | timestamp with time zone |           | not null |                              | plain   |             |              |
Partition key: RANGE (created_at)
Partitions: time_stuff_p20240301 FOR VALUES FROM ('2024-03-01 00:00:00-05') TO ('2024-06-01 00:00:00-04'),
            time_stuff_p20240601 FOR VALUES FROM ('2024-06-01 00:00:00-04') TO ('2024-09-01 00:00:00-04'),
            time_stuff_p20240901 FOR VALUES FROM ('2024-09-01 00:00:00-04') TO ('2024-12-01 00:00:00-05'),
            time_stuff_p20241201 FOR VALUES FROM ('2024-12-01 00:00:00-05') TO ('2025-03-01 00:00:00-05'),
            time_stuff_p20250301 FOR VALUES FROM ('2025-03-01 00:00:00-05') TO ('2025-06-01 00:00:00-04'),
            time_stuff_default DEFAULT
</code></pre><h2 id=any-arbitrary-interval><a href=#any-arbitrary-interval>Any Arbitrary Interval</a></h2><p>While solving for these two custom intervals isn’t too complicated, I did run into issues with allowing any arbitrary custom interval before 5.0. The issue is with how partman rounds the intervals to give the normally expected lower boundaries depending on the length of the interval: daily rounds to midnight, monthly rounds to the first of the month, etc. What if we wanted to partition by 9-week intervals, starting on Mondays, beginning with April 22, 2024?<pre><code class=language-sql>keith=# SELECT partman.create_parent('public.time_stuff', 'created_at', '9 weeks', p_start_partition => '2024-04-22');
 create_parent
---------------
 t
(1 row)

keith=# \d+ time_stuff
                                                      Partitioned table "public.time_stuff"
   Column   |           Type           | Collation | Nullable |           Default            | Storage | Compression | Stats target | Description
------------+--------------------------+-----------+----------+------------------------------+---------+-------------+--------------+-------------
 id         | integer                  |           | not null | generated always as identity | plain   |             |              |
 created_at | timestamp with time zone |           | not null |                              | plain   |             |              |
Partition key: RANGE (created_at)
Partitions: time_stuff_p20240401 FOR VALUES FROM ('2024-04-01 00:00:00-04') TO ('2024-06-03 00:00:00-04'),
            time_stuff_p20240603 FOR VALUES FROM ('2024-06-03 00:00:00-04') TO ('2024-08-05 00:00:00-04'),
            time_stuff_p20240805 FOR VALUES FROM ('2024-08-05 00:00:00-04') TO ('2024-10-07 00:00:00-04'),
            time_stuff_p20241007 FOR VALUES FROM ('2024-10-07 00:00:00-04') TO ('2024-12-09 00:00:00-05'),
            time_stuff_p20241209 FOR VALUES FROM ('2024-12-09 00:00:00-05') TO ('2025-02-10 00:00:00-05'),
            time_stuff_default DEFAULT
</code></pre><p>That doesn’t look right! The issue here is that since the interval is less than 1 year but greater than or equal to 1 month, partman always tries to round to the first day of the month. What we want partman to do is round to the nearest week instead, since our interval is based on an arbitrary number of weeks. As I said, this was a long-standing issue, but it was fixed fairly recently thanks to a bug report from a user: version 4.6.0 added another option to <code>create_parent()</code>.<pre><code class=language-sql>keith=# SELECT partman.create_parent('public.time_stuff', 'created_at', '9 weeks', p_start_partition => '2024-04-22', p_date_trunc_interval => 'week');
 create_parent
---------------
 t
(1 row)

keith=# \d+ time_stuff
                                                      Partitioned table "public.time_stuff"
   Column   |           Type           | Collation | Nullable |           Default            | Storage | Compression | Stats target | Description
------------+--------------------------+-----------+----------+------------------------------+---------+-------------+--------------+-------------
 id         | integer                  |           | not null | generated always as identity | plain   |             |              |
 created_at | timestamp with time zone |           | not null |                              | plain   |             |              |
Partition key: RANGE (created_at)
Partitions: time_stuff_p20240422 FOR VALUES FROM ('2024-04-22 00:00:00-04') TO ('2024-06-24 00:00:00-04'),
            time_stuff_p20240624 FOR VALUES FROM ('2024-06-24 00:00:00-04') TO ('2024-08-26 00:00:00-04'),
            time_stuff_p20240826 FOR VALUES FROM ('2024-08-26 00:00:00-04') TO ('2024-10-28 00:00:00-04'),
            time_stuff_p20241028 FOR VALUES FROM ('2024-10-28 00:00:00-04') TO ('2024-12-30 00:00:00-05'),
            time_stuff_p20241230 FOR VALUES FROM ('2024-12-30 00:00:00-05') TO ('2025-03-03 00:00:00-05'),
            time_stuff_default DEFAULT
</code></pre><p>The <code>p_date_trunc_interval</code> parameter takes values that are valid for the PostgreSQL built-in function <a href=https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC>date_trunc</a>. This tells partman how to round the boundaries to get the values you’re more likely expecting. One unfortunate thing that is unique to the weekly option here is that <code>date_trunc('week', &#60;timestamptz&#62;)</code> always rounds to a Monday. So in this case you wouldn’t be able to have an arbitrary number of weeks starting on a Sunday or any other day of the week.<p>If you cannot use more common partition intervals (daily, monthly, etc.), you’ll likely have to experiment with this feature to see if it allows you to do what you need. I would personally recommend trying to stick with more common intervals if at all possible, but business requirements sometimes require the uncommon.<h2 id=conclusion><a href=#conclusion>Conclusion</a></h2><p>pg_partman provides comprehensive support for time-based intervals to serve a wide variety of needs. Hopefully this blog post has helped to show both basic and advanced features and how to handle some more complex partitioning requirements. ]]></content:encoded>
<category><![CDATA[ Partitioning ]]></category>
<author><![CDATA[ Keith.Fiske@crunchydata.com (Keith Fiske) ]]></author>
<dc:creator><![CDATA[ Keith Fiske ]]></dc:creator>
<guid isPermalink="false">097174af368a889d3f27f1e3261b63d76ac8028e74061883199a5bbb0a2063b1</guid>
<pubDate>Thu, 09 May 2024 13:00:00 EDT</pubDate>
<dc:date>2024-05-09T17:00:00.000Z</dc:date>
<atom:updated>2024-05-09T17:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Syncing Postgres Partitions to Your Data Lake in Crunchy Bridge for Analytics ]]></title>
<link>https://www.crunchydata.com/blog/syncing-postgres-partitions-to-your-data-lake-in-bridge-for-analytics</link>
<description><![CDATA[ Marco shows how you can combine pg_partman and pg_cron on Bridge for Analytics to set up automated time-partitioning with long-term retention and fast analytics in your data lake. ]]></description>
<content:encoded><![CDATA[ <style>
    .black-box {
        background-color: black;
        color: white;
        padding: 20px;
        text-align: left;
        align-items: left;
        margin: 20px auto;
        border-radius: 10px;
        width: auto;
        height: auto;
    }
    .black-box a {
        color: white;
        text-decoration: underline;
    }
</style> <div class="black-box">
Looking for a Postgres data lake? <a href="https://www.crunchydata.com/products/warehouse">Crunchy Data Warehouse</a> is a Postgres-native database with Parquet and Iceberg as first-class citizens.
</div><p>One of the unique characteristics of the recently launched <a href=https://docs.crunchybridge.com/analytics>Crunchy Bridge for Analytics</a> is that it is effectively a hybrid between a transactional and an analytical database system. That is a powerful tool when dealing with data-intensive applications, which may, for example, require a combination of low-latency, high-throughput insertion, efficient lookup of recent data, and fast interactive analytics over historical data.<p>A common source of large data volumes is append-mostly time series data or event data generated by an application. PostgreSQL has various tools to optimize your database for time series, such as <a href=https://www.postgresql.org/docs/current/ddl-partitioning.html>partitioning</a>, <a href=https://www.crunchydata.com/blog/postgres-indexing-when-does-brin-win>BRIN indexes</a>, and <a href=https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-BIN>time functions</a>, and its native heap storage format is well-suited for bulk writes. However, there is a limit to what PostgreSQL can do with large data volumes, especially in terms of the performance of analytical queries on large data sets and the operational overhead of storing a large amount of historical data in your database.<p>Bridge for Analytics solves this problem in two ways. 
In addition to everything PostgreSQL can already do:<ul><li>You can easily copy (or transform &#38; copy) data into your data lake for cheaper storage and access by other applications<li>You can create tables to efficiently run analytical queries on your data lake through a vectorized query engine and caching on NVMe drives</ul><p>This blog post describes an end-to-end solution for storing recent event data in PostgreSQL using time-partitioning, then copying those time partitions into your data lake and running fast analytical queries, all on the same Bridge for Analytics instance.<h2 id=setting-up-a-time-partitioned-table-for-fast-writes-and-lookup-queries><a href=#setting-up-a-time-partitioned-table-for-fast-writes-and-lookup-queries>Setting up a time-partitioned table for fast writes and lookup queries</a></h2><p>When dealing with a large stream of events, it is almost always a good idea to create a <a href=https://docs.crunchybridge.com/guides/partition-table>partitioned table using pg_partman</a> and use <a href=https://github.com/citusdata/pg_cron>pg_cron</a> for periodic maintenance. Partitioning by time helps to efficiently drop old data, and typically improves the performance of both queries and inserts, by keeping indexes small, and avoiding bloat (gaps left by deleted rows) and fragmentation (rows that are frequently retrieved together getting scattered through space reuse).<p>On Crunchy Bridge, you can connect to your Bridge for Analytics cluster as the postgres superuser and set up pg_partman and pg_cron:<pre><code class=language-sql>-- Run the following as superuser
CREATE SCHEMA IF NOT EXISTS partman;
CREATE EXTENSION IF NOT EXISTS pg_partman WITH SCHEMA partman;
GRANT ALL ON SCHEMA partman TO application;
GRANT ALL ON ALL TABLES IN SCHEMA partman TO application;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA partman TO application;
GRANT EXECUTE ON ALL PROCEDURES IN SCHEMA partman TO application;

CREATE EXTENSION IF NOT EXISTS pg_cron;
GRANT USAGE ON SCHEMA cron TO application;
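
-- Optional sanity check (not part of the original setup): confirm both
-- extensions are installed before proceeding
SELECT extname, extversion FROM pg_extension
WHERE extname IN ('pg_partman', 'pg_cron');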
</code></pre><p>Now we'll create a table to capture requests to a website or API. We might want to load such data in batches from recent logs, or insert individual requests from our application. We keep up to 7 days of data to power a dashboard with recent user activity and to help debug issues.<pre><code class=language-sql>-- Create a time-partitioned table
DROP TABLE IF EXISTS requests;
CREATE TABLE requests (
    event_time timestamptz not null default now(),
    event_id bigint generated always as identity,
    request_type text not null,
    url text,
    response_code int,
    response_time double precision,
    tenant_id bigint,
    user_tag text,
    session_tag text
)
PARTITION BY RANGE (event_time);
CREATE INDEX ON requests USING brin (event_time);
CREATE INDEX ON requests (tenant_id, user_tag);

-- Set up automatic partitioning by day
SELECT partman.create_parent(
  p_parent_table := 'public.requests',
  p_control := 'event_time',
  p_interval := '1 day',
  p_start_partition := '2024-01-01',
  p_default_table := false
);

-- Drop tables older than a week when running pg_partman maintenance
UPDATE partman.part_config SET retention_keep_table = false, retention = '1 week'
WHERE parent_table = 'public.requests';
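
-- Optional check (illustrative): confirm the retention settings took effect
SELECT parent_table, retention, retention_keep_table
FROM partman.part_config
WHERE parent_table = 'public.requests';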

-- Regularly run pg_partman maintenance
SELECT cron.schedule('partman-maintenance', '@daily',
                     $$CALL partman.run_maintenance_proc()$$);
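
-- Optional check (illustrative): pg_cron stores scheduled jobs in cron.job
SELECT jobname, schedule, command FROM cron.job;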
</code></pre><p>This setup will automatically create new partitions for incoming data as time progresses and drop old partitions after a week. In the meantime, we can do fast lookups of rows using indexed columns and time-range filters.<h2 id=copying-data-into-a-lake-analytics-table-for-cheap-storage-and-fast-analytics><a href=#copying-data-into-a-lake-analytics-table-for-cheap-storage-and-fast-analytics>Copying data into a lake analytics table for cheap storage and fast analytics</a></h2><p>You may want to keep your data longer than 7 days on cheaper storage, and also perform fast, interactive analytics. For this, we'll create a crunchy_lake_analytics table, which is backed by compressed Parquet files in S3. Parquet is a file format that is optimized for analytics.<pre><code class=language-sql>-- Create an analytics table
CREATE FOREIGN TABLE requests_analytics (
    event_time timestamptz not null,
    event_id bigint not null,
    request_type text not null,
    url text,
    response_code int,
    response_time double precision,
    tenant_id bigint,
    user_tag text,
    session_tag text
)
SERVER crunchy_lake_analytics
OPTIONS (path 's3://mybucket/requests/*.parquet');
</code></pre><p>When we talk about partitioning, you might be thinking of a model where old partitions rotate into S3 (“tiering”), so the overall partitioned table becomes a mixture of recent data in heap partitions and older data in Parquet partitions. However, we do not recommend such a model for analytics tables, because it has several downsides:<ol><li>Each storage format targets a different type of query, and running both types of queries on the same mixed table can give the worst of both worlds.<li>Fast analytics on columnar data is only available for very old data, while you might want it for relatively recent data too.<li>You may want to apply some transformation before writing to cold storage, meaning older data follows a different schema.<li>Updates / late inserts on older partitions become hard to handle if the data is converted to Parquet.<li>Aggregates cannot always be performed efficiently on partitioned tables.</ol><p>Hence, we prefer to follow an alternative model where a crunchy_lake_analytics table exists alongside the source table, and we copy data over as soon as we stop receiving new data for a partition, but may still keep it in the time-partitioned heap table as needed. The data is effectively still time-partitioned, and it has all the same benefits, but the partitioning is hidden from PostgreSQL.<p>This approach creates some redundancy where recent data is stored a second time in compressed form in S3. That's a small price to pay for getting a lot more flexibility and analytics performance. For instance, if at some point we get inserts from a few days ago or need to perform an update, then we can simply modify the heap partition and re-copy the data.<h2 id=copying-data-into-your-analytics-table><a href=#copying-data-into-your-analytics-table>Copying data into your analytics table</a></h2><p>We need a way to copy data from our time-partitioned heap table into our analytics table, for which we can use the COPY command. 
Let's define a function that copies a partition into a Parquet file in S3, using the default snappy compression.<pre><code class=language-sql>-- Create a function to transform a partition and move it into the analytics table
CREATE FUNCTION public.copy_partition_to_data_lake(partition_name_p regclass,
                                                   url_p text)
RETURNS void LANGUAGE plpgsql SET search_path TO '' AS $function$
BEGIN
    EXECUTE format($$
      COPY (SELECT * FROM %s) TO %L WITH (format 'parquet')
    $$, partition_name_p, url_p);
END;
$function$;
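
-- Example usage (the partition name and bucket below are illustrative):
-- SELECT public.copy_partition_to_data_lake('public.requests_p20240506',
--     's3://mybucket/requests/requests_p20240506.parquet');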
</code></pre><p>It may be helpful to do a small bit of bookkeeping for the data we copy. That way we can make sure not to skip any partitions in case of failure, and avoid re-copying data when a copy has already succeeded.<pre><code class=language-sql>-- Create a table for tracking which partitions have been synced to S3
CREATE TABLE synced_partitions (
  partition_name regclass primary key
);

-- Create a function that ensures a partition is synced to S3
CREATE FUNCTION public.sync_partition(partition_name_p regclass)
RETURNS void LANGUAGE plpgsql SET search_path TO '' AS $function$
DECLARE
    /* construct the URL for a given partition */
    url text := format('s3://mybucket/requests/%1$s.parquet', partition_name_p);
BEGIN
    /* skip if we already synced to S3 */
    IF NOT EXISTS (SELECT 1 FROM public.synced_partitions WHERE partition_name = partition_name_p) THEN
        /* remember that we synced */
        INSERT INTO public.synced_partitions VALUES (partition_name_p);

        /* copy data to S3 */
        PERFORM public.copy_partition_to_data_lake(partition_name_p, url);

        /* invalidate the nvme cache */
        PERFORM crunchy_file_cache.remove(url);
    END IF;
END;
$function$;

-- Ensure all partitions from before today are periodically copied
-- into the analytics table
SELECT cron.schedule('sync-partitions', '@hourly', $$
  SELECT
    public.sync_partition(format('%I.%I', partition_schemaname, partition_tablename))
  FROM
    partman.show_partitions('public.requests'),
    partman.show_partition_info(format('%I.%I', partition_schemaname, partition_tablename))
  WHERE
    child_end_time::timestamptz &#60;= current_date;
$$);
</code></pre><p>This gives a fully automated system for reliably copying incoming data into our data lake.<h2 id=running-fast-analytical-queries-on-our-historical-data><a href=#running-fast-analytical-queries-on-our-historical-data>Running fast analytical queries on our historical data</a></h2><p>To try out our setup, let's insert some synthetic data (30M rows):<pre><code class=language-sql>INSERT INTO
  requests (event_time, request_type, response_time, response_code, url, tenant_id, user_tag, session_tag)
SELECT
  current_date - interval '5 days' * random(), 'get', random(), 0, 'https://app.com/'||md5(random()::text), s % 100, md5((s % 1000000)::text), md5(random()::text)
FROM
  generate_series(1,30000000) s;
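
-- Optional sanity check: the insert above generates exactly 30 million rows
SELECT count(*) FROM requests;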
</code></pre><p>We could wait for the hourly sync, but we can also sync immediately. We use the show_partition functions in pg_partman to find partitions that hold data from before today.<pre><code class=language-sql>SELECT
  sync_partition(format('%I.%I', partition_schemaname, partition_tablename))
FROM
  partman.show_partitions('public.requests'),
  partman.show_partition_info(format('%I.%I', partition_schemaname, partition_tablename))
WHERE
  child_end_time::timestamptz &#60;= current_date;
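
-- The bookkeeping table now lists each partition that was copied to S3:
SELECT * FROM synced_partitions;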
</code></pre><p>Now we are ready to run analytical queries and can compare running them on the partitioned heap table vs. the analytics table:<pre><code class=language-sql>-- Simple analytical query on time-partitioned heap table (slow)
SELECT url, count(*) FROM requests GROUP BY 1 ORDER BY 2 DESC LIMIT 10;
...
Time: 22525.851 ms

-- Simple analytical query on analytics table reading from data lake (fast)
SELECT url, count(*) FROM requests_analytics GROUP BY 1 ORDER BY 2 DESC LIMIT 10;
...
Time: 2301.106 ms

-- A bit later: data files are cached on nvme drive (really fast)
SELECT url, count(*) FROM requests_analytics GROUP BY 1 ORDER BY 2 DESC LIMIT 10;
...
Time: 494.507 ms

-- Lookup query on time-partitioned heap table (fast)
SELECT * FROM requests WHERE tenant_id = 4 AND user_tag = 'a87ff679a2f3e71d9181a67b7542122c' ORDER BY event_time DESC LIMIT 10;
...
Time: 20.084 ms

-- Lookup query on analytics table (slow-ish)
SELECT * FROM requests_analytics WHERE tenant_id = 4 AND user_tag = 'a87ff679a2f3e71d9181a67b7542122c' ORDER BY event_time DESC LIMIT 10;
...
Time: 121.213 ms
</code></pre><p>As you can see, both time-partitioned heap tables and analytics tables have merits. Heap tables offer fast insertion and fast lookups using indexes, while analytics tables are 1-2 orders of magnitude faster for analytical queries, which prepares you for dealing with very large amounts of historical data.<p>In this case, we made sure that the two tables had identical contents to compare performance. In practice, you might also want to normalize, filter, or scrub your data when copying into the historical table, which can give you some additional speedups.<h2 id=get-started-with-crunchy-bridge-for-analytics><a href=#get-started-with-crunchy-bridge-for-analytics>Get started with Crunchy Bridge for Analytics</a></h2><p>Bridge for Analytics helps you query your existing data in your data lake from PostgreSQL, but it also supports hybrid workloads where you combine heap tables and analytics tables to handle more advanced, data-intensive applications.<p>You can <a href=https://docs.crunchybridge.com/analytics/getting-started>get started with Bridge for Analytics</a> with a few clicks once you’re signed up to <a href=https://docs.crunchybridge.com/>Crunchy Bridge</a>. You’ll get a fully managed PostgreSQL cluster with the additional extensions and other components that power analytics. Then, head over to the <a href=https://docs.crunchybridge.com/analytics/queries>Analytics docs</a> to find out more. ]]></content:encoded>
<category><![CDATA[ Partitioning ]]></category>
<category><![CDATA[ Crunchy Bridge ]]></category>
<category><![CDATA[ Analytics ]]></category>
<author><![CDATA[ Marco.Slot@crunchydata.com (Marco Slot) ]]></author>
<dc:creator><![CDATA[ Marco Slot ]]></dc:creator>
<guid isPermalink="false">05951c16b3d720de95183be8982375864091f9ab380de655723662e5cc5b02ee</guid>
<pubDate>Tue, 07 May 2024 09:00:00 EDT</pubDate>
<dc:date>2024-05-07T13:00:00.000Z</dc:date>
<atom:updated>2024-05-07T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Auto-archiving and Data Retention Management in Postgres with pg_partman ]]></title>
<link>https://www.crunchydata.com/blog/auto-archiving-and-data-retention-management-in-postgres-with-pg_partman</link>
<description><![CDATA[ Automatically archiving data can save you a ton of money and storage space. Keith shows you can set up data retention policies in Postgres using pg_partman. ]]></description>
<content:encoded><![CDATA[ <p>You could be saving money every month on database costs with a smarter data retention policy. One of the primary reasons for partitioning, and a huge benefit of it, is the ability to automatically archive your data. For example, you might have a huge log table. For business purposes, you need to keep this data for 30 days. This table grows continually over time, and keeping all the data makes database maintenance challenging. With time-based partitioning, you can simply archive off data older than 30 days.<p>The nature of most relational databases means that deleting large volumes of data can be very inefficient and that space is not immediately, if ever, returned to the file system. PostgreSQL does not return the space it reserves to the file system when normal deletion operations are run except under very specific conditions:<ol><li>the page(s) at the end of the relation are completely emptied<li>a VACUUM FULL/CLUSTER is run against the relation (exclusively locking it until complete)</ol><p>If you find yourself needing that space back more immediately, or without intrusive locking, then partitioning can provide a much simpler means of removing old data: drop the table. The removal is nearly instantaneous (barring any transactions locking the table) and immediately returns the space to the file system. pg_partman, the Postgres extension for partitioning, provides a very easy way to manage this for time- and integer-based partitioning.<h2 id=pg_partman-daily-partition-example><a href=#pg_partman-daily-partition-example>pg_partman daily partition example</a></h2><p>Recently, <a href=https://github.com/pgpartman/pg_partman>pg_partman</a> 5.1 was released with new features such as list partitioning for single-value integers, controlled maintenance run ordering, and experimental support for numeric partitioning. This new version also includes several bug fixes, so please update to the latest release when possible! 
All examples were done using this latest version.<p><a href=https://github.com/pgpartman/pg_partman>https://github.com/pgpartman/pg_partman</a><p>First, let’s get a simple, time-based daily partition set going:<pre><code class=language-sql>CREATE TABLE public.time_stuff
    (col1 int
        , col2 text default 'stuff'
        , col3 timestamptz NOT NULL DEFAULT now() )
    PARTITION BY RANGE (col3);
</code></pre><pre><code class=language-sql>SELECT partman.create_parent('public.time_stuff', 'col3', '1 day');
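
-- The same call with named parameters (supported in pg_partman 5.x):
-- SELECT partman.create_parent(
--   p_parent_table := 'public.time_stuff',
--   p_control := 'col3',
--   p_interval := '1 day'
-- );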
</code></pre><pre><code>\d+ time_stuff
                                             Partitioned table "public.time_stuff"
 Column |           Type           | Collation | Nullable |    Default    | Storage  | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------------+----------+-------------+--------------+-------------
 col1   | integer                  |           |          |               | plain    |             |              |
 col2   | text                     |           |          | 'stuff'::text | extended |             |              |
 col3   | timestamp with time zone |           | not null | now()         | plain    |             |              |
Partition key: RANGE (col3)
Partitions: time_stuff_p20240408 FOR VALUES FROM ('2024-04-08 00:00:00-04') TO ('2024-04-09 00:00:00-04'),
            time_stuff_p20240409 FOR VALUES FROM ('2024-04-09 00:00:00-04') TO ('2024-04-10 00:00:00-04'),
            time_stuff_p20240410 FOR VALUES FROM ('2024-04-10 00:00:00-04') TO ('2024-04-11 00:00:00-04'),
            time_stuff_p20240411 FOR VALUES FROM ('2024-04-11 00:00:00-04') TO ('2024-04-12 00:00:00-04'),
            time_stuff_p20240412 FOR VALUES FROM ('2024-04-12 00:00:00-04') TO ('2024-04-13 00:00:00-04'),
            time_stuff_p20240413 FOR VALUES FROM ('2024-04-13 00:00:00-04') TO ('2024-04-14 00:00:00-04'),
            time_stuff_p20240414 FOR VALUES FROM ('2024-04-14 00:00:00-04') TO ('2024-04-15 00:00:00-04'),
            time_stuff_p20240415 FOR VALUES FROM ('2024-04-15 00:00:00-04') TO ('2024-04-16 00:00:00-04'),
            time_stuff_p20240416 FOR VALUES FROM ('2024-04-16 00:00:00-04') TO ('2024-04-17 00:00:00-04'),
            time_stuff_default DEFAULT
</code></pre><h2 id=setting-data-retention-policies><a href=#setting-data-retention-policies>Setting data retention policies</a></h2><p>This partition set was created on April 12, 2024, so a default setup will create 4 partitions before and 4 partitions after the current day. The first setting to configure for retention, and the only one that is required, is the <code>retention</code> column in the <code>part_config</code> table. For this example, we’ll set a retention of 2 days. We’re also going to increase the premake value just to see that normal maintenance is working as well.<pre><code class=language-sql>UPDATE partman.part_config SET retention = '2 days', premake = 6 WHERE parent_table = 'public.time_stuff';
</code></pre><p>By default, pg_partman also does not create new child tables if there is no data in the partition set, so let’s add some data as well.<pre><code class=language-sql>INSERT INTO public.time_stuff (col1, col3)
VALUES (generate_series(1,10), CURRENT_TIMESTAMP);
</code></pre><pre><code class=language-sql>SELECT * FROM partman.part_config
WHERE parent_table = 'public.time_stuff';

-[ RECORD 1 ]--------------+-----------------------------------
parent_table               | public.time_stuff
control                    | col3
partition_interval         | 1 day
partition_type             | range
premake                    | 6
automatic_maintenance      | on
template_table             | partman.template_public_time_stuff
retention                  | 2 days
retention_schema           |
retention_keep_index       | t
retention_keep_table       | t
epoch                      | none
constraint_cols            |
optimize_constraint        | 30
infinite_time_partitions   | f
datetime_string            | YYYYMMDD
jobmon                     | t
sub_partition_set_full     | f
undo_in_progress           | f
inherit_privileges         | f
constraint_valid           | t
ignore_default_data        | t
default_table              | t
date_trunc_interval        |
maintenance_order          |
retention_keep_publication | f
maintenance_last_run       |
</code></pre><p>In pg_partman, retention management is handled at the same time as new partition creation. So a simple call to <code>run_maintenance_proc()</code> will handle both.<pre><code class=language-sql>CALL partman.run_maintenance_proc();
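
-- In production you would typically schedule this rather than run it by hand,
-- e.g. with pg_cron (illustrative; requires the pg_cron extension):
-- SELECT cron.schedule('partman-maintenance', '@daily',
--                      $$CALL partman.run_maintenance_proc()$$);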
</code></pre><pre><code>\d+ time_stuff
                                             Partitioned table "public.time_stuff"
 Column |           Type           | Collation | Nullable |    Default    | Storage  | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------------+----------+-------------+--------------+-------------
 col1   | integer                  |           |          |               | plain    |             |              |
 col2   | text                     |           |          | 'stuff'::text | extended |             |              |
 col3   | timestamp with time zone |           | not null | now()         | plain    |             |              |
Partition key: RANGE (col3)
Partitions: time_stuff_p20240410 FOR VALUES FROM ('2024-04-10 00:00:00-04') TO ('2024-04-11 00:00:00-04'),
            time_stuff_p20240411 FOR VALUES FROM ('2024-04-11 00:00:00-04') TO ('2024-04-12 00:00:00-04'),
            time_stuff_p20240412 FOR VALUES FROM ('2024-04-12 00:00:00-04') TO ('2024-04-13 00:00:00-04'),
            time_stuff_p20240413 FOR VALUES FROM ('2024-04-13 00:00:00-04') TO ('2024-04-14 00:00:00-04'),
            time_stuff_p20240414 FOR VALUES FROM ('2024-04-14 00:00:00-04') TO ('2024-04-15 00:00:00-04'),
            time_stuff_p20240415 FOR VALUES FROM ('2024-04-15 00:00:00-04') TO ('2024-04-16 00:00:00-04'),
            time_stuff_p20240416 FOR VALUES FROM ('2024-04-16 00:00:00-04') TO ('2024-04-17 00:00:00-04'),
            time_stuff_p20240417 FOR VALUES FROM ('2024-04-17 00:00:00-04') TO ('2024-04-18 00:00:00-04'),
            time_stuff_p20240418 FOR VALUES FROM ('2024-04-18 00:00:00-04') TO ('2024-04-19 00:00:00-04'),
            time_stuff_default DEFAULT
</code></pre><p>Now you can see the two partitions older than 2 days have been removed and two new partitions have been created to cover 6 days ahead. There are some other more advanced options for retention available in pg_partman as well. You’ll see above that the <code>retention_keep_table</code> option is set to true by default. This means that while the child tables are no longer part of the partition set, those tables do still exist in the database. pg_partman keeps its default options set in a manner that reduces the risk of accidental data loss.<pre><code>\dt public.time_stuff*
                     List of relations
 Schema |         Name         |       Type        | Owner
--------+----------------------+-------------------+-------
 public | time_stuff           | partitioned table | keith
 public | time_stuff_default   | table             | keith
 public | time_stuff_p20240408 | table             | keith
 public | time_stuff_p20240409 | table             | keith
 public | time_stuff_p20240410 | table             | keith
 public | time_stuff_p20240411 | table             | keith
 public | time_stuff_p20240412 | table             | keith
 public | time_stuff_p20240413 | table             | keith
 public | time_stuff_p20240414 | table             | keith
 public | time_stuff_p20240415 | table             | keith
 public | time_stuff_p20240416 | table             | keith
 public | time_stuff_p20240417 | table             | keith
 public | time_stuff_p20240418 | table             | keith
</code></pre><h2 id=dropping-tables-and-indexes><a href=#dropping-tables-and-indexes>Dropping tables and indexes</a></h2><p>If you’d like these tables to actually be dropped, you can set <code>retention_keep_table</code> to false. Or if you’d like to keep the tables live in the database, but don’t need the indexes taking up space anymore, you can leave <code>retention_keep_table</code> set to true and set <code>retention_keep_index</code> to false instead. In the example below, I have reset the partition set back to its original state after <code>create_parent()</code> and then run this update.<pre><code class=language-sql>UPDATE partman.part_config
SET retention = '2 days', premake = 6, retention_keep_table = false
WHERE parent_table = 'public.time_stuff';

CALL partman.run_maintenance_proc();
</code></pre><p>Now if we look at the tables that actually exist, we can see the oldest two tables are gone.<pre><code>\dt public.time*
                     List of relations
 Schema |         Name         |       Type        | Owner
--------+----------------------+-------------------+-------
 public | time_stuff           | partitioned table | keith
 public | time_stuff_default   | table             | keith
 public | time_stuff_p20240410 | table             | keith
 public | time_stuff_p20240411 | table             | keith
 public | time_stuff_p20240412 | table             | keith
 public | time_stuff_p20240413 | table             | keith
 public | time_stuff_p20240414 | table             | keith
 public | time_stuff_p20240415 | table             | keith
 public | time_stuff_p20240416 | table             | keith
 public | time_stuff_p20240417 | table             | keith
 public | time_stuff_p20240418 | table             | keith
</code></pre><h2 id=retention-outside-the-database><a href=#retention-outside-the-database>Retention outside the database</a></h2><p>Another scenario is that you don’t need the data live in the database, but you still want to keep a backup of it outside of the database. In this case, we’re going to use the <code>retention_schema</code> option, which detaches the child tables from the partition set and then moves them to the schema named in this option. Again, the partition set has been reset to the initial state after <code>create_parent()</code>, and then we run this:<pre><code class=language-sql>CREATE SCHEMA old_tables;

UPDATE partman.part_config
SET retention = '2 days', retention_schema = 'old_tables'
WHERE parent_table = 'public.time_stuff';

CALL partman.run_maintenance_proc();
</code></pre><p>Now we can see that the old tables are no longer in the partition set, but are now in the <code>old_tables</code> schema.<pre><code>\d+ time_stuff
                                             Partitioned table "public.time_stuff"
 Column |           Type           | Collation | Nullable |    Default    | Storage  | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------------+----------+-------------+--------------+-------------
 col1   | integer                  |           |          |               | plain    |             |              |
 col2   | text                     |           |          | 'stuff'::text | extended |             |              |
 col3   | timestamp with time zone |           | not null | now()         | plain    |             |              |
Partition key: RANGE (col3)
Partitions: time_stuff_p20240410 FOR VALUES FROM ('2024-04-10 00:00:00-04') TO ('2024-04-11 00:00:00-04'),
            time_stuff_p20240411 FOR VALUES FROM ('2024-04-11 00:00:00-04') TO ('2024-04-12 00:00:00-04'),
            time_stuff_p20240412 FOR VALUES FROM ('2024-04-12 00:00:00-04') TO ('2024-04-13 00:00:00-04'),
            time_stuff_p20240413 FOR VALUES FROM ('2024-04-13 00:00:00-04') TO ('2024-04-14 00:00:00-04'),
            time_stuff_p20240414 FOR VALUES FROM ('2024-04-14 00:00:00-04') TO ('2024-04-15 00:00:00-04'),
            time_stuff_p20240415 FOR VALUES FROM ('2024-04-15 00:00:00-04') TO ('2024-04-16 00:00:00-04'),
            time_stuff_p20240416 FOR VALUES FROM ('2024-04-16 00:00:00-04') TO ('2024-04-17 00:00:00-04'),
            time_stuff_default DEFAULT
</code></pre><pre><code>\dt old_tables.*
                 List of relations
   Schema   |         Name         | Type  | Owner
------------+----------------------+-------+-------
 old_tables | time_stuff_p20240408 | table | keith
 old_tables | time_stuff_p20240409 | table | keith
</code></pre><p>To store these tables “offline” outside of the database, we can use a Python script provided by pg_partman to dump all tables in a given schema. It’s not tied in any way to the partition configuration or the partition set, so this script can be used to dump any tables in any schema.<pre><code>$ python3 dump_partition.py -c "host=localhost" --schema=old_tables
DROP TABLE IF EXISTS "old_tables"."time_stuff_p20240409"
DROP TABLE IF EXISTS "old_tables"."time_stuff_p20240408"

$ ls -l old*
-rw-rw-r-- 1 keith keith  168 Apr 12 18:17 old_tables.time_stuff_p20240408.hash
-rw-rw-r-- 1 keith keith 1410 Apr 12 18:17 old_tables.time_stuff_p20240408.pgdump
-rw-rw-r-- 1 keith keith  168 Apr 12 18:17 old_tables.time_stuff_p20240409.hash
-rw-rw-r-- 1 keith keith 1410 Apr 12 18:17 old_tables.time_stuff_p20240409.pgdump
</code></pre><p>By default, it creates dump files in pg_dump’s custom format and provides a SHA-512 hash of each dump file for long-term data integrity checks. This backup option can be run either as part of a regularly scheduled script or as a one-off backup.<h2 id=summary><a href=#summary>Summary</a></h2><p>Removing data that no longer needs to exist inside the database is a key part of keeping it running efficiently. Hopefully this has provided a guide to using both the basic and advanced retention management options available in pg_partman. ]]></content:encoded>
<category><![CDATA[ Partitioning ]]></category>
<author><![CDATA[ Keith.Fiske@crunchydata.com (Keith Fiske) ]]></author>
<dc:creator><![CDATA[ Keith Fiske ]]></dc:creator>
<guid isPermalink="false">05b05bdd5553484106957495fd50b99eefe32ddb0fb78c9f66976c65501e8e00</guid>
<pubDate>Fri, 19 Apr 2024 09:00:00 EDT</pubDate>
<dc:date>2024-04-19T13:00:00.000Z</dc:date>
<atom:updated>2024-04-19T13:00:00.000Z</atom:updated></item></channel></rss>