<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/topic/json/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog/topic/json</link>
<image><url>https://www.crunchydata.com/card.png</url>
<title>CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog/topic/json</link>
<width>800</width>
<height>419</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Wed, 20 Sep 2023 09:00:00 EDT</pubDate>
<dc:date>2023-09-20T13:00:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ JSON Updates in Postgres 16 ]]></title>
<link>https://www.crunchydata.com/blog/json-updates-postgres-16</link>
<description><![CDATA[ Chris digs into some example code for the new JSON features that just launched with Postgres 16 last week: a new check for identifying JSON objects, plus some new object and array functions. ]]></description>
<content:encoded><![CDATA[ <p>Postgres has been steadily building on the JSON functionality initially released more than <a href=https://www.crunchydata.com/blog/state-of-json-in-postgres-2022>10 years ago</a>. With Postgres 16, working with JSON has gotten a couple of nice improvements. Primarily, this release added features that ease shaping data into JSON and improve the standard SQL functionality for working with JSON.<p><strong>TL;DR:</strong><ul><li>A SQL/JSON data-type check. For instance, this lets you ask in SQL whether a value is a JSON object: <code>value IS JSON OBJECT</code><li>Addition of SQL-standard JSON functions: <a href=https://www.postgresql.org/docs/16/functions-json.html#FUNCTIONS-JSON-CREATION-TABLE><code>JSON_ARRAY()</code></a>, <a href=https://www.postgresql.org/docs/16/functions-aggregate.html#FUNCTIONS-AGGREGATE-TABLE><code>JSON_ARRAYAGG()</code></a>, <code>JSON_OBJECT()</code>, and <code>JSON_OBJECTAGG()</code></ul><h2 id=data-for-this-post><a href=#data-for-this-post>Data for this post</a></h2><p>For a quick tutorial, use the following dataset:<pre><code class=language-pgsql>CREATE TABLE user_profiles ( id serial PRIMARY KEY, name text NOT NULL, details jsonb );
INSERT INTO
   user_profiles (name, details)
VALUES
   (
      'John Doe', '{"age": 30, "address": {"city": "New York", "state": "NY"}, "hobbies": ["reading", "hiking"]}'::jsonb
   )
,
   (
      'Jane Smith', '{"age": 25, "address": {"city": "Los Angeles", "state": "CA"}, "hobbies": ["painting", "cooking"]}'::jsonb
   )
,
   (
      'Emily Davis', '{"age": 29, "address": {"city": "Chicago", "state": "IL"}, "hobbies": ["dancing", "writing"]}'::jsonb
   )
;
</code></pre><h2 id=is-json><a href=#is-json><code>IS JSON</code></a></h2><p>Previously, to test if a field was JSON, your main option was to run <code>pg_typeof</code>:<pre><code class=language-pgsql>SELECT
   details,
   pg_typeof(details),
   pg_typeof(details ->> 'address')
FROM
   user_profiles LIMIT 1;
</code></pre><p>This returns <code>jsonb</code> for the second column and <code>text</code> for the third. The problem is that it gave you no inspection into the nested values within the JSON, so running <code>pg_typeof(details->>'address')</code> would just tell you the extracted value is <code>text</code>. Now, we can do something like:<pre><code class=language-pgsql>SELECT
   details,
   details IS JSON,
   details ->> 'address' IS JSON OBJECT
FROM
   user_profiles LIMIT 1;
</code></pre><p>With this new functionality, both the <code>IS JSON</code> check and the <code>IS JSON OBJECT</code> check return true. Previously, when building queries to inspect JSON, you were at risk of failed queries if the data’s JSON format did not match expected values. Imagine you are migrating your JSON structure, or you previously didn’t have a well-defined schema, so let’s throw in some bad details data that mismatches our current structure:<pre><code class=language-pgsql>INSERT INTO
   user_profiles (name, details)
VALUES
   (
      'Bruce Wayne', '"Gotham City, NY"'
   )
;
INSERT INTO
   user_profiles (name, details)
VALUES
   (
      'Clark J. Kent', '{"age": 32, "address": [{"city": "New York", "state": "AL"}]}'
   )
;
</code></pre><p>Both of the values provided above are valid JSON, yet the existing rows store <code>address</code> as a nested object of the form <code>{"address": {...}}</code>. Now, extracting the primary address defensively looks like this:<pre><code class=language-pgsql>SELECT
   CASE
      WHEN
         details -> 'address' IS JSON ARRAY
      THEN
(details -> 'address')[0]
      WHEN
         details -> 'address' IS JSON OBJECT
      THEN
         details -> 'address'
      WHEN
         details IS JSON SCALAR
      THEN
         details
   END
   AS primary_address
FROM
   user_profiles;
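
-- Rows whose address is an object return it as-is, Clark's array
-- returns its first element, and Bruce's scalar details value is
-- returned whole.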
</code></pre><p>Additionally, because JSON allows duplicate object keys, you can test for key uniqueness with <code>WITH UNIQUE KEYS</code> or <code>WITHOUT UNIQUE KEYS</code>:<pre><code class=language-pgsql>SELECT
	'{"batman": "robin", "batman": "robin"}' IS JSON WITH UNIQUE KEYS,
	'{"batman": "robin", "batman": "robin"}' IS JSON WITHOUT UNIQUE KEYS;
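	-- the first check returns false (duplicate keys present), the second true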
</code></pre><h2 id=json_array--json_arrayagg><a href=#json_array--json_arrayagg><code>JSON_ARRAY</code> &amp; <code>JSON_ARRAYAGG</code></a></h2><p>Now, we can interact with JSON in a more standards-compliant way. The <code>json</code> and <code>jsonb</code> constructs are a Postgres implementation, not part of the SQL standard. Thus, the existing Postgres functions were prefixed with <code>jsonb_</code> or <code>json_</code>. These new functions get away from those prefixes and use keywords to specify output.<p>Let’s look at <code>json_array</code>. Previously, you would have used <code>json_build_array</code> / <code>jsonb_build_array</code>, or the aggregate functions <code>json_agg</code> / <code>jsonb_agg</code>. Now, you can combine values using <code>json_array</code>:<pre><code class=language-pgsql>SELECT
   json_array(name, details ->> 'age')
FROM
   user_profiles;
</code></pre><p>Or, use it to build an array on a <code>GROUP BY</code> using <code>json_arrayagg</code>. Below, we write a SQL query that returns a JSON array of each person within each age group.<pre><code class=language-pgsql>SELECT
((details ->> 'age')::integer / 10) * 10 AS age_group,
   json_arrayagg(name)
FROM
   user_profiles
GROUP BY
   1;
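
-- buckets each name into its age decade: Jane (25) and Emily (29) land
-- in age_group 20, John (30) in age_group 30; rows without a numeric
-- age, like Bruce's above, fall into a NULL group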
</code></pre><p>We can also use <code>json_arrayagg</code> in window functions. Below, we use a window function to find the other group members for each age group.<pre><code class=language-pgsql>SELECT
   name,
   (
(details ->> 'age')::integer / 10
   )
   * 10 AS age_group,
   json_arrayagg(name) OVER (PARTITION BY (details ->> 'age')::integer / 10 * 10) AS other_group_members
FROM
   user_profiles;
</code></pre><p><em>Disclaimer: I would write this differently using a dedicated CTE and <code>GROUP BY</code>, but this is an example of using</em> <code>json_arrayagg</code> <em>as a window function on the small dataset.</em><h2 id=json_objectagg><a href=#json_objectagg><code>JSON_OBJECTAGG</code></a></h2><p>Just as we can aggregate values into an array, we can also aggregate values into an object. Previously, you used <code>json(b)?_build_object</code> or <code>json(b)?_object_agg</code> to construct objects. With Postgres 16, we have a standard SQL function called <code>json_object</code> with a syntax that moves away from tuples and uses either <code>value</code> or <code>:</code> to separate key-value pairs.<pre><code class=language-pgsql>SELECT
   json_object('name' value name, 'age': details ->> 'age')
FROM
   user_profiles;
</code></pre><p>The aggregate form is as follows:<pre><code class=language-pgsql>SELECT
((details ->> 'age')::integer / 10) * 10 AS age_group,
   json_objectagg(name value details ->> 'age')
FROM
   user_profiles
GROUP BY
   1;
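
-- builds one object per age group mapping each name to its age,
-- e.g. Jane and Emily for age_group 20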
</code></pre><p>As with the <code>json_arrayagg</code>, the <code>json_objectagg</code> can be used in window functions:<pre><code class=language-pgsql>SELECT
   name,
   (
(details ->> 'age')::integer / 10
   )
   * 10 AS age_group,
   json_objectagg(name value details ->> 'age') OVER (PARTITION BY (details ->> 'age')::integer / 10 * 10) AS other_group_members
FROM
   user_profiles;
</code></pre><h2 id=modifying-behavior-with-keywords><a href=#modifying-behavior-with-keywords>Modifying behavior with keywords</a></h2><p>Both <code>json_array</code> and <code>json_object</code> come with modifying keywords. <code>json_array</code> gets <code>ABSENT</code> and <code>RETURNING</code>. <code>json_object</code> gets <code>ABSENT</code>, <code>UNIQUE</code>, and <code>RETURNING</code>.<h3 id=unique><a href=#unique><code>UNIQUE</code></a></h3><p>The <code>WITH UNIQUE</code> constraint on <code>json_object</code> throws an error on duplicate keys; by default, duplicates do not error. Try the following:<pre><code class=language-pgsql>SELECT
   json_object('key_1' value 'value_1', 'key_1' value 'value_2' WITH UNIQUE);
SELECT
   json_object('key_1' value 'value_1', 'key_1' value 'value_2');
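
-- the first statement raises a duplicate-key error; the second
-- succeeds because duplicate keys are allowed by default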
</code></pre><h3 id=absent-on-null-v-null-on-null><a href=#absent-on-null-v-null-on-null><code>ABSENT ON NULL</code> v. <code>NULL ON NULL</code></a></h3><p>The <code>ABSENT ON NULL</code> constraint instructs <code>json_object</code> and <code>json_array</code> to omit null values:<pre><code class=language-pgsql>SELECT
   json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' ABSENT
   ON NULL);
SELECT
   json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' NULL
   ON NULL);
</code></pre><p>Absent also works on <code>json_array</code>:<pre><code class=language-pgsql>SELECT
   json_array('value_1', NULL, 'value_3' ABSENT
   ON NULL);
SELECT
   json_array('value_1', NULL, 'value_3' NULL
   ON NULL);
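
-- the first returns ["value_1", "value_3"], the second ["value_1", null, "value_3"]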
</code></pre><p>The opposite of <code>ABSENT ON NULL</code> is <code>NULL ON NULL</code>, as shown above. This means the following is a valid SQL statement in Postgres 16:<pre><code class=language-pgsql>SELECT
   json_array(NULL NULL
   ON NULL);
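
-- returns [null]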
</code></pre><h3 id=returning><a href=#returning><code>RETURNING</code></a></h3><p>Have you wondered why we didn’t have a <code>jsonb</code> variant of this <code>json_object</code> function? It’s because the output type is dictated by a keyword instead of the function name. Use <code>RETURNING</code> to specify the returned type:<pre><code class=language-pgsql>SELECT
   json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' RETURNING jsonb);
SELECT
   json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' RETURNING json);
SELECT
   json_array('value_1', NULL, 'value_3' RETURNING jsonb);
SELECT
   json_array('value_1', NULL, 'value_3' RETURNING json);
</code></pre><p>Using the <code>pg_typeof</code> function, you can see the different data types that are returned by each statement above.<pre><code class=language-pgsql>SELECT
   pg_typeof(json_array('value_1', NULL, 'value_3' RETURNING jsonb));
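
-- returns jsonb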
</code></pre><p>Below is an example of using all 3 of the keywords together:<pre><code class=language-pgsql>SELECT json_object(
	'key_1' value 'value_1',
	'key_2' value NULL,
	'key_3' value 'value_3'
	ABSENT ON NULL WITH UNIQUE RETURNING jsonb);
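
-- returns {"key_1": "value_1", "key_3": "value_3"}; key_2 is dropped by ABSENT ON NULL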
</code></pre><p>This functionality is added to the existing functions as suffixes. The function <code>json_object_agg</code> received a <code>strict</code>, <code>unique</code>, and a <code>unique_strict</code> suffix as follows:<ul><li><code>json_object_agg_strict</code> / <code>jsonb_object_agg_strict</code><li><code>json_object_agg_unique</code> / <code>jsonb_object_agg_unique</code><li><code>json_object_agg_unique_strict</code> / <code>jsonb_object_agg_unique_strict</code></ul><p>For instance, using <code>json_object_agg_unique_strict</code> would look like the following:<pre><code class=language-pgsql>SELECT
	json_object_agg_unique_strict(key, value)
FROM (VALUES
		('key_1', 'value_1'),
		('key_2', NULL),
		('key_3', 'value_3')
	) AS t(key, value);
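
-- returns an object with key_1 and key_3; the NULL pair is skipped
-- by the strict behavior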
</code></pre><p>The <code>unique</code> suffix throws an error on duplicate keys, and the <code>strict</code> suffix skips null values. Of course, the combined <code>unique_strict</code> suffix specifies both. Given that I appreciate Postgres’ pursuit of the SQL standards, I would lean toward using the new <code>JSON_OBJECT</code> and <code>JSON_ARRAY</code> functions + their keywords.<h2 id=postgres-is-a-first-class-json-database><a href=#postgres-is-a-first-class-json-database>Postgres is a first-class JSON database</a></h2><p>As we’ve said before, Postgres 16 is a first-class database for JSON (<a href=https://www.crunchydata.com/blog/better-json-in-postgres-with-postgresql-14>read Craig’s post on the evolution of JSON and JSONB</a>). Given that JSON was first introduced in 9.2 (and JSONB in 9.4) and now we are on version 16, the continued rollout of JSON improvements shows the capabilities are thriving in the Postgres ecosystem. We know lots of folks are using JSON in Postgres and we’re happy to see several hands-on and under-the-hood improvements. ]]></content:encoded>
<category><![CDATA[ JSON ]]></category>
<author><![CDATA[ Christopher.Winslett@crunchydata.com (Christopher Winslett) ]]></author>
<dc:creator><![CDATA[ Christopher Winslett ]]></dc:creator>
<guid isPermalink="false">4d56da9c519d1adbbc7194a3025d00afb321b79c6ab7b55e13219c328f86984c</guid>
<pubDate>Wed, 20 Sep 2023 09:00:00 EDT</pubDate>
<dc:date>2023-09-20T13:00:00.000Z</dc:date>
<atom:updated>2023-09-20T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Writing Data from Postgres to MongoDB ]]></title>
<link>https://www.crunchydata.com/blog/writing-data-from-postgres-to-mongodb</link>
<description><![CDATA[ Chris has some sample code for syncing data from Postgres to MongoDB using Foreign Data Wrappers and Triggers. ]]></description>
<content:encoded><![CDATA[ <p>"Last time I had data like this, I stored the data in MongoDB." --Me, last week.<p>I told this to a friend while talking through some of their data problems. As <a href=https://www.twitter.com/craigkerstiens>Craig</a> likes to say, Crunchy Data is <a href=https://changelog.com/news/just-postgres-LWm4>"Just Postgres"</a>, but we also know there are some clear-cut use cases where Postgres isn’t the perfect tool for the job. Don’t get us wrong, <a href=https://www.crunchydata.com/why-postgres>Postgres</a> is an amazing database and we strongly believe it’s what you should start with for most applications, but that doesn’t mean you can’t benefit from more than one database.<p>If you missed the announcement recently, we launched support for the <a href=https://www.crunchydata.com/blog/easy-mongo-from-your-postgres>Mongo Foreign Data Wrapper on Crunchy Bridge</a>. If you’re wondering why, let’s look at a practical use case.<h2 id=the-use-case><a href=#the-use-case>The Use Case</a></h2><p>For that use case, we used Postgres to back purchase data for carbon removal projects, and MongoDB to serve the public presentation of that data. The data flow looks something like below.<p><img alt=Untitled loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/85f66857-a894-46f6-b519-e3e3d4d05400/public><p>In SQL, the data for this purchase was distributed across a few tables, so the query required multiple joins across tables called purchases, line items, credits, and invoices. But, once a purchase was completed, it never changed. Thus, we would take the output of the purchase query, package it up into JSON, and store it in MongoDB as a single document. 
Then, we handed a receipt token back to the end-user to fetch the purchase data later.<h2 id=why-use-two-different-databases><a href=#why-use-two-different-databases>Why use two different databases?</a></h2><p>By targeting different use cases to different databases, you can maintain some flexibility in how you work. For example, you can keep the complex, join-heavy queries in Postgres while serving the pre-built documents from MongoDB.<h2 id=the-code><a href=#the-code>The Code</a></h2><p>Below, we’ll walk through using a trigger in Postgres to write the data from Postgres to MongoDB using the MongoDB FDW.<p>To get started, you’ll need a Crunchy Bridge Postgres database and a MongoDB Atlas database. I’ll reference each of them below.<p>We have some SQL data and structure files <a href=https://gist.github.com/Winslett/adf1b0f0d34980fc369de39b336517f4>here</a>. Load the structure file, then the data file, and then this tutorial will show how to flow data from Postgres to MongoDB.<h2 id=load-the-postgres-data-structures><a href=#load-the-postgres-data-structures>Load the Postgres data structures</a></h2><p>Below are some tables we’ll use to provide a tangible example of the capabilities.<pre><code class=language-pgsql>CREATE TABLE public.accounts (
    id SERIAL PRIMARY KEY,
    name character varying,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL
);

CREATE TABLE public.carbon_removal_projects (
    id SERIAL PRIMARY KEY,
    name character varying,
    description text,
    originating_organization character varying,
    registry_name character varying,
    registry_id character varying,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL
);

CREATE TABLE public.purchases (
    id SERIAL PRIMARY KEY,
    account_id integer,
    tranche_id integer,
    amount_in_cents integer,
    amount_of_carbon_in_kg integer,
    public_url_token character varying,
    synced_to_mongo_at timestamp(6) without time zone,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL
);

CREATE TABLE public.tranches (
    id SERIAL PRIMARY KEY,
    carbon_removal_project_id integer,
    vintage character varying,
    cost_per_kg_in_cents integer,
    carbon_available_in_kg integer,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL
);
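
-- A hypothetical sketch (an assumption, not part of the original
-- schema): one way to enforce the rule that purchases from a tranche
-- can never exceed the tranche's available carbon is a trigger check.
CREATE OR REPLACE FUNCTION check_tranche_capacity() RETURNS trigger AS $$
BEGIN
    IF (SELECT COALESCE(SUM(amount_of_carbon_in_kg), 0)
        FROM public.purchases
        WHERE tranche_id = NEW.tranche_id) + NEW.amount_of_carbon_in_kg
       > (SELECT carbon_available_in_kg FROM public.tranches WHERE id = NEW.tranche_id) THEN
        RAISE EXCEPTION 'tranche % does not have enough carbon available', NEW.tranche_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER purchases_capacity_check
    BEFORE INSERT ON public.purchases
    FOR EACH ROW EXECUTE FUNCTION check_tranche_capacity();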
</code></pre><p>To piece the tables together conceptually, below is a diagram with explanations:<p><img alt="postgres to mongo diagram" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/4618148d-0606-46ea-676b-b2a325ab6a00/public><ol><li>Carbon Removal Projects are created that contain information about the project, such as location, technology used, and the organization performing the removal.<li>Tranches are created that represent blocks of carbon removed from the atmosphere. So, last month they may have removed 250kg, and the prior months 250kg and 50kg respectively.<li>An account performs a purchase of a certain amount of carbon removal, which gets matched with a tranche and a carbon removal project. With this check in place, the amount of carbon purchased from a tranche can never exceed the amount of carbon the tranche removed.</ol><h2 id=create-the-query-to-generate-json><a href=#create-the-query-to-generate-json>Create the query to generate JSON</a></h2><p>With the above schema, we’ll want to join the purchases, tranches, and carbon_removal_projects data into a reasonable JSON object. Below we use some of Postgres’ JSON builder functions to create the object:<pre><code class=language-pgsql>SELECT
   purchases.public_url_token,
   json_build_object(
     'account', json_build_object('name', accounts.name),
     'project', row_to_json(carbon_removal_projects.*),
		 'tranche', row_to_json(tranches.*),
     'carbon_offset_in_kg', purchases.amount_of_carbon_in_kg
   )
FROM purchases
	INNER JOIN accounts ON purchases.account_id = accounts.id
	INNER JOIN tranches ON purchases.tranche_id = tranches.id
	INNER JOIN carbon_removal_projects ON tranches.carbon_removal_project_id = carbon_removal_projects.id
WHERE purchases.id = 1;
</code></pre><p>Run this query against that dataset, and you’ll see it builds a JSON object. We’ll write that JSON object to MongoDB.<h2 id=create-foreign-table><a href=#create-foreign-table>Create foreign table</a></h2><p>Then, we’ll configure the Mongo foreign table that allows us to write from Postgres to MongoDB. The following code assumes that you have already run <code>CREATE EXTENSION</code>, <code>CREATE SERVER</code>, and <code>CREATE USER MAPPING</code> to configure the MongoFDW. <a href=https://www.crunchydata.com/blog/easy-mongo-from-your-postgres>For a tutorial on connecting to Mongo, check out the first post in this series</a>.<pre><code class=language-pgsql>-- see link above for connection information if this command fails

CREATE FOREIGN TABLE mongo_purchases
	(
		_id name,
		public_url_token character varying,
		purchase_data json, -- jsonb not allowed
	  created_at timestamp
	)
	SERVER atlas_server
	OPTIONS (database 'carbon_removal_purchases', collection 'purchases');
</code></pre><h2 id=create-a-trigger><a href=#create-a-trigger>Create a trigger</a></h2><p>On insert, we’ll package up a document and write it to the <code>mongo_purchases</code> table. We’ll use the above query to write it to the foreign table:<pre><code class=language-pgsql>CREATE OR REPLACE FUNCTION purchases_insert_trigger()
  RETURNS TRIGGER AS $$
BEGIN
  -- Insert the new row into the mongo_purchases table
  INSERT INTO mongo_purchases (public_url_token, purchase_data, created_at)
  VALUES (NEW.public_url_token,
	json_build_object(
     'account', json_build_object('name', (SELECT name FROM accounts WHERE accounts.id = NEW.account_id LIMIT 1)),
     'project', (SELECT row_to_json(carbon_removal_projects.*) FROM carbon_removal_projects INNER JOIN tranches ON tranches.carbon_removal_project_id = carbon_removal_projects.id WHERE tranches.id = NEW.tranche_id LIMIT 1),
		 'tranche', (SELECT row_to_json(tranches.*) FROM tranches WHERE tranches.id = NEW.tranche_id LIMIT 1),
     'carbon_offset_in_kg', NEW.amount_of_carbon_in_kg
  ),
	NEW.created_at);

	UPDATE purchases SET synced_to_mongo_at = now() WHERE id = NEW.id;

  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER purchases_insert
  AFTER INSERT ON purchases
  FOR EACH ROW
  EXECUTE FUNCTION purchases_insert_trigger();
</code></pre><p>This function builds the JSON object based on the newly inserted purchases row. Then, it writes that object to the <code>mongo_purchases</code> table, which actually resides in the Mongo Atlas database.<p>At the end, it sets a timestamp for synced to Mongo. This allows us to sync all existing records to Mongo with the following instructions!<h2 id=sync-existing-data><a href=#sync-existing-data>Sync existing data</a></h2><p>We have a way to sync all new purchases rows, but what do we need to do for the existing rows? Be sure to <a href=https://gist.github.com/Winslett/adf1b0f0d34980fc369de39b336517f4#file-data-sql-L15>load the the data.sql file from above</a>, it’ll give you some real data to play with. Run the following to insert existing rows:<pre><code class=language-pgsql>WITH unsynced_purchases AS ( -- #1
		UPDATE purchases
    SET synced_to_mongo_at = now()
    WHERE synced_to_mongo_at &#60 updated_at
			OR synced_to_mongo_at IS NULL
		RETURNING id
)

INSERT INTO mongo_purchases (public_url_token, purchase_data, created_at)  -- #2
	SELECT  -- #3
	   purchases.public_url_token,
	   json_build_object(
	     'account', json_build_object('name', accounts.name),
	     'project', row_to_json(carbon_removal_projects.*),
			 'tranche', row_to_json(tranches.*),
	     'carbon_offset_in_kg', purchases.amount_of_carbon_in_kg
	   ),
		 purchases.created_at
	FROM purchases
		INNER JOIN unsynced_purchases ON purchases.id = unsynced_purchases.id -- #4
		INNER JOIN accounts ON purchases.account_id = accounts.id
		INNER JOIN tranches ON purchases.tranche_id = tranches.id
		INNER JOIN carbon_removal_projects ON tranches.carbon_removal_project_id = carbon_removal_projects.id;
</code></pre><p>What is this magic? I’m quite proud of this query, so let me unwind it for you based on the numbers notated above:<ol><li><code>UPDATE</code> purchases that have not synced since being updated, and return the ids<li>Define an <code>INSERT</code> statement for the values returned by the CTE<li>Use the <code>SELECT</code> statement that we built previously<li>Filter by the <code>unsynced_purchases</code> ids as defined in the CTE</ol><h2 id=insert-data-and-watch-it-flow><a href=#insert-data-and-watch-it-flow>Insert data and watch it flow</a></h2><p>Run the following in your Mongo, and you’ll see the latest purchase data:<pre><code class=language-mongodb>db.purchases.find().sort({created_at: -1}).limit(1)
</code></pre><p>If you followed the tutorial above, you should see the same record in Postgres when you run the query below, which shows the data synced correctly. If you run any count queries, you’ll see those match as well.<pre><code class=language-pgsql>SELECT * FROM purchases ORDER BY created_at DESC LIMIT 1;
</code></pre><p>Now, to test the insert, run the following:<pre><code class=language-pgsql>INSERT INTO purchases (
	account_id,
	tranche_id,
	amount_in_cents,
	amount_of_carbon_in_kg,
	public_url_token,
	created_at,
	updated_at
) VALUES (
	1,
	1,
	50,
	2,
	gen_random_uuid(),
	now(),
	now()
) returning *;
</code></pre><p>Then, when querying MongoDB, you’ll see the new document:<pre><code class=language-mongodb>db.purchases.find().sort({created_at: -1}).limit(1)
</code></pre><h3 id=summary><a href=#summary>Summary</a></h3><ul><li>Using different databases can give you flexibility and agility with your data strategy.<li>In a multi-database environment, you can use a relational database and a NoSQL one, each for its specific use case<li>With a tool like the Postgres MongoDB Foreign Data Wrapper, you can easily sync data from Postgres to MongoDB<li>Combining the Foreign Data Wrapper with JSON creation and Postgres Triggers will allow you to keep both data stores in sync</ul> ]]></content:encoded>
<category><![CDATA[ JSON ]]></category>
<category><![CDATA[ Crunchy Bridge ]]></category>
<author><![CDATA[ Christopher.Winslett@crunchydata.com (Christopher Winslett) ]]></author>
<dc:creator><![CDATA[ Christopher Winslett ]]></dc:creator>
<guid isPermalink="false">4dfa298871aa08128ea00e65596d0c5811265c0222df2297bb25e21b923326fc</guid>
<pubDate>Mon, 17 Apr 2023 09:00:00 EDT</pubDate>
<dc:date>2023-04-17T13:00:00.000Z</dc:date>
<atom:updated>2023-04-17T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Easy Mongo from your Postgres ]]></title>
<link>https://www.crunchydata.com/blog/easy-mongo-from-your-postgres</link>
<description><![CDATA[ To continue to provide developers with the best Postgres experience, Crunchy Bridge now offers easy connectivity between managed Postgres and MongoDB. ]]></description>
<content:encoded><![CDATA[ <p>MongoDB got an early start as a developer-friendly database. Mongo was trivial to get started with and has some good tooling for scaling. Since you didn't have to create a schema, you were off and running in no time. Don't get us wrong, we do firmly believe a well-designed schema is important. If you're not managing your schema, it's managing you. Because of these capabilities, MongoDB has been used by teams for:<ul><li>the typical, primary application database<li>supporting database for large, sharded JSON datasets<li>rolling log using MongoDB’s capped collections<li>rapidly updating JSON documents using MongoDB’s atomic updates</ul><p>In the open-source database world we’re seeing a lot of customers working with both Mongo and Postgres. We’re excited to announce today that for all new <a href=https://www.crunchydata.com/products/crunchy-bridge>Crunchy Bridge</a> deployments, you will have access to the MongoDB FDW extension. Said another way, it is a foreign data wrapper that represents data stored in MongoDB through Postgres.<p>You can run the following query in MongoDB:<pre><code class=language-mongodb>db.purchases.find({ _id: ObjectId('6408fab340abb74a9d616083') })
</code></pre><p>And it will return the object in MongoDB. By connecting your Postgres to Mongo via the FDW, you can then run the following in your Postgres:<pre><code class=language-pgsql>SELECT * FROM mongo_purchases WHERE _id = '6408fab340abb74a9d616083';
</code></pre><p>It doesn't have to be Mongo vs. Postgres. The two can be used in conjunction; in fact, a <a href=https://www.crunchydata.com/case-studies/wwt>number of our customers are adopting both</a>. If you are using both, you shouldn't have to go through complicated pipelines and ETL to allow them to talk to each other; they can be friends. Let's dig into connecting Crunchy Bridge, our fully managed Postgres-as-a-service, and MongoDB Atlas, a fully managed MongoDB-as-a-service. One multi-cloud DBaaS meets another.<h2 id=getting-started><a href=#getting-started>Getting Started</a></h2><p>To get started, do the following:<ol><li><p>Sign up &amp; create a <a href=https://crunchybridge.com/start>Crunchy Bridge</a> database; we have <a href=https://www.crunchydata.com/blog/making-postgres-more-afforable-new-low-cost-plans-on-crunchy-bridge>low-cost</a> plans available.<li><p>Connect to your MongoDB Atlas database, where we will need to gather a few configuration values. First, we’ll need the replica set name, and we can get that by running as a user with MongoDB Atlas Admin privileges:<pre><code class=language-mongodb>rs.status().set // = `replica-set` value used later in tutorial
</code></pre><p>Then, run the following to get a host and port that we will use later. It will be in the format <code>&#60mongo-host>:&#60mongo-port></code>, which we’ll use below:<pre><code class=language-mongodb>rs.status().members.forEach(function (m) {
	print(m.name)
})
</code></pre><p>Choose any of the hosts listed — it does not matter. The MongoDB driver built into MongoFDW will automatically connect to the primary during writes.<li><p>Connect to your Crunchy Bridge database as the <code>postgres</code> user, and run the following:<pre><code class=language-pgsql>CREATE EXTENSION mongo_fdw;

-- create the mongodb server
CREATE SERVER atlas_server
	FOREIGN DATA WRAPPER mongo_fdw
	OPTIONS (
		address '&#60mongo-host>', -- from step #2
		port '&#60mongo-port>', -- from step #2
		ssl 'true',
		replica_set '&#60replica-set>', -- from step #2
		authentication_database 'admin',
		weak_cert_validation 'true'
	);

-- create user
CREATE USER MAPPING FOR postgres
	SERVER atlas_server
	OPTIONS (username '&#60your-mongodb-username>', password '&#60your-mongodb-password>');

-- connect tables
CREATE FOREIGN TABLE mongo_superheroes
	(
		_id name,
		name text,
		data json, -- jsonb not allowed
	  created_at timestamp
	)
	SERVER atlas_server
	OPTIONS (database 'superhero-database', collection 'superheroes');
</code></pre></ol><p>If you run into errors, double check the host, port, replica set, username, and password are correct. If those are correct, check that your MongoDB Atlas firewall permits connecting from the Postgres host.<h2 id=working-with-data><a href=#working-with-data>Working with data</a></h2><p>Once you have configured MongoFDW, you’ll be able to work with the MongoDB data. Inserts are as native as you can imagine:<pre><code class=language-pgsql>INSERT INTO mongo_superheroes (name, data, created_at) VALUES ('batman', '{"superpowers": ["moral compass", "innovative"]}', now());
</code></pre><p>Then, we can query data from that table:<pre><code class=language-pgsql>SELECT * FROM mongo_superheroes;
</code></pre><p>Which returns something like the following.<pre><code class=language-text>           _id            |  name  |                         data                          |       created_at
--------------------------+--------+-------------------------------------------------------+-------------------------
 640a40334d7edb1a9921b2f2 | batman | { "superpowers" : [ "moral compass", "innovative" ] } | 2023-03-30 20:23:14.974
</code></pre><p>Now you can log into MongoDB and run the following from the <code>superhero-database</code>:<pre><code class=language-mongodb>db.superheroes.find()
</code></pre><p>Which returns:<pre><code class=language-mongodb>[
  {
    _id: ObjectId("640a40334d7edb1a9921b2f2"),
    name: 'batman',
    data: { superpowers: [ 'moral compass', 'innovative' ] },
    created_at: ISODate("2023-03-30T20:23:14.974Z")
  }
]
</code></pre><h2 id=go-forth-and-conquer><a href=#go-forth-and-conquer>Go forth and conquer</a></h2><p>So there you have it: a developer-friendly way to merge MongoDB and Postgres into a truly hybrid data architecture. We can see it being used for ETL processes that sync data from a MongoDB application to a Postgres warehouse for analysis. We can also see cases where you might want to write data to MongoDB directly from Postgres. We look forward to seeing what you build with this new foreign data wrapper. ]]></content:encoded>
<category><![CDATA[ JSON ]]></category>
<category><![CDATA[ Crunchy Bridge ]]></category>
<author><![CDATA[ Christopher.Winslett@crunchydata.com (Christopher Winslett) ]]></author>
<dc:creator><![CDATA[ Christopher Winslett ]]></dc:creator>
<guid isPermalink="false">75c1b02d18c11997834c2bd4c9c2534ba4d2e58a1a27bfb3650c3871bc6addcf</guid>
<pubDate>Fri, 31 Mar 2023 09:00:00 EDT</pubDate>
<dc:date>2023-03-31T13:00:00.000Z</dc:date>
<atom:updated>2023-03-31T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres support for JSON is 10 years old! ]]></title>
<link>https://www.crunchydata.com/blog/state-of-json-in-postgres-2022</link>
<description><![CDATA[ JSON in Postgres is now 10 years old. Chris recently published a JSON tutorial and offers some thoughts on the past and future of JSON, and what developers need to be aware of. ]]></description>
<content:encoded><![CDATA[ <p>JSON is everywhere, even in your SQL database. Since Postgres added JSON support 10 years ago, we've seen widespread adoption. When JSON support was first released in 2012, the implementation was a fairly flat representation of the JSON data type, with limited querying and indexing capabilities. <a href=https://twitter.com/craigkerstiens>Craig</a>, who was active in early JSON discussions with Postgres contributors, admits <a href=https://www.crunchydata.com/blog/better-json-in-postgres-with-postgresql-14>"Postgres cheated"</a> with these early implementations. With JSONB now introduced and adopted, I think JSON support is one of Postgres's biggest strengths.<h3 id=crunchys-json-tutorial><a href=#crunchys-json-tutorial>Crunchy's JSON Tutorial</a></h3><p>If you're new to JSON in Postgres or just want to play around, jump over to our <a href=https://www.crunchydata.com/developers/playground/basics-of-jsonb>JSON + Postgres tutorial</a> and give it a spin.<h3 id=jsonb><a href=#jsonb>JSONB</a></h3><p>Today, when creating a column for JSON data in Postgres, nearly all use cases call for the JSONB data type. JSONB stores data in a binary form (that's the B). JSONB has been fundamental in improving the Postgres-JSON experience. With it, Postgres has functions for modifying, extracting, and re-forming JSON data.<p>I'm going to go out on a limb and say that Postgres has as much support for JSON as any other database vendor. Yet, accessing that JSONB functionality is not always straightforward. In most scenarios, there is more than one way to get the intended result. JSONB queries are a combination of traditional SQL syntax, JSONB functions, and JSONPath expressions. 
Three people may get the same result, yet each uses a completely different set of functions.<p>For instance, when querying a nested set of objects, a programmer can:<ol><li>Turn the objects into a set of records using <code>jsonb_to_recordset</code>, filter the records with conditionals, then rejoin to JSON using <code>to_jsonb</code><li>Use JSONPath to query and compare the objects<li>Use JSON operators to extract and compare.</ol><p>So, how can you use Postgres with JSONB while keeping your sanity?<h2 id=json-storage><a href=#json-storage>JSON Storage</a></h2><p><strong>The good news</strong>: With Postgres, JSONB documents can be as large as 255MB. JSONB documents can store any structure, and different structures on different records. Four-layer-deep arrays and objects? No problem!<p><strong>The practical news</strong>: Read-only JSONB blobs can be a wild west of size and format. But if you want to query JSONB documents, they should not be that large. A consistent, simple, well-defined JSON schema rewards a developer with performance and readable queries.<h2 id=json-queries--filtering><a href=#json-queries--filtering>JSON Queries &#38 Filtering</a></h2><p><strong>The good news</strong>: Postgres functions can extract values from inside a JSON blob. These functions access and transform objects, arrays, and values. Postgres has JSONPath conditions for extracting with precision.<p><strong>The practical news</strong>: A query statement filtering on a single field is easy to read. A single conditional means the mental buffer does not overflow. Additional conditionals increase the verbosity of the functions, and the statements become hard to read.<p>Once values are extracted from JSONB, they are returned as strings. Because Postgres is strongly typed, those values must be cast for comparison. Thus, JSONB conditionals end up encased in type-casting statements.<p>JSON filtering in Postgres can start to feel complex or unreadable. 
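<p>As a minimal sketch of that variety, assuming a hypothetical <code>docs</code> table with a <code>details</code> column of type <code>jsonb</code>, each of the following finds rows whose <code>tags</code> array contains the string <code>urgent</code>:<pre><code class=language-pgsql>-- JSON operators: -> walks into the document, ? tests membership
SELECT * FROM docs WHERE details->'tags' ? 'urgent';

-- JSONPath: @@ applies a jsonpath predicate
SELECT * FROM docs WHERE details @@ '$.tags[*] == "urgent"';

-- containment: @> asks whether the right-hand document is contained
SELECT * FROM docs WHERE details @> '{"tags": ["urgent"]}';
</code></pre>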
But it's on par with other databases in the space. An expressive JSON document requires an equally expressive query. Keep documents simple, and the queries will follow.<h2 id=json-query-performance><a href=#json-query-performance>JSON Query Performance</a></h2><p><strong>The good news</strong>: Postgres has indexes for all data structures found within JSONB.<p><strong>The practical news</strong>: Simple indexes perform reliably. <code>BTREE</code> indexes are the most predictable and resource-efficient. When using <code>BTREE</code>, index creation integrates the JSON extraction and transformation. Postgres optimizes for these explicit values, and the human query builder understands the indexes. <code>BTREE</code> will perform as you would expect for any index on Postgres.<p>For more expansive queries that may touch any of an object's keys, <code>GIN</code> indexes are an option. <code>GIN</code> indexes can be powerful, yet they should not be thrown on every JSONB field (that is never a good performance strategy for any type of data, and particularly not for <code>GIN</code> indexes). <code>GIN</code> indexes have caveats about which operators work with which index settings. When using <code>GIN</code> indexes, set aside extensive time to test each query that needs to use them.<p><strong>How do Postgres' indexes compare to other JSON databases?</strong><p>Key-value store databases limit the number of indexes, focusing performance on a limited number of queries and preventing index sprawl. This strategy prevents database users from sabotaging their own experience. Search-specific databases throw computing resources at the problem, so you'll spend CPU + RAM + money. General application databases trust the programmer to use the provided toolset. 
Postgres is in this category -- it is up to the user to use the tool wisely.<h2 id=json-modifications--json-upserts><a href=#json-modifications--json-upserts>JSON Modifications &#38 JSON Upserts</a></h2><p><strong>The good news</strong>: With Postgres, a single update statement can update or upsert values in a JSONB document. Postgres provides the ability to do in-place updates with JSONB.<p><strong>The practical news</strong>: With Postgres, in-place updates are similar to any other update. The query contains the method of building a new document. As we said about queries, complex JSON documents require complex in-place update queries. Still, it may be worth it: in-place updates avoid a network round-trip, which improves performance.<h2 id=using-postgres-with-json-practically><a href=#using-postgres-with-json-practically>Using Postgres with JSON Practically</a></h2><p>Today, anything is possible with JSON in Postgres. Complex situations can create SQL spaghetti. A developer choosing complex JSON will pay the price in code complexity. Yet, I would argue this is the case with any other JSON database.<p>The words of advice for Postgres + JSON in 2022: if you feel like you are struggling to make JSON work with Postgres, consider using a table.<p>Want to try it out? Check out the <a href=https://www.crunchydata.com/developers/playground/basics-of-jsonb>JSON tutorial in the Crunchy Data Postgres playground</a>. ]]></content:encoded>
<category><![CDATA[ Postgres Tutorials ]]></category>
<category><![CDATA[ JSON ]]></category>
<author><![CDATA[ Christopher.Winslett@crunchydata.com (Christopher Winslett) ]]></author>
<dc:creator><![CDATA[ Christopher Winslett ]]></dc:creator>
<guid isPermalink="false">c257254a4ae542ab4deef0dd5143cbb3e8f841719c4472f2c0653004df58836d</guid>
<pubDate>Fri, 09 Dec 2022 10:00:00 EST</pubDate>
<dc:date>2022-12-09T15:00:00.000Z</dc:date>
<atom:updated>2022-12-09T15:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Cut Out the Middle Tier: Generating JSON Directly from Postgres ]]></title>
<link>https://www.crunchydata.com/blog/generating-json-directly-from-postgres</link>
<description><![CDATA[ PostgreSQL has built-in JSON generators that can be used to create structured JSON output right in the database, upping performance and radically simplifying web tiers. ]]></description>
<content:encoded><![CDATA[ <p>Too often, web tiers are full of boilerplate that does nothing except convert a result set into JSON. A middle tier could be as simple as a function call that returns <a href=/blog/using-postgresql-for-json-storage>JSON</a>. All we need is an easy way to convert result sets into JSON in the database.<p>PostgreSQL has <a href=https://www.postgresql.org/docs/current/functions-json.html>built-in JSON generators</a> that can be used to create structured JSON output right in the database, upping performance and radically simplifying web tiers.<p>Fortunately, PostgreSQL <strong>has such functions</strong>, which run right next to the data, for better performance and lower bandwidth usage.<h2 id=some-example-data><a href=#some-example-data>Some example data</a></h2><p>To try out these examples, load this tiny database:<pre><code class=language-pgsql>-- employees references departments, so create departments first
CREATE TABLE departments (
  department_id bigint primary key,
  name text
  );

CREATE TABLE employees (
  employee_id serial primary key,
  department_id integer references departments(department_id),
  name text,
  start_date date,
  fingers integer,
  geom geometry(point, 4326) -- geometry requires the PostGIS extension
  );

INSERT INTO departments
 (department_id, name)
VALUES
 (1, 'spatial'),
 (2, 'cloud');

INSERT INTO employees
 (department_id, name, start_date, fingers, geom)
VALUES
 (1, 'Paul',   '2018/09/02', 10, 'POINT(-123.32977 48.40732)'),
 (1, 'Martin', '2019/09/02',  9, 'POINT(-123.32977 48.40732)'),
 (2, 'Craig',  '2019/11/01', 10, 'POINT(-122.33207 47.60621)'),
 (2, 'Dan',    '2020/10/01',  8, 'POINT(-122.33207 47.60621)');
</code></pre><p>Four employees, arranged into two departments, with some detail information about each employee.<h2 id=easy-json-using-row_to_json><a href=#easy-json-using-row_to_json>Easy JSON using row_to_json</a></h2><p>The simplest JSON generator is <code>row_to_json()</code>, which takes in a tuple value and returns the equivalent JSON dictionary.<pre><code class=language-pgsql>SELECT row_to_json(employees)
FROM employees
WHERE employee_id = 1;
</code></pre><p>The resulting JSON uses the column names for keys, so you get a neat dictionary.<pre><code class=language-json>{
  "employee_id": 1,
  "department_id": 1,
  "name": "Paul",
  "start_date": "2018-09-02",
  "fingers": 10,
  "geom": {
    "type": "Point",
    "coordinates": [
      -123.329773,
      48.407326
    ]
  }
}
</code></pre><p>And look what happens to the geometry column! Because <a href=/blog/topic/spatial>PostGIS</a> includes a cast from geometry to JSON, the geometry column is automatically mapped into <a href=https://geojson.org/>GeoJSON</a> in the conversion. This is a useful trick with any custom type: define a cast to JSON and you automatically integrate with the native PostgreSQL JSON generators.<h2 id=full-result-sets-using-json_agg><a href=#full-result-sets-using-json_agg>Full result sets using json_agg</a></h2><p>Turning a single row into a dictionary is fine for basic record access, but queries frequently require multiple rows to be converted.<p>Fortunately, there's an <a href=https://www.postgresql.org/docs/10/functions-aggregate.html>aggregate function</a> for that, <code>json_agg</code>, which carries out the JSON conversion and converts the multiple results into a JSON list.<pre><code class=language-pgsql>SELECT json_agg(e)
FROM (
    SELECT employee_id, name
    FROM employees
    WHERE department_id = 1
    ) e;
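
-- a hedged aside: the aggregate also accepts an ORDER BY inside the
-- call to control element order, e.g. json_agg(e ORDER BY e.name)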
</code></pre><p>Note that in order to strip down the data in the record, we use a subquery to make a narrower input to <code>json_agg</code>.<pre><code class=language-json>[
  {
    "employee_id": 1,
    "name": "Paul"
  },
  {
    "employee_id": 2,
    "name": "Martin"
  }
]
</code></pre><h2 id=nested-results-using-subqueries><a href=#nested-results-using-subqueries>Nested results using subqueries</a></h2><p>So far, all this is pretty easy to replicate in middleware, but things get more interesting when you start dumping structured results.<p>Using aggregation, and converting the results to JSON in stages, it's possible to build up nested JSON outputs that reflect table relationships.<pre><code class=language-pgsql>WITH
-- strip down employees table
employees AS (
  SELECT department_id, name, start_date
  FROM employees
),
-- join to departments table and aggregate
departments AS (
  SELECT d.name AS department_name,
         json_agg(e) AS employees
  FROM departments d
  JOIN employees e
  USING (department_id)
  GROUP BY d.name
)
-- output as one json list
SELECT json_agg(departments)
FROM departments;
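
-- hedged variant: to key the output by department name rather than
-- returning a list, swap the final aggregate:
--   SELECT json_object_agg(d.department_name, d.employees)
--   FROM departments d;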
</code></pre><p>And the result has one entry for each department, each of which contains its two employees.<pre><code class=language-json>[
	{
		"department_name": "cloud",
		"employees": [
			{
				"department_id": 2,
				"name": "Craig",
				"start_date": "2019-11-01"
			},
			{
				"department_id": 2,
				"name": "Dan",
				"start_date": "2020-10-01"
			}
		]
	},
	{
		"department_name": "spatial",
		"employees": [
			{
				"department_id": 1,
				"name": "Paul",
				"start_date": "2018-09-02"
			},
			{
				"department_id": 1,
				"name": "Martin",
				"start_date": "2019-09-02"
			}
		]
	}
]
</code></pre><p>If you would prefer your output to be an associative array instead of a list, replace the final <code>json_agg</code> with <code>json_object_agg</code>.<h2 id=all-your-tables-in-json><a href=#all-your-tables-in-json>All your tables in JSON</a></h2><p>Ever wanted to quickly extract a definition of your table structures from the database? With the JSON formatters and the PostgreSQL system tables, all that info is right at hand.<pre><code class=language-pgsql>WITH rows AS (
  SELECT c.relname, a.attname, a.attnotnull, a.attnum, t.typname
  FROM pg_class c
  JOIN pg_attribute a
    ON c.oid = a.attrelid AND a.attnum >= 0
  JOIN pg_type t
    ON t.oid = a.atttypid
  JOIN pg_namespace n
    ON c.relnamespace = n.oid
  WHERE n.nspname = 'public'
    AND c.relkind = 'r'
),
agg AS (
  SELECT rows.relname, json_agg(rows ORDER BY attnum) AS attrs
  FROM rows
  GROUP BY rows.relname
)
SELECT json_object_agg(agg.relname, agg.attrs)
FROM agg;
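
-- to inspect a single table instead of the whole schema, add a filter
-- to the first CTE, e.g.:
--   AND c.relname = 'departments'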
</code></pre><p>Here's the entry for the "departments" table.<pre><code class=language-json>{
  "departments": [
    {
      "relname": "departments",
      "attname": "department_id",
      "attnotnull": true,
      "attnum": 1,
      "typname": "int8"
    },
    {
      "relname": "departments",
      "attname": "name",
      "attnotnull": false,
      "attnum": 2,
      "typname": "text"
    }
  ],
  ...
}
</code></pre><h2 id=conclusion><a href=#conclusion>Conclusion</a></h2><ul><li>PostgreSQL JSON emitters can turn any result set into JSON right in the database<li>Web tiers can be vastly simplified by pushing JSON creation further down the stack<li>Custom types can emit custom JSON if a cast to json is defined on them</ul> ]]></content:encoded>
<category><![CDATA[ JSON ]]></category>
<author><![CDATA[ Paul.Ramsey@crunchydata.com (Paul Ramsey) ]]></author>
<dc:creator><![CDATA[ Paul Ramsey ]]></dc:creator>
<guid isPermalink="false">https://blog.crunchydata.com/blog/generating-json-directly-from-postgres</guid>
<pubDate>Wed, 14 Jul 2021 05:00:00 EDT</pubDate>
<dc:date>2021-07-14T09:00:00.000Z</dc:date>
<atom:updated>2021-07-14T09:00:00.000Z</atom:updated></item></channel></rss>