<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/topic/json/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog/topic/json</link>
<image><url>https://www.crunchydata.com/card.png</url>
<title>CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog/topic/json</link>
<width>800</width>
<height>419</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Wed, 20 Sep 2023 09:00:00 EDT</pubDate>
<dc:date>2023-09-20T13:00:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ JSON Updates in Postgres 16 ]]></title>
<link>https://www.crunchydata.com/blog/json-updates-postgres-16</link>
<description><![CDATA[ Chris digs into some example code for the new JSON features that just launched with Postgres 16 last week: a new check for identifying JSON objects, plus some new object and array functions. ]]></description>
<content:encoded><![CDATA[ <p>Postgres has been steadily building on the JSON functionality initially released more than <a href=https://www.crunchydata.com/blog/state-of-json-in-postgres-2022>10 years ago</a>. With Postgres 16, working with JSON has gotten a couple of nice improvements. Primarily, this release added features that ease shaping data into JSON and improve the standard SQL functionality for working with JSON.<p><strong>TL;DR:</strong><ul><li>A SQL/JSON data-type check. For instance, this lets you ask in SQL whether a value is a JSON object: <code>value IS JSON OBJECT</code><li>Addition of SQL-standard JSON functions: <a href=https://www.postgresql.org/docs/16/functions-json.html#FUNCTIONS-JSON-CREATION-TABLE><code>JSON_ARRAY()</code></a>, <a href=https://www.postgresql.org/docs/16/functions-aggregate.html#FUNCTIONS-AGGREGATE-TABLE><code>JSON_ARRAYAGG()</code></a>, <code>JSON_OBJECT()</code>, and <code>JSON_OBJECTAGG()</code></ul><h2 id=data-for-this-post><a href=#data-for-this-post>Data for this post</a></h2><p>For a quick tutorial, use the following dataset:<pre><code class=language-pgsql>CREATE TABLE user_profiles ( id serial PRIMARY KEY, name text NOT NULL, details jsonb );
INSERT INTO
   user_profiles (name, details)
VALUES
   (
      'John Doe', '{"age": 30, "address": {"city": "New York", "state": "NY"}, "hobbies": ["reading", "hiking"]}'::jsonb
   )
,
   (
      'Jane Smith', '{"age": 25, "address": {"city": "Los Angeles", "state": "CA"}, "hobbies": ["painting", "cooking"]}'::jsonb
   )
,
   (
      'Emily Davis', '{"age": 29, "address": {"city": "Chicago", "state": "IL"}, "hobbies": ["dancing", "writing"]}'::jsonb
   )
;
</code></pre><h2 id=is-json><a href=#is-json><code>IS JSON</code></a></h2><p>Previously, to test if a field was JSON, your main option was to run <code>pg_typeof</code>:<pre><code class=language-pgsql>SELECT
   details,
   pg_typeof(details),
   pg_typeof(details ->> 'address')
FROM
   user_profiles LIMIT 1;
</code></pre><p>This returns <code>jsonb</code> for the second column and <code>text</code> for the third. The problem is that it gave you no inspection into the nested values within the JSON, so running <code>pg_typeof(details->>'address')</code> would just tell you the extracted value is <code>text</code>. Now, we can do something like:<pre><code class=language-pgsql>SELECT
   details,
   details IS JSON,
   details ->> 'address' IS JSON OBJECT
FROM
   user_profiles LIMIT 1;
</code></pre><p>With this new functionality, both the <code>IS JSON</code> check and the <code>IS JSON OBJECT</code> check return true. Previously, when building queries to inspect JSON, you were at risk of failed queries if the data’s JSON format did not match expected values. Imagine you are migrating your JSON structure, or you previously didn’t have a well-defined schema, so let’s throw in some bad details data that mismatches our current structure:<pre><code class=language-pgsql>INSERT INTO
   user_profiles (name, details)
VALUES
   (
      'Bruce Wayne', '"Gotham City, NY"'
   )
;
INSERT INTO
   user_profiles (name, details)
VALUES
   (
      'Clark J. Kent', '{"age": 32, "address": [{"city": "New York", "state": "AL"}]}'
   )
;
</code></pre><p>Both of the values provided above are valid JSON, yet the existing rows store <code>address</code> as a nested object of the form <code>{"address": {...}}</code>. Now, extracting the primary address defensively looks like this:<pre><code class=language-pgsql>SELECT
   CASE
      WHEN
         details -> 'address' IS JSON ARRAY
      THEN
(details -> 'address')[0]
      WHEN
         details -> 'address' IS JSON OBJECT
      THEN
         details -> 'address'
      WHEN
         details IS JSON SCALAR
      THEN
         details
   END
   AS primary_address
FROM
   user_profiles;
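
-- Rows whose address is an object return it as-is, Clark's array
-- returns its first element, and Bruce's scalar details value is
-- returned whole.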
</code></pre><p>Additionally, because JSON allows duplicate object keys, you can test for key uniqueness with <code>WITH UNIQUE KEYS</code> or <code>WITHOUT UNIQUE KEYS</code>:<pre><code class=language-pgsql>SELECT
	'{"batman": "robin", "batman": "robin"}' IS JSON WITH UNIQUE KEYS,
	'{"batman": "robin", "batman": "robin"}' IS JSON WITHOUT UNIQUE KEYS;
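	-- the first check returns false (duplicate keys present), the second true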
</code></pre><h2 id=json_array--json_arrayagg><a href=#json_array--json_arrayagg><code>JSON_ARRAY</code> &amp; <code>JSON_ARRAYAGG</code></a></h2><p>Now, we can interact with JSON in a more standards-compliant way. The <code>json</code> and <code>jsonb</code> constructs are a Postgres implementation, not part of the SQL standard. Thus, the existing Postgres functions were prefixed with <code>jsonb_</code> or <code>json_</code>. These new functions get away from those prefixes and use keywords to specify output.<p>Let’s look at <code>json_array</code>. Previously, you would have used <code>json_build_array</code> / <code>jsonb_build_array</code>, or the aggregate functions <code>json_agg</code> / <code>jsonb_agg</code>. Now, you can combine values using <code>json_array</code>:<pre><code class=language-pgsql>SELECT
   json_array(name, details ->> 'age')
FROM
   user_profiles;
</code></pre><p>Or, use it to build an array on a <code>GROUP BY</code> using <code>json_arrayagg</code>. Below, we write a SQL query that returns a JSON array of each person within each age group.<pre><code class=language-pgsql>SELECT
((details ->> 'age')::integer / 10) * 10 AS age_group,
   json_arrayagg(name)
FROM
   user_profiles
GROUP BY
   1;
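
-- buckets each name into its age decade: Jane (25) and Emily (29) land
-- in age_group 20, John (30) in age_group 30; rows without a numeric
-- age, like Bruce's above, fall into a NULL group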
</code></pre><p>We can also use <code>json_arrayagg</code> in window functions. Below, we use a window function to find the other group members for each age group.<pre><code class=language-pgsql>SELECT
   name,
   (
(details ->> 'age')::integer / 10
   )
   * 10 AS age_group,
   json_arrayagg(name) OVER (PARTITION BY (details ->> 'age')::integer / 10 * 10) AS other_group_members
FROM
   user_profiles;
</code></pre><p><em>Disclaimer: I would write this differently using a dedicated CTE and <code>GROUP BY</code>, but this is an example of using</em> <code>json_arrayagg</code> <em>as a window function on the small dataset.</em><h2 id=json_objectagg><a href=#json_objectagg><code>JSON_OBJECTAGG</code></a></h2><p>Just as we can aggregate values into an array, we can also aggregate values into an object. Previously, you used <code>json(b)?_build_object</code> or <code>json(b)?_object_agg</code> to construct objects. With Postgres 16, we have a standard SQL function called <code>json_object</code> with a syntax that moves away from tuples and uses either <code>value</code> or <code>:</code> to separate key-value pairs.<pre><code class=language-pgsql>SELECT
   json_object('name' value name, 'age': details ->> 'age')
FROM
   user_profiles;
</code></pre><p>The aggregate form is as follows:<pre><code class=language-pgsql>SELECT
((details ->> 'age')::integer / 10) * 10 AS age_group,
   json_objectagg(name value details ->> 'age')
FROM
   user_profiles
GROUP BY
   1;
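
-- builds one object per age group mapping each name to its age,
-- e.g. Jane and Emily for age_group 20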
</code></pre><p>As with the <code>json_arrayagg</code>, the <code>json_objectagg</code> can be used in window functions:<pre><code class=language-pgsql>SELECT
   name,
   (
(details ->> 'age')::integer / 10
   )
   * 10 AS age_group,
   json_objectagg(name value details ->> 'age') OVER (PARTITION BY (details ->> 'age')::integer / 10 * 10) AS other_group_members
FROM
   user_profiles;
</code></pre><h2 id=modifying-behavior-with-keywords><a href=#modifying-behavior-with-keywords>Modifying behavior with keywords</a></h2><p>Both <code>json_array</code> and <code>json_object</code> come with modifying keywords. <code>json_array</code> gets <code>ABSENT</code> and <code>RETURNING</code>. <code>json_object</code> gets <code>ABSENT</code>, <code>UNIQUE</code>, and <code>RETURNING</code>.<h3 id=unique><a href=#unique><code>UNIQUE</code></a></h3><p>The <code>WITH UNIQUE</code> constraint on <code>json_object</code> throws an error on duplicate keys; by default, duplicates do not error. Try the following:<pre><code class=language-pgsql>SELECT
   json_object('key_1' value 'value_1', 'key_1' value 'value_2' WITH UNIQUE);
SELECT
   json_object('key_1' value 'value_1', 'key_1' value 'value_2');
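
-- the first statement raises a duplicate-key error; the second
-- succeeds because duplicate keys are allowed by default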
</code></pre><h3 id=absent-on-null-v-null-on-null><a href=#absent-on-null-v-null-on-null><code>ABSENT ON NULL</code> v. <code>NULL ON NULL</code></a></h3><p>The <code>ABSENT ON NULL</code> constraint instructs <code>json_object</code> and <code>json_array</code> to omit null values:<pre><code class=language-pgsql>SELECT
   json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' ABSENT
   ON NULL);
SELECT
   json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' NULL
   ON NULL);
</code></pre><p>Absent also works on <code>json_array</code>:<pre><code class=language-pgsql>SELECT
   json_array('value_1', NULL, 'value_3' ABSENT
   ON NULL);
SELECT
   json_array('value_1', NULL, 'value_3' NULL
   ON NULL);
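
-- the first returns ["value_1", "value_3"], the second ["value_1", null, "value_3"]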
</code></pre><p>The opposite of <code>ABSENT ON NULL</code> is <code>NULL ON NULL</code>, as shown above. This means the following is a valid SQL statement in Postgres 16:<pre><code class=language-pgsql>SELECT
   json_array(NULL NULL
   ON NULL);
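
-- returns [null]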
</code></pre><h3 id=returning><a href=#returning><code>RETURNING</code></a></h3><p>Have you wondered why we didn’t have a <code>jsonb</code> variant of this <code>json_object</code> function? It’s because the output type is dictated by a keyword instead of the function name. Use <code>RETURNING</code> to specify the returned type:<pre><code class=language-pgsql>SELECT
   json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' RETURNING jsonb);
SELECT
   json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' RETURNING json);
SELECT
   json_array('value_1', NULL, 'value_3' RETURNING jsonb);
SELECT
   json_array('value_1', NULL, 'value_3' RETURNING json);
</code></pre><p>Using the <code>pg_typeof</code> function, you can see the different data types that are returned by each statement above.<pre><code class=language-pgsql>SELECT
   pg_typeof(json_array('value_1', NULL, 'value_3' RETURNING jsonb));
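
-- returns jsonb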
</code></pre><p>Below is an example of using all 3 of the keywords together:<pre><code class=language-pgsql>SELECT json_object(
	'key_1' value 'value_1',
	'key_2' value NULL,
	'key_3' value 'value_3'
	ABSENT ON NULL WITH UNIQUE RETURNING jsonb);
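
-- returns {"key_1": "value_1", "key_3": "value_3"}; key_2 is dropped by ABSENT ON NULL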
</code></pre><p>This functionality is added to the existing functions as suffixes. The function <code>json_object_agg</code> received a <code>strict</code>, <code>unique</code>, and a <code>unique_strict</code> suffix as follows:<ul><li><code>json_object_agg_strict</code> / <code>jsonb_object_agg_strict</code><li><code>json_object_agg_unique</code> / <code>jsonb_object_agg_unique</code><li><code>json_object_agg_unique_strict</code> / <code>jsonb_object_agg_unique_strict</code></ul><p>For instance, using <code>json_object_agg_unique_strict</code> would look like the following:<pre><code class=language-pgsql>SELECT
	json_object_agg_unique_strict(key, value)
FROM (VALUES
		('key_1', 'value_1'),
		('key_2', NULL),
		('key_3', 'value_3')
	) AS t(key, value);
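
-- returns an object with key_1 and key_3; the NULL pair is skipped
-- by the strict behavior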
</code></pre><p>The <code>unique</code> suffix throws an error on duplicate keys, and the <code>strict</code> suffix skips null values. Of course, the combined <code>unique_strict</code> suffix specifies both. Given that I appreciate Postgres’ pursuit of the SQL standards, I would lean toward using the new <code>JSON_OBJECT</code> and <code>JSON_ARRAY</code> functions + their keywords.<h2 id=postgres-is-a-first-class-json-database><a href=#postgres-is-a-first-class-json-database>Postgres is a first-class JSON database</a></h2><p>As we’ve said before, Postgres 16 is a first-class database for JSON (<a href=https://www.crunchydata.com/blog/better-json-in-postgres-with-postgresql-14>read Craig’s post on the evolution of JSON and JSONB</a>). Given that JSON was first introduced in 9.2 (and JSONB in 9.4) and now we are on version 16, the continued rollout of JSON improvements shows the capabilities are thriving in the Postgres ecosystem. We know lots of folks are using JSON in Postgres and we’re happy to see several hands-on and under-the-hood improvements. ]]></content:encoded>
<category><![CDATA[ JSON ]]></category>
<author><![CDATA[ Christopher.Winslett@crunchydata.com (Christopher Winslett) ]]></author>
<dc:creator><![CDATA[ Christopher Winslett ]]></dc:creator>
<guid isPermalink="false">4d56da9c519d1adbbc7194a3025d00afb321b79c6ab7b55e13219c328f86984c</guid>
<pubDate>Wed, 20 Sep 2023 09:00:00 EDT</pubDate>
<dc:date>2023-09-20T13:00:00.000Z</dc:date>
<atom:updated>2023-09-20T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Writing Data from Postgres to MongoDB ]]></title>
<link>https://www.crunchydata.com/blog/writing-data-from-postgres-to-mongodb</link>
<description><![CDATA[ Chris has some sample code for syncing data from Postgres to MongoDB using Foreign Data Wrappers and Triggers. ]]></description>
<content:encoded><![CDATA[ <p>"Last time I had data like this, I stored the data in MongoDB." --Me, last week.<p>I told this to a friend while talking through some of their data problems. As <a href=https://www.twitter.com/craigkerstiens>Craig</a> likes to say, Crunchy Data is <a href=https://changelog.com/news/just-postgres-LWm4>"Just Postgres"</a>, but we also know there are some clear-cut use cases where Postgres isn’t the perfect tool for the job. Don’t get us wrong, <a href=https://www.crunchydata.com/why-postgres>Postgres</a> is an amazing database and we strongly believe it’s what you should start with for most applications, but that doesn’t mean you can’t benefit from more than one database.<p>If you missed the announcement recently, we launched support for the <a href=https://www.crunchydata.com/blog/easy-mongo-from-your-postgres>Mongo Foreign Data Wrapper on Crunchy Bridge</a>. If you’re wondering why, let’s look at a practical use case.<h2 id=the-use-case><a href=#the-use-case>The Use Case</a></h2><p>For that use case, we used Postgres to back purchase data for carbon removal projects, and MongoDB to serve the public presentation of that data. The data flow looks something like below.<p><img alt=Untitled loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/85f66857-a894-46f6-b519-e3e3d4d05400/public><p>In SQL, the data for this purchase was distributed across a few tables, so the query required multiple joins across tables called purchases, line items, credits, and invoices. But, once a purchase was completed, it never changed. Thus, we would take the output of the purchase query, package it up into JSON, and store it in MongoDB as a single document. 
Then, we handed a receipt token back to the end-user to fetch the purchase data later.<h2 id=why-use-two-different-databases><a href=#why-use-two-different-databases>Why use two different databases?</a></h2><p>By targeting different use cases to different databases, you can maintain some flexibility in how you work. For example, you can keep the complex, join-heavy queries in Postgres while serving the pre-built documents from MongoDB.<h2 id=the-code><a href=#the-code>The Code</a></h2><p>Below, we’ll walk through using a trigger in Postgres to write the data from Postgres to MongoDB using the MongoDB FDW.<p>To get started, you’ll need a Crunchy Bridge Postgres database and a MongoDB Atlas database. I’ll reference each of them below.<p>We have some SQL data and structure files <a href=https://gist.github.com/Winslett/adf1b0f0d34980fc369de39b336517f4>here</a>. Load the structure file, then the data file, and then this tutorial will show how to flow data from Postgres to MongoDB.<h2 id=load-the-postgres-data-structures><a href=#load-the-postgres-data-structures>Load the Postgres data structures</a></h2><p>Below are some tables we’ll use to provide a tangible example of the capabilities.<pre><code class=language-pgsql>CREATE TABLE public.accounts (
    id SERIAL PRIMARY KEY,
    name character varying,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL
);

CREATE TABLE public.carbon_removal_projects (
    id SERIAL PRIMARY KEY,
    name character varying,
    description text,
    originating_organization character varying,
    registry_name character varying,
    registry_id character varying,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL
);

CREATE TABLE public.purchases (
    id SERIAL PRIMARY KEY,
    account_id integer,
    tranche_id integer,
    amount_in_cents integer,
    amount_of_carbon_in_kg integer,
    public_url_token character varying,
    synced_to_mongo_at timestamp(6) without time zone,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL
);

CREATE TABLE public.tranches (
    id SERIAL PRIMARY KEY,
    carbon_removal_project_id integer,
    vintage character varying,
    cost_per_kg_in_cents integer,
    carbon_available_in_kg integer,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL
);
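
-- A hypothetical sketch (an assumption, not part of the original
-- schema): one way to enforce the rule that purchases from a tranche
-- can never exceed the tranche's available carbon is a trigger check.
CREATE OR REPLACE FUNCTION check_tranche_capacity() RETURNS trigger AS $$
BEGIN
    IF (SELECT COALESCE(SUM(amount_of_carbon_in_kg), 0)
        FROM public.purchases
        WHERE tranche_id = NEW.tranche_id) + NEW.amount_of_carbon_in_kg
       > (SELECT carbon_available_in_kg FROM public.tranches WHERE id = NEW.tranche_id) THEN
        RAISE EXCEPTION 'tranche % does not have enough carbon available', NEW.tranche_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER purchases_capacity_check
    BEFORE INSERT ON public.purchases
    FOR EACH ROW EXECUTE FUNCTION check_tranche_capacity();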
</code></pre><p>To piece the tables together conceptually, below is a diagram with explanations:<p><img alt="postgres to mongo diagram" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/4618148d-0606-46ea-676b-b2a325ab6a00/public><ol><li>Carbon Removal Projects are created that contain information about the project, such as location, technology used, and the organization performing the removal.<li>Tranches are created that represent blocks of carbon removed from the atmosphere. So, last month they may have removed 250kg, and the prior months 250kg and 50kg respectively.<li>An account performs a purchase of a certain amount of carbon removal, which gets matched with a tranche and a carbon removal project. With this check in place, the amount of carbon purchased from a tranche can never exceed the amount of carbon the tranche removed.</ol><h2 id=create-the-query-to-generate-json><a href=#create-the-query-to-generate-json>Create the query to generate JSON</a></h2><p>With the above schema, we’ll want to join the purchases, tranches, and carbon_removal_projects data into a reasonable JSON object. Below we use some of Postgres’ JSON builder functions to create the object:<pre><code class=language-pgsql>SELECT
   purchases.public_url_token,
   json_build_object(
     'account', json_build_object('name', accounts.name),
     'project', row_to_json(carbon_removal_projects.*),
		 'tranche', row_to_json(tranches.*),
     'carbon_offset_in_kg', purchases.amount_of_carbon_in_kg
   )
FROM purchases
	INNER JOIN accounts ON purchases.account_id = accounts.id
	INNER JOIN tranches ON purchases.tranche_id = tranches.id
	INNER JOIN carbon_removal_projects ON tranches.carbon_removal_project_id = carbon_removal_projects.id
WHERE purchases.id = 1;
</code></pre><p>Run this query against that dataset, and you’ll see it builds a JSON object. We’ll write that JSON object to MongoDB.<h2 id=create-foreign-table><a href=#create-foreign-table>Create foreign table</a></h2><p>Then, we’ll configure the Mongo foreign table that allows us to write from Postgres to MongoDB. The following code assumes that you have already run <code>CREATE EXTENSION</code>, <code>CREATE SERVER</code>, and <code>CREATE USER MAPPING</code> to configure the MongoFDW. <a href=https://www.crunchydata.com/blog/easy-mongo-from-your-postgres>For a tutorial on connecting to Mongo, check out the first post in this series</a>.<pre><code class=language-pgsql>-- see link above for connection information if this command fails

CREATE FOREIGN TABLE mongo_purchases
	(
		_id name,
		public_url_token character varying,
		purchase_data json, -- jsonb not allowed
	  created_at timestamp
	)
	SERVER atlas_server
	OPTIONS (database 'carbon_removal_purchases', collection 'purchases');
</code></pre><h2 id=create-a-trigger><a href=#create-a-trigger>Create a trigger</a></h2><p>On insert, we’ll package up a document and write it to the <code>mongo_purchases</code> table. We’ll use the above query to write it to the foreign table:<pre><code class=language-pgsql>CREATE OR REPLACE FUNCTION purchases_insert_trigger()
  RETURNS TRIGGER AS $$
BEGIN
  -- Insert the new row into the mongo_purchases table
  INSERT INTO mongo_purchases (public_url_token, purchase_data, created_at)
  VALUES (NEW.public_url_token,
	json_build_object(
     'account', json_build_object('name', (SELECT name FROM accounts WHERE accounts.id = NEW.account_id LIMIT 1)),
     'project', (SELECT row_to_json(carbon_removal_projects.*) FROM carbon_removal_projects INNER JOIN tranches ON tranches.carbon_removal_project_id = carbon_removal_projects.id WHERE tranches.id = NEW.tranche_id LIMIT 1),
		 'tranche', (SELECT row_to_json(tranches.*) FROM tranches WHERE tranches.id = NEW.tranche_id LIMIT 1),
     'carbon_offset_in_kg', NEW.amount_of_carbon_in_kg
  ),
	NEW.created_at);

	UPDATE purchases SET synced_to_mongo_at = now() WHERE id = NEW.id;

  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER purchases_insert
  AFTER INSERT ON purchases
  FOR EACH ROW
  EXECUTE FUNCTION purchases_insert_trigger();
</code></pre><p>This function builds the JSON object based on the newly inserted purchases row. Then, it writes that object to the <code>mongo_purchases</code> table, which actually resides in the Mongo Atlas database.<p>At the end, it sets a timestamp for synced to Mongo. This allows us to sync all existing records to Mongo with the following instructions!<h2 id=sync-existing-data><a href=#sync-existing-data>Sync existing data</a></h2><p>We have a way to sync all new purchases rows, but what do we need to do for the existing rows? Be sure to <a href=https://gist.github.com/Winslett/adf1b0f0d34980fc369de39b336517f4#file-data-sql-L15>load the the data.sql file from above</a>, it’ll give you some real data to play with. Run the following to insert existing rows:<pre><code class=language-pgsql>WITH unsynced_purchases AS ( -- #1
		UPDATE purchases
    SET synced_to_mongo_at = now()
    WHERE synced_to_mongo_at &#60 updated_at
			OR synced_to_mongo_at IS NULL
		RETURNING id
)

INSERT INTO mongo_purchases (public_url_token, purchase_data, created_at)  -- #2
	SELECT  -- #3
	   purchases.public_url_token,
	   json_build_object(
	     'account', json_build_object('name', accounts.name),
	     'project', row_to_json(carbon_removal_projects.*),
			 'tranche', row_to_json(tranches.*),
	     'carbon_offset_in_kg', purchases.amount_of_carbon_in_kg
	   ),
		 purchases.created_at
	FROM purchases
		INNER JOIN unsynced_purchases ON purchases.id = unsynced_purchases.id -- #4
		INNER JOIN accounts ON purchases.account_id = accounts.id
		INNER JOIN tranches ON purchases.tranche_id = tranches.id
		INNER JOIN carbon_removal_projects ON tranches.carbon_removal_project_id = carbon_removal_projects.id;
</code></pre><p>What is this magic? I’m quite proud of this query, so let me unwind it for you based on the numbers notated above:<ol><li><code>UPDATE</code> purchases that have not synced since being updated, and return the ids<li>Define an <code>INSERT</code> statement for the values returned by the CTE<li>Use the <code>SELECT</code> statement that we built previously<li>Filter by the <code>unsynced_purchases</code> ids as defined in the CTE</ol><h2 id=insert-data-and-watch-it-flow><a href=#insert-data-and-watch-it-flow>Insert data and watch it flow</a></h2><p>Run the following in your Mongo, and you’ll see the latest purchase data:<pre><code class=language-mongodb>db.purchases.find().sort({created_at: -1}).limit(1)
</code></pre><p>If you followed the tutorial above, you should see the same record in Postgres when you run the query below, which shows the data synced correctly. If you run any count queries, you’ll see those match as well.<pre><code class=language-pgsql>SELECT * FROM purchases ORDER BY created_at DESC LIMIT 1;
</code></pre><p>Now, to test the insert, run the following:<pre><code class=language-pgsql>INSERT INTO purchases (
	account_id,
	tranche_id,
	amount_in_cents,
	amount_of_carbon_in_kg,
	public_url_token,
	created_at,
	updated_at
) VALUES (
	1,
	1,
	50,
	2,
	gen_random_uuid(),
	now(),
	now()
) returning *;
</code></pre><p>Then, when querying MongoDB, you’ll see the new document:<pre><code class=language-mongodb>db.purchases.find().sort({created_at: -1}).limit(1)
</code></pre><h3 id=summary><a href=#summary>Summary</a></h3><ul><li>Using different databases can give you flexibility and agility with your data strategy.<li>In a multi-database environment, you can use a relational database and a NoSQL one, each for its specific use case<li>With a tool like the Postgres MongoDB Foreign Data Wrapper, you can easily sync data from Postgres to MongoDB<li>Combining the Foreign Data Wrapper with JSON creation and Postgres Triggers will allow you to keep both data stores in sync</ul> ]]></content:encoded>
<category><![CDATA[ JSON ]]></category>
<category><![CDATA[ Crunchy Bridge ]]></category>
<author><![CDATA[ Christopher.Winslett@crunchydata.com (Christopher Winslett) ]]></author>
<dc:creator><![CDATA[ Christopher Winslett ]]></dc:creator>
<guid isPermalink="false">4dfa298871aa08128ea00e65596d0c5811265c0222df2297bb25e21b923326fc</guid>
<pubDate>Mon, 17 Apr 2023 09:00:00 EDT</pubDate>
<dc:date>2023-04-17T13:00:00.000Z</dc:date>
<atom:updated>2023-04-17T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Easy Mongo from your Postgres ]]></title>
<link>https://www.crunchydata.com/blog/easy-mongo-from-your-postgres</link>
<description><![CDATA[ To continue to provide developers with the best Postgres experience, Crunchy Bridge now offers easy connectivity between managed Postgres and MongoDB. ]]></description>
<content:encoded><![CDATA[ <p>MongoDB got an early start as a developer-friendly database. Mongo was trivial to get started with and has some good tooling for scaling. Since you didn't have to create a schema, you were off and running in no time. Don't get us wrong, we do firmly believe a well-designed schema is important. If you're not managing your schema, it's managing you. Because of these capabilities, MongoDB has been used by teams for:<ul><li>the typical, primary application database<li>supporting database for large, sharded JSON datasets<li>rolling log using MongoDB’s capped collections<li>rapidly updating JSON documents using MongoDB’s atomic updates</ul><p>In the open-source database world we’re seeing a lot of customers working with both Mongo and Postgres. We’re excited to announce today that for all new <a href=https://www.crunchydata.com/products/crunchy-bridge>Crunchy Bridge</a> deployments, you will have access to the MongoDB FDW extension. Said another way, it is a foreign data wrapper that represents data stored in MongoDB through Postgres.<p>You can run the following query in MongoDB:<pre><code class=language-mongodb>db.purchases.find({ _id: ObjectId('6408fab340abb74a9d616083') })
</code></pre><p>And it will return the object in MongoDB. By connecting your Postgres to Mongo via the FDW, you can then run the following in your Postgres:<pre><code class=language-pgsql>SELECT * FROM mongo_purchases WHERE _id = '6408fab340abb74a9d616083';
</code></pre><p>It doesn't have to be Mongo vs. Postgres. The two can be used in conjunction; in fact, a <a href=https://www.crunchydata.com/case-studies/wwt>number of our customers are adopting both</a>. If you are using both, you shouldn't have to go through complicated pipelines and ETL to allow them to talk to each other; they can be friends. Let's dig into connecting Crunchy Bridge, our fully managed Postgres-as-a-service, and MongoDB Atlas, a fully managed MongoDB-as-a-service. One multi-cloud DBaaS meets another.<h2 id=getting-started><a href=#getting-started>Getting Started</a></h2><p>To get started, do the following:<ol><li><p>Sign up &amp; create a <a href=https://crunchybridge.com/start>Crunchy Bridge</a> database; we have <a href=https://www.crunchydata.com/blog/making-postgres-more-afforable-new-low-cost-plans-on-crunchy-bridge>low-cost</a> plans available.<li><p>Connect to your MongoDB Atlas database, where we will need to gather a few configuration values. First, we’ll need the replica set name, and we can get that by running as a user with MongoDB Atlas Admin privileges:<pre><code class=language-mongodb>rs.status().set // = `replica-set` value used later in tutorial
</code></pre><p>Then, run the following to get a host and port that we will use later. It will be in the format <code>&#60mongo-host>:&#60mongo-port></code>, which we’ll use below:<pre><code class=language-mongodb>rs.status().members.forEach(function (m) {
	print(m.name)
})
</code></pre><p>Choose any of the hosts listed — it does not matter. The MongoDB driver built into MongoFDW will automatically connect to the primary during writes.<li><p>Connect to your Crunchy Bridge database as the <code>postgres</code> user, and run the following:<pre><code class=language-pgsql>CREATE EXTENSION mongo_fdw;

-- create the mongodb server
CREATE SERVER atlas_server
	FOREIGN DATA WRAPPER mongo_fdw
	OPTIONS (
		address '&#60mongo-host>', -- from step #2
		port '&#60mongo-port>', -- from step #2
		ssl 'true',
		replica_set '&#60replica-set>', -- from step #2
		authentication_database 'admin',
		weak_cert_validation 'true'
	);

-- create user
CREATE USER MAPPING FOR postgres
	SERVER atlas_server
	OPTIONS (username '&#60your-mongodb-username>', password '&#60your-mongodb-password>');

-- connect tables
CREATE FOREIGN TABLE mongo_superheroes
	(
		_id name,
		name text,
		data json, -- jsonb not allowed
	  created_at timestamp
	)
	SERVER atlas_server
	OPTIONS (database 'superhero-database', collection 'superheroes');
</code></pre></ol><p>If you run into errors, double check the host, port, replica set, username, and password are correct. If those are correct, check that your MongoDB Atlas firewall permits connecting from the Postgres host.<h2 id=working-with-data><a href=#working-with-data>Working with data</a></h2><p>Once you have configured MongoFDW, you’ll be able to work with the MongoDB data. Inserts are as native as you can imagine:<pre><code class=language-pgsql>INSERT INTO mongo_superheroes (name, data, created_at) VALUES ('batman', '{"superpowers": ["moral compass", "innovative"]}', now());
</code></pre><p>Then, we can query data from that table:<pre><code class=language-pgsql>SELECT * FROM mongo_superheroes;
</code></pre><p>Which returns something like the following.<pre><code class=language-text>           _id            |  name  |                         data                          |       created_at
--------------------------+--------+-------------------------------------------------------+-------------------------
 640a40334d7edb1a9921b2f2 | batman | { "superpowers" : [ "moral compass", "innovative" ] } | 2023-03-30 20:23:14.974
</code></pre><p>Now you can log into MongoDB and run the following from the <code>superhero-database</code>:<pre><code class=language-mongodb>db.superheroes.find()
</code></pre><p>Which returns:<pre><code class=language-mongodb>[
  {
    _id: ObjectId("640a40334d7edb1a9921b2f2"),
    name: 'batman',
    data: { superpowers: [ 'moral compass', 'innovative' ] },
    created_at: ISODate("2023-03-30T20:23:14.974Z")
  }
]
</code></pre><h2 id=go-forth-and-conquer><a href=#go-forth-and-conquer>Go forth and conquer</a></h2><p>So there you have it: a developer-friendly way to merge MongoDB and Postgres into a truly hybrid data architecture. We can see it being used for ETL processes that sync data from a MongoDB application to a Postgres warehouse for analysis. We can also see cases where you might want to write data to MongoDB directly from Postgres. We look forward to seeing what you build with this new foreign data wrapper. ]]></content:encoded>
<category><![CDATA[ JSON ]]></category>
<category><![CDATA[ Crunchy Bridge ]]></category>
<author><![CDATA[ Christopher.Winslett@crunchydata.com (Christopher Winslett) ]]></author>
<dc:creator><![CDATA[ Christopher Winslett ]]></dc:creator>
<guid isPermalink="false">75c1b02d18c11997834c2bd4c9c2534ba4d2e58a1a27bfb3650c3871bc6addcf</guid>
<pubDate>Fri, 31 Mar 2023 09:00:00 EDT</pubDate>
<dc:date>2023-03-31T13:00:00.000Z</dc:date>
<atom:updated>2023-03-31T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres support for JSON is 10 years old! ]]></title>
<link>https://www.crunchydata.com/blog/state-of-json-in-postgres-2022</link>
<description><![CDATA[ JSON in Postgres is now 10 years old. Chris recently published a JSON tutorial and offers some thoughts on the past and future of JSON, and what developers need to be aware of. ]]></description>
<content:encoded><![CDATA[ <p>JSON is everywhere, even in your SQL database. Since Postgres added JSON support 10 years ago, we've seen widespread adoption. When JSON support was first released in 2012, the implementation was a fairly flat representation of the JSON data type, with limited querying and indexing capabilities. <a href=https://twitter.com/craigkerstiens>Craig</a>, who was active in early JSON discussions with Postgres contributors, admits <a href=https://www.crunchydata.com/blog/better-json-in-postgres-with-postgresql-14>"Postgres cheated"</a> with these early implementations. With JSONB now introduced and adopted, I think JSON support is one of Postgres's biggest strengths.<h3 id=crunchys-json-tutorial><a href=#crunchys-json-tutorial>Crunchy's JSON Tutorial</a></h3><p>If you're new to JSON in Postgres or just want to play around, jump over to our <a href=https://www.crunchydata.com/developers/playground/basics-of-jsonb>JSON + Postgres tutorial</a> and give it a spin.<h3 id=jsonb><a href=#jsonb>JSONB</a></h3><p>Today, when creating a column for JSON data in Postgres, nearly all use cases call for the JSONB data type. JSONB stores data in a binary form (that's the B). JSONB has been fundamental in improving the Postgres-JSON experience. With it, Postgres has functions for modifying, extracting, and re-forming JSON data.<p>I'm going to go out on a limb and say that Postgres has as much support for JSON as any other database vendor. Yet, accessing that JSONB functionality is not always straightforward. In most scenarios, there is more than one way to get the intended result. JSONB queries are a combination of traditional SQL syntax, JSONB functions, and JSONPath expressions. 
Three people may get the same result, yet each uses a completely different set of functions.<p>For instance, when querying a nested set of objects, a programmer can:<ol><li>Turn the objects into a set of records using <code>jsonb_to_recordset</code>, filter the records with conditionals, then rejoin to JSON using <code>to_jsonb</code><li>Use JSONPath to query and compare the objects<li>Use JSON operators to extract and compare.</ol><p>So, how can you use Postgres with JSONB while keeping your sanity?<h2 id=json-storage><a href=#json-storage>JSON Storage</a></h2><p><strong>The good news</strong>: With Postgres, JSONB documents can be as large as 255MB. JSONB documents can store any structure, and different structures on different records. Four-layer-deep arrays and objects? No problem!<p><strong>The practical news</strong>: Read-only JSONB blobs can be a wild west of size and format. But if you want to query JSONB documents, they should not be that large. A consistent, simple, well-defined JSON schema rewards a developer with performance and readable queries.<h2 id=json-queries--filtering><a href=#json-queries--filtering>JSON Queries &#38 Filtering</a></h2><p><strong>The good news</strong>: Postgres functions can extract values from inside a JSON blob. These functions access and transform objects, arrays, and values. Postgres has JSONPath conditions for extracting with precision.<p><strong>The practical news</strong>: A query statement filtering on a single field is easy to read. A single conditional means the mental buffer does not overflow. Additional conditionals increase the verbosity of the functions, and the statements become hard to read.<p>Once values are extracted from JSONB, they are returned as strings. Because Postgres is strongly typed, those values must be cast for comparison. Thus, JSONB conditionals end up encased in type-casting statements.<p>JSON filtering in Postgres can start to feel complex or unreadable. 
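<p>As a minimal sketch of that variety, assuming a hypothetical <code>docs</code> table with a <code>details</code> column of type <code>jsonb</code>, each of the following finds rows whose <code>tags</code> array contains the string <code>urgent</code>:<pre><code class=language-pgsql>-- JSON operators: -> walks into the document, ? tests membership
SELECT * FROM docs WHERE details->'tags' ? 'urgent';

-- JSONPath: @@ applies a jsonpath predicate
SELECT * FROM docs WHERE details @@ '$.tags[*] == "urgent"';

-- containment: @> asks whether the right-hand document is contained
SELECT * FROM docs WHERE details @> '{"tags": ["urgent"]}';
</code></pre>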
But it's on par with other databases in the space. An expressive JSON document requires an equally expressive query. Keep documents simple, and the queries will follow.<h2 id=json-query-performance><a href=#json-query-performance>JSON Query Performance</a></h2><p><strong>The good news</strong>: Postgres has indexes for all data structures found within JSONB.<p><strong>The practical news</strong>: Simple indexes perform reliably. <code>BTREE</code> indexes are the most predictable and resource-efficient. When using <code>BTREE</code>, index creation integrates the JSON extraction and transformation. Postgres optimizes for these explicit values, and the human query builder understands the indexes. <code>BTREE</code> will perform as you would expect for any index on Postgres.<p>For more expansive queries that may touch any of an object's keys, <code>GIN</code> indexes are an option. <code>GIN</code> indexes can be powerful, yet they should not be thrown on every JSONB field (that is never a good performance strategy for any type of data, and particularly not for <code>GIN</code> indexes). <code>GIN</code> indexes have caveats about which operators work with which index settings. When using <code>GIN</code> indexes, set aside extensive time to test each query that needs to use them.<p><strong>How do Postgres' indexes compare to other JSON databases?</strong><p>Key-value store databases limit the number of indexes, focusing performance on a limited number of queries and preventing index sprawl. This strategy prevents database users from sabotaging their own experience. Search-specific databases throw computing resources at the problem, so you'll spend CPU + RAM + money. General application databases trust the programmer to use the provided toolset. 
Postgres is in this category -- it is up to the user to use the tool wisely.<h2 id=json-modifications--json-upserts><a href=#json-modifications--json-upserts>JSON Modifications &#38 JSON Upserts</a></h2><p><strong>The good news</strong>: With Postgres, a single update statement can update or upsert values in a JSONB document. Postgres provides the ability to do in-place updates with JSONB.<p><strong>The practical news</strong>: With Postgres, in-place updates are similar to any other update. The query contains the method of building a new document. As we said about queries, complex JSON documents require complex in-place update queries. Still, it may be worth it: in-place updates avoid a network round-trip, which improves performance.<h2 id=using-postgres-with-json-practically><a href=#using-postgres-with-json-practically>Using Postgres with JSON Practically</a></h2><p>Today, anything is possible with JSON in Postgres. Complex situations can create SQL spaghetti. A developer choosing complex JSON will pay the price in code complexity. Yet, I would argue this is the case with any other JSON database.<p>The words of advice for Postgres + JSON in 2022: if you feel like you are struggling to make JSON work with Postgres, consider using a table.<p>Want to try it out? Check out the <a href=https://www.crunchydata.com/developers/playground/basics-of-jsonb>JSON tutorial in the Crunchy Data Postgres playground</a>. ]]></content:encoded>
<category><![CDATA[ Postgres Tutorials ]]></category>
<category><![CDATA[ JSON ]]></category>
<author><![CDATA[ Christopher.Winslett@crunchydata.com (Christopher Winslett) ]]></author>
<dc:creator><![CDATA[ Christopher Winslett ]]></dc:creator>
<guid isPermalink="false">c257254a4ae542ab4deef0dd5143cbb3e8f841719c4472f2c0653004df58836d</guid>
<pubDate>Fri, 09 Dec 2022 10:00:00 EST</pubDate>
<dc:date>2022-12-09T15:00:00.000Z</dc:date>
<atom:updated>2022-12-09T15:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Cut Out the Middle Tier: Generating JSON Directly from Postgres ]]></title>
<link>https://www.crunchydata.com/blog/generating-json-directly-from-postgres</link>
<description><![CDATA[ PostgreSQL has built-in JSON generators that can be used to create structured JSON output right in the database, upping performance and radically simplifying web tiers. ]]></description>
<content:encoded><![CDATA[ <p>Too often, web tiers are full of boilerplate that does nothing except convert a result set into JSON. A middle tier could be as simple as a function call that returns <a href=/blog/using-postgresql-for-json-storage>JSON</a>. All we need is an easy way to convert result sets into JSON in the database.<p>PostgreSQL has <a href=https://www.postgresql.org/docs/current/functions-json.html>built-in JSON generators</a> that can be used to create structured JSON output right in the database, upping performance and radically simplifying web tiers.<p>Fortunately, PostgreSQL <strong>has such functions</strong>, which run right next to the data, for better performance and lower bandwidth usage.<h2 id=some-example-data><a href=#some-example-data>Some example data</a></h2><p>To try out these examples, load this tiny database:<pre><code class=language-pgsql>-- employees references departments, so create departments first
CREATE TABLE departments (
  department_id bigint primary key,
  name text
  );

CREATE TABLE employees (
  employee_id serial primary key,
  department_id integer references departments(department_id),
  name text,
  start_date date,
  fingers integer,
  geom geometry(point, 4326) -- geometry requires the PostGIS extension
  );

INSERT INTO departments
 (department_id, name)
VALUES
 (1, 'spatial'),
 (2, 'cloud');

INSERT INTO employees
 (department_id, name, start_date, fingers, geom)
VALUES
 (1, 'Paul',   '2018/09/02', 10, 'POINT(-123.32977 48.40732)'),
 (1, 'Martin', '2019/09/02',  9, 'POINT(-123.32977 48.40732)'),
 (2, 'Craig',  '2019/11/01', 10, 'POINT(-122.33207 47.60621)'),
 (2, 'Dan',    '2020/10/01',  8, 'POINT(-122.33207 47.60621)');
</code></pre><p>Four employees, arranged into two departments, with some detail information about each employee.<h2 id=easy-json-using-row_to_json><a href=#easy-json-using-row_to_json>Easy JSON using row_to_json</a></h2><p>The simplest JSON generator is <code>row_to_json()</code>, which takes in a tuple value and returns the equivalent JSON dictionary.<pre><code class=language-pgsql>SELECT row_to_json(employees)
FROM employees
WHERE employee_id = 1;
</code></pre><p>The resulting JSON uses the column names for keys, so you get a neat dictionary.<pre><code class=language-json>{
  "employee_id": 1,
  "department_id": 1,
  "name": "Paul",
  "start_date": "2018-09-02",
  "fingers": 10,
  "geom": {
    "type": "Point",
    "coordinates": [
      -123.329773,
      48.407326
    ]
  }
}
</code></pre><p>And look what happens to the geometry column! Because <a href=/blog/topic/spatial>PostGIS</a> includes a cast from geometry to JSON, the geometry column is automatically mapped into <a href=https://geojson.org/>GeoJSON</a> in the conversion. This is a useful trick with any custom type: define a cast to JSON and you automatically integrate with the native PostgreSQL JSON generators.<h2 id=full-result-sets-using-json_agg><a href=#full-result-sets-using-json_agg>Full result sets using json_agg</a></h2><p>Turning a single row into a dictionary is fine for basic record access, but queries frequently require multiple rows to be converted.<p>Fortunately, there's an <a href=https://www.postgresql.org/docs/10/functions-aggregate.html>aggregate function</a> for that, <code>json_agg</code>, which carries out the JSON conversion and converts the multiple results into a JSON list.<pre><code class=language-pgsql>SELECT json_agg(e)
FROM (
    SELECT employee_id, name
    FROM employees
    WHERE department_id = 1
    ) e;
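
-- a hedged aside: the aggregate also accepts an ORDER BY inside the
-- call to control element order, e.g. json_agg(e ORDER BY e.name)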
</code></pre><p>Note that in order to strip down the data in the record, we use a subquery to make a narrower input to <code>json_agg</code>.<pre><code class=language-json>[
  {
    "employee_id": 1,
    "name": "Paul"
  },
  {
    "employee_id": 2,
    "name": "Martin"
  }
]
</code></pre><h2 id=nested-results-using-subqueries><a href=#nested-results-using-subqueries>Nested results using subqueries</a></h2><p>So far, all this is pretty easy to replicate in middleware, but things get more interesting when you start dumping structured results.<p>Using aggregation, and converting the results to JSON in stages, it's possible to build up nested JSON outputs that reflect table relationships.<pre><code class=language-pgsql>WITH
-- strip down employees table
employees AS (
  SELECT department_id, name, start_date
  FROM employees
),
-- join to departments table and aggregate
departments AS (
  SELECT d.name AS department_name,
         json_agg(e) AS employees
  FROM departments d
  JOIN employees e
  USING (department_id)
  GROUP BY d.name
)
-- output as one json list
SELECT json_agg(departments)
FROM departments;
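
-- hedged variant: to key the output by department name rather than
-- returning a list, swap the final aggregate:
--   SELECT json_object_agg(d.department_name, d.employees)
--   FROM departments d;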
</code></pre><p>And the result has one entry for each department, each of which contains its two employees.<pre><code class=language-json>[
	{
		"department_name": "cloud",
		"employees": [
			{
				"department_id": 2,
				"name": "Craig",
				"start_date": "2019-11-01"
			},
			{
				"department_id": 2,
				"name": "Dan",
				"start_date": "2020-10-01"
			}
		]
	},
	{
		"department_name": "spatial",
		"employees": [
			{
				"department_id": 1,
				"name": "Paul",
				"start_date": "2018-09-02"
			},
			{
				"department_id": 1,
				"name": "Martin",
				"start_date": "2019-09-02"
			}
		]
	}
]
</code></pre><p>If you would prefer your output to be an associative array instead of a list, replace the final <code>json_agg</code> with <code>json_object_agg</code>.<h2 id=all-your-tables-in-json><a href=#all-your-tables-in-json>All your tables in JSON</a></h2><p>Ever wanted to quickly extract a definition of your table structures from the database? With the JSON formatters and the PostgreSQL system tables, all that info is right at hand.<pre><code class=language-pgsql>WITH rows AS (
  SELECT c.relname, a.attname, a.attnotnull, a.attnum, t.typname
  FROM pg_class c
  JOIN pg_attribute a
    ON c.oid = a.attrelid AND a.attnum >= 0
  JOIN pg_type t
    ON t.oid = a.atttypid
  JOIN pg_namespace n
    ON c.relnamespace = n.oid
  WHERE n.nspname = 'public'
    AND c.relkind = 'r'
),
agg AS (
  SELECT rows.relname, json_agg(rows ORDER BY attnum) AS attrs
  FROM rows
  GROUP BY rows.relname
)
SELECT json_object_agg(agg.relname, agg.attrs)
FROM agg;
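
-- to inspect a single table instead of the whole schema, add a filter
-- to the first CTE, e.g.:
--   AND c.relname = 'departments'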
</code></pre><p>Here's the entry for the "departments" table.<pre><code class=language-json>{
  "departments": [
    {
      "relname": "departments",
      "attname": "department_id",
      "attnotnull": true,
      "attnum": 1,
      "typname": "int8"
    },
    {
      "relname": "departments",
      "attname": "name",
      "attnotnull": false,
      "attnum": 2,
      "typname": "text"
    }
  ],
  ...
}
</code></pre><h2 id=conclusion><a href=#conclusion>Conclusion</a></h2><ul><li>PostgreSQL JSON emitters can turn any result set into JSON right in the database<li>Web tiers can be vastly simplified by pushing JSON creation further down the stack<li>Custom types can emit custom JSON if a cast to json is defined on them</ul> ]]></content:encoded>
<category><![CDATA[ JSON ]]></category>
<author><![CDATA[ Paul.Ramsey@crunchydata.com (Paul Ramsey) ]]></author>
<dc:creator><![CDATA[ Paul Ramsey ]]></dc:creator>
<guid isPermalink="false">https://blog.crunchydata.com/blog/generating-json-directly-from-postgres</guid>
<pubDate>Wed, 14 Jul 2021 05:00:00 EDT</pubDate>
<dc:date>2021-07-14T09:00:00.000Z</dc:date>
<atom:updated>2021-07-14T09:00:00.000Z</atom:updated></item></channel></rss>