<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>Ben Blattberg | CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/author/ben-blattberg/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog/author/ben-blattberg</link>
<image><url>https://www.crunchydata.com/build/_assets/ben-blattberg.png-QKEJ56ID.webp</url>
<title>Ben Blattberg | CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog/author/ben-blattberg</link>
<width>420</width>
<height>420</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Thu, 14 Dec 2023 08:00:00 EST</pubDate>
<dc:date>2023-12-14T13:00:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ pgAdmin for All of Your Postgres Kubernetes Clusters ]]></title>
<link>https://www.crunchydata.com/blog/cpk-5-5-a-new-pgadmin-experience</link>
<description><![CDATA[ We just released version 5.5 of Crunchy Postgres for Kubernetes and have a new pgAdmin experience. Now you can run all of your Postgres clusters in one interface across your Kubernetes fleet. We have a really cool way to manage this and it includes automatic new cluster detection. ]]></description>
<content:encoded><![CDATA[ <p>We recently announced the latest update of <a href=https://access.crunchydata.com/documentation/postgres-operator/latest/releases/5.5.x>Crunchy Postgres for Kubernetes 5.5</a>. In this version 5.5 update, we would like to highlight a key feature: the introduction of a new pgAdmin API.<p>The notable changes in this feature include:<ul><li>The ability to manage all Postgres clusters through a single interface<li>Automatic cluster detection<li>A new custom resource file for pgAdmin</ul><p>Read on to explore the full range of functionalities and learn how to set up and utilize this new pgAdmin experience. Additionally, make sure to consult our comprehensive documentation, which provides practical how-tos, best practices, and detailed step-by-step guides to maximize your <a href=https://www.crunchydata.com/products/crunchy-postgresql-for-kubernetes>Crunchy Postgres for Kubernetes</a> experience.<h2 id=updated-pgadmin-deployment><a href=#updated-pgadmin-deployment>Updated pgAdmin deployment</a></h2><p>If you are already familiar with our <a href=https://www.crunchydata.com/blog/seamless-pgadmin-4-deployments-using-pgo-v5.1>previous method of providing pgAdmin</a> and prefer not to make changes, don't worry! You don't have to. We have not made any modifications to how you request a pgAdmin instance attached to a single PostgresCluster. You can still add it to your PostgresCluster definition like this:<pre><code class=language-yaml>kind: PostgresCluster
spec:
  userInterface:
    pgAdmin: ...
</code></pre><p>The most noticeable change is the creation of a new API and a new <code>Custom Resource</code> specifically for pgAdmin. This pgAdmin instance will be separate and independent from any PostgresCluster. Here’s a sample of the new PGAdmin Custom Resource file.<pre><code class=language-yaml>apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PGAdmin
metadata:
  name: rhino
spec:
  dataVolumeClaimSpec:
    accessModes:
      - 'ReadWriteOnce'
    resources:
      requests:
        storage: 1Gi
  serverGroups:
    - name: supply
      postgresClusterSelector: {}
</code></pre><p>One advantage of this new method is that the pgAdmin is not limited to any specific PostgresCluster. In fact, this new approach allows you to easily create a single pgAdmin instance that can manage multiple PostgresClusters within a namespace. We refer to this new approach as "namespace-scoped" because it enables the management of several clusters in a namespace.<p>In addition to this change, we have also released a new pgAdmin image for this new implementation, which is compatible with the most recent versions of Postgres.<p>We believe you will appreciate these updates, and we have prepared a walkthrough to demonstrate the flexibility and power of this new experience.<h2 id=walkthrough><a href=#walkthrough>Walkthrough</a></h2><p>Let's set up a new namespace-scoped pgAdmin instance.<h3 id=checking-the-crd-existence><a href=#checking-the-crd-existence>Checking the CRD Existence</a></h3><p>Before we can create a pgAdmin instance from the Custom Resource, we need to ensure that we are running PGO v5.5 with the new <code>pgadmins</code> Custom Resource installed. Let's verify the existence of the <code>pgadmins</code> resource:<pre><code class=language-bash>-> kubectl get crd pgadmins.postgres-operator.crunchydata.com
NAME                                                 CREATED AT
pgadmins.postgres-operator.crunchydata.com           ...

</code></pre><p>If we see the <code>pgadmins</code> Custom Resource, we are ready to proceed with creating a pgAdmin instance by defining a YAML and sending it to the Kubernetes cluster.<h3 id=creating-a-pgadmin-instance><a href=#creating-a-pgadmin-instance>Creating a pgAdmin Instance</a></h3><p>Let's define a basic <code>pgadmins</code> instance:<pre><code class=language-yaml>apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PGAdmin
metadata:
  name: rhino
spec:
  dataVolumeClaimSpec:
    accessModes:
      - 'ReadWriteOnce'
    resources:
      requests:
        storage: 1Gi
  serverGroups:
    - name: demand
      postgresClusterSelector:
        matchLabels:
          owner: logistics
</code></pre><p>In this example, we define a <code>pgadmin</code> named "rhino" with 1Gi of storage, the usual amount in our examples; adjust it for your specific needs. We will discuss the <code>serverGroups</code> section in a moment, but first, let's create this pgAdmin instance and access it.<p>To create the instance, we can use the <code>kubectl create</code> command with this YAML, or any other command/infrastructure you typically use to create Kubernetes instances.<h3 id=accessing-the-pgadmin><a href=#accessing-the-pgadmin>Accessing the pgAdmin</a></h3><p>To access the pgAdmin instance, we need to expose the port and retrieve the user credentials.<h4 id=exposing-the-port><a href=#exposing-the-port>Exposing the Port</a></h4><p>The pod exposes pgAdmin on port 5050. To access it, we can create a Service and port-forward to that Service.<p>If you already have a Service or want to port-forward directly to the Pod, you can do that as well.<p>To create a Service, we can use <a href=https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#expose>kubectl expose</a>, which conveniently creates a Service for us if we provide the name of a Pod. First, we need to obtain the name of the Pod by selecting it using a label. Assuming we want to connect to the <code>rhino</code> pgAdmin example mentioned earlier, we can run the following commands:<pre><code class=language-bash># Select the pod using the `postgres-operator.crunchydata.com/pgadmin=rhino` label
# and save the name to the variable PGADMIN_POD for easier reuse.
PGADMIN_POD=$(kubectl get pod -n postgres-operator -l postgres-operator.crunchydata.com/pgadmin=rhino -o name)

# Create a Service with the expose command.
kubectl expose -n postgres-operator ${PGADMIN_POD} --name rhino-pgadmin

# Port-forward to the new service
kubectl port-forward -n postgres-operator svc/rhino-pgadmin 5050:5050
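# Alternatively, you can skip creating the Service and port-forward
# directly to the Pod (a sketch, reusing the PGADMIN_POD variable
# captured above):
#
#   kubectl port-forward -n postgres-operator ${PGADMIN_POD} 5050:5050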
</code></pre><p>Once we have completed these steps, we can open our browser and navigate to <code>http://localhost:5050</code>, which will display the pgAdmin login screen. Now we need to obtain our credentials.<h4 id=getting-the-credentials><a href=#getting-the-credentials>Getting the Credentials</a></h4><p>In this version, pgAdmin requires an initial administrator user. As part of the deployment, the Operator sets up an administrator user and generates a password, which we can use.<p>The username we define is hardcoded to prevent issues with username redefinition. The admin username is <code>admin@&lt;pgAdmin_name&gt;.&lt;pgAdmin_namespace&gt;.svc</code>.<p>The password for that user is stored in the Secret with the label <code>postgres-operator.crunchydata.com/pgadmin=&lt;pgAdmin_name&gt;</code>.<p>For our <code>rhino</code> example (assuming the pgadmin was created in the <code>postgres-operator</code> namespace), the user would be <code>admin@rhino.postgres-operator.svc</code>. To retrieve the password, we can run the following command:<pre><code class=language-bash>kubectl get secret -n postgres-operator -l postgres-operator.crunchydata.com/pgadmin=rhino -o jsonpath='{.items[0].data.password}' | base64 -d
</code></pre><p>This command will output the password.<p>With the port exposed, and with the username and password, we can now log in as the administrator and manage other users.<h3 id=customizing-your-pgadmin><a href=#customizing-your-pgadmin>Customizing Your pgAdmin</a></h3><h4 id=configuring-external-user-management><a href=#configuring-external-user-management>Configuring External User Management</a></h4><p>But what if we don't want to manage users? Similar to the previous method of deploying pgAdmin, this new namespace-scoped pgAdmin accepts custom configurations, including LDAP configuration.<p>For instance, if we want to change the configuration to disable Gravatar images for users, we can set the following in the <code>pgadmin</code> spec:<pre><code class=language-yaml>spec:
  config:
    settings:
      SHOW_GRAVATAR_IMAGE: False
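</code></pre><p>One way to apply a settings change like this to the existing <code>rhino</code> pgAdmin is <code>kubectl patch</code> (a sketch, assuming the <code>postgres-operator</code> namespace used in this walkthrough; you could just as well edit your manifest and re-apply it):<pre><code class=language-bash># Merge the new setting into the existing PGAdmin spec
kubectl patch pgadmin rhino -n postgres-operator --type merge -p '{"spec":{"config":{"settings":{"SHOW_GRAVATAR_IMAGE":false}}}}'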
</code></pre><p>Additionally, we can mount files to <code>/etc/pgadmin/conf.d</code> inside the pgAdmin container using <a href=https://kubernetes.io/docs/concepts/storage/projected-volumes/>projected volumes</a>. For example, the following spec mounts the <code>useful.txt</code> file from the <code>mysecret</code> Secret to <code>/etc/pgadmin/conf.d/useful.txt</code>:<pre><code class=language-yaml>spec:
  config:
    files:
      - secret:
          name: mysecret
          items:
            - key: useful.txt
</code></pre><p>For LDAP configuration, we need to set certain configuration options through the <code>spec.config.settings</code> section of the YAML. If we have an LDAP bind password, we would also need to mount a Secret with that password using the special field <code>spec.config.ldapBindPassword</code>. (We use this special field to remind users not to include passwords directly in the YAML.)<p>To illustrate, if we wanted to add LDAP configuration and had a Secret named <code>ldap-pass</code> with the password stored in a field called <code>my-password</code>, our <code>pgadmin</code> spec might look like this:<pre><code class=language-yaml>spec:
  config:
    settings:
      AUTHENTICATION_SOURCES: ['ldap']
      LDAP_AUTO_CREATE_USER: True
      LDAP_SERVER_URI: ldaps://my.ds.example.com
    ldapBindPassword:
      name: ldap-pass
      key: my-password
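</code></pre><p>The <code>ldap-pass</code> Secret referenced above needs to exist before pgAdmin can use it. One way to create it (a sketch, assuming the <code>postgres-operator</code> namespace; the literal value is a placeholder for your real bind password):<pre><code class=language-bash># Create the Secret with the bind password under the key "my-password"
kubectl create secret generic ldap-pass -n postgres-operator --from-literal=my-password='REPLACE_WITH_BIND_PASSWORD'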
</code></pre><h4 id=automated-server-discovery><a href=#automated-server-discovery>Automated Server Discovery</a></h4><p>In our previous architecture, each pgAdmin instance was tied to a PostgresCluster. With the new approach, a pgAdmin instance can automatically register multiple PostgresClusters. How does this work?<p>This is where the <code>serverGroups</code> section of the PgAdmin spec, mentioned earlier, comes into play. In this section, we can define multiple selector groups that the Operator uses to select specific PostgresClusters based on labels.<p>The Operator accomplishes this in a Kubernetes-native manner: it lists all the PostgresClusters in the namespace and then filters them based on the labels or selectors we have defined.<p>For example, if we had a pgAdmin set up with the spec mentioned above, this is the section that defines our filtering:<pre><code class=language-yaml>spec:
  serverGroups:
    - name: demand
      postgresClusterSelector:
        matchLabels:
          owner: logistics
</code></pre><p>This spec instructs the Operator to "list the PostgresClusters in the namespace, but filter that list for clusters with the label <code>owner: logistics</code>."<p>The Operator then creates that list, which the pgAdmin instance registers. In fact, the pgAdmin process registers these Postgres instances as "shared servers" owned by the initial administrator user set up by pgAdmin. A "shared server" is owned by one user (the administrator) but can be viewed in the pgAdmin list by any user.<p>Please note that for namespace-scoped pgAdmin deployments, there is no automatic synchronization between pgAdmin and Postgres users. Even if you can sign in to pgAdmin and see PostgresClusters, you will still need a valid Postgres user and credentials to access the Postgres database.<p>It is possible to omit the <code>serverGroups</code> field or to have no PostgresClusters discovered with the given selectors. Due to how pgAdmin currently functions, when zero servers are discovered, the initial administrator user will be unable to manually add new ServerGroups or Servers.<p>But what if we wanted to register PostgresClusters with different labels? We can handle this scenario by adding multiple <code>serverGroups</code>, which will create separate "Server Groups" in pgAdmin.<p>And what if we wanted to register all the PostgresClusters in a namespace? Since we are using Kubernetes-native idioms for label matching, we can add a serverGroup with an empty <code>postgresClusterSelector</code> like this:<pre><code class=language-yaml>spec:
  serverGroups:
    - name: supply
      postgresClusterSelector: {}
</code></pre><p>This Kubernetes-specific idiom matches <em>all</em> PostgresClusters in the namespace.<p>If you want to deploy one pgAdmin to manage all the PostgresClusters in a namespace and share those servers with all pgAdmin users, you can configure your <code>pgadmin</code> deployment to automatically register all those PostgresClusters, avoiding the need to manually import them one-by-one! Here’s a sample of databases added from two Postgres clusters, hippo and rhino.<p><img alt="pgAdmin screenshot showing databases from multiple Postgres clusters" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/a2ff1bb8-4570-48da-1f42-8b97b80b9c00/public><h2 id=finishing-up><a href=#finishing-up>Finishing Up</a></h2><p>Upgrading to Crunchy Postgres for Kubernetes <code>v5.5.0</code> is generally a simple single command. If you installed Crunchy Postgres for Kubernetes using the Kustomize installer available in the <a href=https://github.com/CrunchyData/postgres-operator-examples>Postgres Operator examples repository</a>, you would issue: <code>kubectl apply --server-side -k kustomize/install/default</code>. For additional upgrade guidance, please see <a href=https://access.crunchydata.com/documentation/postgres-operator/latest/upgrade>the Crunchy Postgres for Kubernetes Upgrade documentation</a>.<p>The entire team at Crunchy Data is thrilled to introduce this new feature to our customers and the community. As always, we welcome your feedback, so feel free to join the discussion on our <a href=https://discord.gg/ErmzUAmTvy>Discord</a> server. ]]></content:encoded>
<category><![CDATA[ Kubernetes ]]></category>
<author><![CDATA[ Ben.Blattberg@crunchydata.com (Ben Blattberg) ]]></author>
<dc:creator><![CDATA[ Ben Blattberg ]]></dc:creator>
<guid isPermalink="false">77d92e721a612ab94656bf16af6c68696e595ec28cc8cfbb3d80227f6318777b</guid>
<pubDate>Thu, 14 Dec 2023 08:00:00 EST</pubDate>
<dc:date>2023-12-14T13:00:00.000Z</dc:date>
<atom:updated>2023-12-14T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Stateful Postgres Storage Using Kubernetes ]]></title>
<link>https://www.crunchydata.com/blog/stateful-postgres-storage-using-kubernetes</link>
<description><![CDATA[ How can Kubernetes be the foundation of a stateful database? Ben takes us through the basics of Persistent Volumes, Volume Claims, and Postgres storage. ]]></description>
<content:encoded><![CDATA[ <p>Kubernetes was developed originally as an orchestration system for stateless applications. Today, Kubernetes is the backbone of countless full stack applications that, notably, include a database as part of the stack. So, a question we often hear is:<blockquote><p>How can Kubernetes be the foundation of that most stateful application of all, the database?</blockquote><h2 id=kubernetes--storage><a href=#kubernetes--storage>Kubernetes &amp; Storage</a></h2><h3 id=ephemeral-pods><a href=#ephemeral-pods>Ephemeral Pods</a></h3><p>Let’s say you maintain a Postgres database and you’ve been tasked with moving it to Kubernetes. You could just start up a Kubernetes pod running a Postgres image, load the data, and call it a day.<p>As soon as that pod goes away, so will that small but critical database, because the database storage existed as part of that ephemeral pod.<p>When you created that pod, you told the underlying computer to reserve a certain amount of resources for it (a certain amount of compute power, a certain amount of memory), and a certain amount of storage comes with this automatically. But as soon as that pod goes away, all of that is released back into the pool.<p>And pods do go away even when you don’t want them to. Maybe a pod exceeded the resource limits you gave it, or maybe the process hit a fatal exception, or—well, there are a lot of ways a pod can die. Let’s call this the first lesson of Kubernetes: pods are ephemeral—as the old saying goes, consider these cattle, not pets. While this model of deployment is ideal for application services, we need something different to handle the database in Kubernetes.<h3 id=persistent-volumes><a href=#persistent-volumes>Persistent Volumes</a></h3><p>So, how can we create a database on Kubernetes and not worry about the ephemeral pod? 
The answer is “Volumes”, or, more specifically, certain types of volumes that are <em>independent</em> of the pod lifecycle.<p>For an example of a volume type that is <em>not</em> independent, let’s take <code>emptyDir</code>. The Kubernetes doc on <a href=https://kubernetes.io/docs/concepts/storage/volumes/#volume-types>Volume types</a> lets us know that when a pod with an emptyDir is deleted, the data in there will be “deleted permanently”. For backing a Postgres instance, this is not a good idea.<p>What we want here is a volume that won’t go away when the pods are removed. Luckily, Kubernetes has the concept of a Persistent Volume (<code>PersistentVolume</code> or <code>PV</code>): a volume that will persist no matter what happens with the pod (sort of — we’ll get into that later).<p>What’s even better, a Persistent Volume is an abstraction for any kind of storage. So, do you want to put your storage somewhere remote, like on AWS, Azure, Google, etc.? Then you can use a Persistent Volume. But maybe you want to use some of your Kubernetes node’s own storage? Then you can <em>also</em> use a Persistent Volume.<p>As far as Kubernetes knows, a Persistent Volume is just some piece of storage somewhere <em>and</em> its lifecycle is independent of the pod that’s using it. So if the pod goes away — say, it hits a memory limit and is OOMKilled — the storage is still there, ready to be used by your Postgres pod when it is recreated.<p>But first, we have to tell the Postgres pod to use that storage and we have to tell other pods to leave our storage alone. And we do that with a Persistent Volume Claim (<code>PersistentVolumeClaim</code> or <code>PVC</code>).<h3 id=persistent-volume-claim><a href=#persistent-volume-claim>Persistent Volume Claim</a></h3><p>A Persistent Volume Claim is a claim on a persistent volume — and not just any persistent volume. 
After all, it wouldn’t be great for your 20G database if your Postgres pod tried to claim and use 1G of storage.<p>A Persistent Volume Claim lets you request a certain amount of storage. But a Persistent Volume Claim is more than just a specification of a certain amount of storage. You might also want to specify the <code>access mode</code> for this storage or the <code>StorageClass</code>. For example, if you have data that changes quickly, you might want a different storage option than if you are taking long-lasting backups that never change.<h3 id=storage-classes><a href=#storage-classes>Storage Classes</a></h3><p>By now you may have noticed that the Kubernetes developers are pretty good at using descriptive names and that’s true here: a storage class is a class of storage. That is, it is a category that defines certain behaviors and attributes of the storage. Or as the <a href=https://kubernetes.io/docs/concepts/storage/storage-classes/>Kubernetes docs</a> on Storage Class say, "Different classes might map to quality-of-service levels, or to backup policies, or to arbitrary policies determined by the cluster administrators."<p>One of those behaviors is uniquely important and that’s the provisioner. That’s the field on the Storage Class that lets Kubernetes know how to dynamically provision new Persistent Volumes.<p>If you have a Kubernetes cluster that somehow doesn’t have any Storage Classes — or is missing a provisioner — then you won’t be able to dynamically create Volumes. In that case, the Kubernetes admin will have to manually provision those Volumes.<p>The main reason why I’m mentioning Storage Classes to you — the Postgres expert who needs to migrate your database to Kubernetes — is to remind you that where you put your data matters. 
As I said above, Kubernetes doesn’t care if your Persistent Volume is on-prem or in any particular cloud provider — but each of these options has different storage backends and those different storage backends will offer different options.<h3 id=mid-article-kubernetes-summary><a href=#mid-article-kubernetes-summary>Mid-Article Kubernetes Summary</a></h3><p>I’ve thrown a lot at you, so to summarize:<ul><li>Pods die, ephemeral pods can’t do it alone<li>If you want your data to survive a pod death, you need a Persistent Volume<li>In order for a Pod to use a Persistent Volume, you need to wire the Pod to the Persistent Volume through a Persistent Volume Claim<li>Different Storage Classes offer different options</ul><p>Putting it all together:<ul><li>Just like a Kubernetes cluster has compute and memory resources available that you can request for your pod, a Kubernetes cluster may have storage that you can request. The storage exists as a Persistent Volume and you request a certain amount of storage with a Persistent Volume Claim.</ul><h3 id=postgres-on-kubernetes><a href=#postgres-on-kubernetes>Postgres on Kubernetes</a></h3><p>With that in mind, how would you architect a Postgres instance with persistent storage on Kubernetes? Let’s get a napkin:<p><img alt=basic.svg loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/b2bb3d56-6dd0-4e95-27fb-af20b8f90400/public><p>That’s OK from a Kubernetes perspective: we have a pod that runs the database and we have a Persistent Volume that holds the storage. If the pod goes away, the data is still there in the volume.<p>But from a Postgres perspective, we have Postgres saving our data and our WAL files (which are good for backing up) to the same volume. That’s not great for some recovery scenarios. 
So let’s add a little more redundancy to our system by pushing our WAL files to another Persistent Volume.<p><img alt=basic1.svg loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/53cd2ea3-b96b-40a3-b209-64ed329e4100/public><p>That’s better for storage persistence and recovery from backup. But we probably want to add a Postgres replica for high availability. What does that look like with persistent storage?<p><img alt=basic2.svg loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/6d39e9e6-47d7-42c7-e0ce-fee47118c800/public><p>This is a little generic: I’m not getting into what’s pushing the WAL files to the Persistent Volume for backup storage. Theoretically, you might back up in some other ways. But the general lesson here is you probably want to have your primary storage separate from your backup storage. Maybe you want to have it really separate? You could use something like <a href=https://access.crunchydata.com/documentation/pgbackrest/latest/>pgBackRest</a>, which can push files to some remote cloud-based storage.<p><img alt=basic3.svg loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/098e6ad4-7ede-41d1-7aa9-8919b3ebf400/public><p>Again, the general idea here is you likely want to have two separate storage volumes for your database and your recovery material. There are a few ways to do that. I mean, if you wanted to, you could exec into your Postgres pod regularly and run pg_dump, and copy that output somewhere. That's not a production-ready solution though.<h2 id=the-postgres-operator--storage><a href=#the-postgres-operator--storage>The Postgres Operator &amp; Storage</a></h2><p>One of the great things about using an operator is that a lot of the storage handling is solved for you. 
With the <a href=https://github.com/CrunchyData/postgres-operator><dfn>Postgres Operator</dfn> (<abbr>PGO</abbr>)</a>, when you spin up a Postgres instance, PGO can create the pod and the PVC according to your specifications and according to the needs of your Kubernetes cluster.<p>For instance, maybe you already have a Persistent Volume from a previous Postgres backup and you want to use that data to bootstrap a new cluster — we can do that. Or maybe you want to dynamically create a new Persistent Volume using a particular Storage Class or just want to use the default Storage Class — well, we can do that too with PGO.<p>(As a reminder, as I noted above, different commercial Kubernetes services offer different options for Storage Classes; and in general, up-to-date clusters on AWS EKS, Azure AKS, Google GKE, etc., will have a default Storage Class. But you can always — and probably should — check what the Storage Classes are with <code>kubectl get storageclass</code>.)<h3 id=pgo-creates-pods-and-pvcs-for-you><a href=#pgo-creates-pods-and-pvcs-for-you>PGO creates Pods and PVCs for you</a></h3><p>Here’s example yaml for a very basic Postgres instance, with one Postgres pod (no replicas):<pre><code class=language-yaml>apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
  namespace: postgres-operator
spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 1Gi
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
      name: ''
      replicas: 1
  postgresVersion: 14
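</code></pre><p>Assuming this manifest is saved as <code>hippo.yaml</code> (a filename chosen here for illustration), creating the cluster is a single command:<pre><code class=language-shell>$ kubectl apply -f hippo.yaml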
</code></pre><p>Notice that the <code>postgrescluster</code> object has a <code>Volume Claim Spec</code> under <code>Spec.Backups</code> and a <code>Data Volume Claim Spec</code> under <code>Spec.Instances</code>. We have those separate and independent of each other so you could define each differently.<p>Once I create that Postgres instance, I can check on the pods:<pre><code class=language-shell>$ kubectl get pods --namespace postgres-operator
NAME                      READY   STATUS      RESTARTS   AGE
hippo-repo-host-0         2/2     Running     0          3m23s
hippo-00-6wh4-0           4/4     Running     0          3m23s
</code></pre><p>Wait, why do I have two pods if I only have one Postgres instance with no replicas? The <code>hippo-repo-host-0</code> is running <code>pgBackRest</code>, our preferred backup solution, which is connected to its own local <code>PersistentVolume</code>. We can check the <code>PersistentVolumeClaims</code> to see that in action:<pre><code class=language-shell>$ kubectl get persistentvolumeclaims --namespace postgres-operator
NAME                   STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hippo-00-l5gw-pgdata   Bound         pvc-9e19c77f-c111-4891-a1b5-776d23e06c18   1Gi        RWO            local-path     85s
hippo-repo1            Bound         pvc-83a217c0-bffa-4e66-8ff6-15bbd5fadf07   1Gi        RWO            local-path     86s
</code></pre><p>Notice also that those <code>pvc</code>s have a status of <code>Bound</code> and tell us which volume they are bound to. They have a set capacity of 1Gi (as I requested in that original yaml above) and they have a specified access mode <code>RWO</code> (<code>ReadWriteOnce</code>), meaning the volume can be mounted read-write by a single node at a time.<p>And that <code>StorageClass</code> “local-path”? That’s the default <code>StorageClass</code> on this Kubernetes cluster that I’m using:<pre><code class=language-shell>$ kubectl describe storageclass local-path
Name:                  local-path
IsDefaultClass:        Yes
Provisioner:           rancher.io/local-path
</code></pre><p>Because I have a default Storage Class with a provisioner, I don’t have to worry about creating a Persistent Volume by hand — the provisioner takes care of creating those based on the Persistent Volume Claims.<p>But what if you didn’t want to backup to another PV, but wanted to backup to some other location? PGO is built to support many different options and, out of the box, you can push your backups to:<ul><li>Any Kubernetes supported storage class (which is what we’re using here)<li>Amazon S3 (or S3 equivalents like MinIO)<li><dfn>Google Cloud Storage</dfn> (<abbr>GCS</abbr>)<li>Azure Blob Storage</ul><p>You can even push backups to multiple repositories at the same time — so you could take a local backup <em>and</em> push to remote storage of your choice.<p>Now let’s check out the Persistent Volumes:<pre><code class=language-shell>$ kubectl get persistentvolumes
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                    STORAGECLASS   REASON   AGE
pvc-83a217c0-bffa-4e66-8ff6-15bbd5fadf07   1Gi        RWO            Delete           Bound    postgres-operator/hippo-repo1            local-path              12m
pvc-9e19c77f-c111-4891-a1b5-776d23e06c18   1Gi        RWO            Delete           Bound    postgres-operator/hippo-00-l5gw-pgdata   local-path              12m
</code></pre><p>What’s interesting here? Notice that the <code>Capacity</code> and <code>Access Mode</code> match the <code>PersistentVolumeClaim</code>'s. It’s also very nice that each <code>PersistentVolume</code> points to the <code>PVC</code> that has claimed it, just like each <code>PVC</code> points to the <code>PersistentVolume</code> it has claimed.<p>But what’s really interesting here is the <code>Reclaim Policy</code>. Remember when I said that the lifecycle of the <code>PersistentVolume</code> was independent of the <code>Pod</code> and added “sort of”? This is that “sort of.”<p>A Persistent Volume is independent of the Pod's lifecycle, but not independent of the Persistent Volume Claim's lifecycle. When the PVC is deleted, Kubernetes will handle the PV according to the Reclaim Policy.<p>So what do you do if you want to delete your <code>postgrescluster</code> but want to keep storage around to use for something later? You can accomplish this by changing the Reclaim Policy of those Persistent Volumes to <code>Retain</code> (for example, with <code>kubectl patch</code>). If you do that and then delete your postgres cluster, your persistent volumes will, well, persist.<h3 id=summary><a href=#summary>Summary</a></h3><p>Kubernetes was created first with stateless applications in mind, but the project has grown to embrace databases, with Kubernetes-native architecture that perfectly fits the needs of persisting data.<p>This is just an introduction to the ideas behind persistent storage on Kubernetes and the many options available to you running a Postgres instance on Kubernetes.<p>If all this is something you don’t want to handle yourself, that doesn’t mean you can’t run Postgres in Kubernetes. Our <a href=https://www.crunchydata.com/products/crunchy-postgresql-for-kubernetes>Postgres Operator</a> has been supporting customers with stateful apps for over five years. Try our Operator today with our <a href=https://access.crunchydata.com/documentation/postgres-operator/latest/quickstart/>quickstart</a>. 
]]></content:encoded>
<category><![CDATA[ Kubernetes ]]></category>
<author><![CDATA[ Ben.Blattberg@crunchydata.com (Ben Blattberg) ]]></author>
<dc:creator><![CDATA[ Ben Blattberg ]]></dc:creator>
<guid isPermalink="false">28f28f591ac87fa15ce3d7ac543e1b990a33ef4695d15a5db32c59216eb38181</guid>
<pubDate>Wed, 25 Jan 2023 10:00:00 EST</pubDate>
<dc:date>2023-01-25T15:00:00.000Z</dc:date>
<atom:updated>2023-01-25T15:00:00.000Z</atom:updated></item></channel></rss>