<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>Ben Blattberg | CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/author/ben-blattberg/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog/author/ben-blattberg</link>
<image><url>https://www.crunchydata.com/build/_assets/ben-blattberg.png-QKEJ56ID.webp</url>
<title>Ben Blattberg | CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog/author/ben-blattberg</link>
<width>420</width>
<height>420</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Thu, 14 Dec 2023 08:00:00 EST</pubDate>
<dc:date>2023-12-14T13:00:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ pgAdmin for All of Your Postgres Kubernetes Clusters ]]></title>
<link>https://www.crunchydata.com/blog/cpk-5-5-a-new-pgadmin-experience</link>
<description><![CDATA[ We just released version 5.5 of Crunchy Postgres for Kubernetes and have a new pgAdmin experience. Now you can run all of your Postgres clusters in one interface across your Kubernetes fleet. We have a really cool way to manage this and it includes automatic new cluster detection. ]]></description>
<content:encoded><![CDATA[ <p>We recently announced the latest update of <a href=https://access.crunchydata.com/documentation/postgres-operator/latest/releases/5.5.x>Crunchy Postgres for Kubernetes 5.5</a>. In this version 5.5 update, we would like to highlight a key feature: the introduction of a new pgAdmin API.<p>The notable changes in this feature include:<ul><li>The ability to manage all Postgres clusters through a single interface<li>Automatic cluster detection<li>A new custom resource file for pgAdmin</ul><p>Read on to explore the full range of functionalities and learn how to set up and utilize this new pgAdmin experience. Additionally, make sure to consult our comprehensive documentation, which provides practical how-tos, best practices, and detailed step-by-step guides to maximize your <a href=https://www.crunchydata.com/products/crunchy-postgresql-for-kubernetes>Crunchy Postgres for Kubernetes</a> experience.<h2 id=updated-pgadmin-deployment><a href=#updated-pgadmin-deployment>Updated pgAdmin deployment</a></h2><p>If you are already familiar with our <a href=https://www.crunchydata.com/blog/seamless-pgadmin-4-deployments-using-pgo-v5.1>previous method of providing pgAdmin</a> and prefer not to make changes, don't worry! You don't have to. We have not made any modifications to how you request a pgAdmin instance attached to a single PostgresCluster. You can still add it to your PostgresCluster definition like this:<pre><code class=language-yaml>kind: PostgresCluster
spec:
  userInterface:
    pgAdmin: ...
</code></pre><p>The most noticeable change is the creation of a new API and a new <code>Custom Resource</code> specifically for pgAdmin. This pgAdmin instance will be separate and independent from any PostgresCluster. Here’s a sample of the new PGAdmin Custom Resource file.<pre><code class=language-yaml>apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PGAdmin
metadata:
  name: rhino
spec:
  dataVolumeClaimSpec:
    accessModes:
      - 'ReadWriteOnce'
    resources:
      requests:
        storage: 1Gi
  serverGroups:
    - name: supply
      postgresClusterSelector: {}
</code></pre><p>One advantage of this new method is that the pgAdmin is not limited to any specific PostgresCluster. In fact, this new approach allows you to easily create a single pgAdmin instance that can manage multiple PostgresClusters within a namespace. We refer to this new approach as "namespace-scoped" because it enables the management of several clusters in a namespace.<p>In addition to this change, we have also released a new pgAdmin image for this new implementation, which is compatible with the most recent versions of Postgres.<p>We believe you will appreciate these updates, and we have prepared a walkthrough to demonstrate the flexibility and power of this new experience.<h2 id=walkthrough><a href=#walkthrough>Walkthrough</a></h2><p>Let's set up a new namespace-scoped pgAdmin instance.<h3 id=checking-the-crd-existence><a href=#checking-the-crd-existence>Checking the CRD Existence</a></h3><p>Before we can create a pgAdmin instance from the Custom Resource, we need to ensure that we are running PGO v5.5 with the new <code>pgadmins</code> Custom Resource installed. Let's verify the existence of the <code>pgadmins</code> resource:<pre><code class=language-bash>-> kubectl get crd pgadmins.postgres-operator.crunchydata.com
NAME                                                 CREATED AT
pgadmins.postgres-operator.crunchydata.com           ...

</code></pre><p>If we see the <code>pgadmins</code> Custom Resource, we are ready to proceed with creating a pgAdmin instance by defining a YAML and sending it to the Kubernetes cluster.<h3 id=creating-a-pgadmin-instance><a href=#creating-a-pgadmin-instance>Creating a pgAdmin Instance</a></h3><p>Let's define a basic <code>pgadmins</code> instance:<pre><code class=language-yaml>apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PGAdmin
metadata:
  name: rhino
spec:
  dataVolumeClaimSpec:
    accessModes:
      - 'ReadWriteOnce'
    resources:
      requests:
        storage: 1Gi
  serverGroups:
    - name: demand
      postgresClusterSelector:
        matchLabels:
          owner: logistics
</code></pre><p>In this example, we define a <code>pgadmin</code> named "rhino" with 1Gi of storage, the usual amount in our examples; adjust it for your specific needs. We will discuss the <code>serverGroups</code> section in a moment, but first, let's create this pgAdmin instance and access it.<p>To create the instance, we can use the <code>kubectl create</code> command with this YAML, or any other command/infrastructure you typically use to create Kubernetes instances.<h3 id=accessing-the-pgadmin><a href=#accessing-the-pgadmin>Accessing the pgAdmin</a></h3><p>To access the pgAdmin instance, we need to expose the port and retrieve the user credentials.<h4 id=exposing-the-port><a href=#exposing-the-port>Exposing the Port</a></h4><p>The pod exposes pgAdmin on port 5050. To access it, we can create a Service and port-forward to that Service.<p>If you already have a Service or want to port-forward directly to the Pod, you can do that as well.<p>To create a Service, we can use <a href=https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#expose>kubectl expose</a>, which conveniently creates a Service for us if we provide the name of a Pod. First, we need to obtain the name of the Pod by selecting it using a label. Assuming we want to connect to the <code>rhino</code> pgAdmin example mentioned earlier, we can run the following commands:<pre><code class=language-bash># Select the pod using the `postgres-operator.crunchydata.com/pgadmin=rhino` label
# and save the name to the variable PGADMIN_POD for easier reuse.
PGADMIN_POD=$(kubectl get pod -n postgres-operator -l postgres-operator.crunchydata.com/pgadmin=rhino -o name)

# Create a Service with the expose command.
kubectl expose -n postgres-operator ${PGADMIN_POD} --name rhino-pgadmin

# Port-forward to the new service
kubectl port-forward -n postgres-operator svc/rhino-pgadmin 5050:5050
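# Alternatively, you can skip creating the Service and port-forward
# directly to the Pod (a sketch, reusing the PGADMIN_POD variable
# captured above):
#
#   kubectl port-forward -n postgres-operator ${PGADMIN_POD} 5050:5050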
</code></pre><p>Once we have completed these steps, we can open our browser and navigate to <code>http://localhost:5050</code>, which will display the pgAdmin login screen. Now we need to obtain our credentials.<h4 id=getting-the-credentials><a href=#getting-the-credentials>Getting the Credentials</a></h4><p>In this version, pgAdmin requires an initial administrator user. As part of the deployment, the Operator sets up an administrator user and generates a password, which we can use.<p>The username we define is hardcoded to prevent issues with username redefinition. The admin username is <code>admin@&lt;pgAdmin_name&gt;.&lt;pgAdmin_namespace&gt;.svc</code>.<p>The password for that user is stored in the Secret with the label <code>postgres-operator.crunchydata.com/pgadmin=&lt;pgAdmin_name&gt;</code>.<p>For our <code>rhino</code> example (assuming the pgadmin was created in the <code>postgres-operator</code> namespace), the user would be <code>admin@rhino.postgres-operator.svc</code>. To retrieve the password, we can run the following command:<pre><code class=language-bash>kubectl get secret -n postgres-operator -l postgres-operator.crunchydata.com/pgadmin=rhino -o jsonpath='{.items[0].data.password}' | base64 -d
</code></pre><p>This command will output the password.<p>With the port exposed, and with the username and password, we can now log in as the administrator and manage other users.<h3 id=customizing-your-pgadmin><a href=#customizing-your-pgadmin>Customizing Your pgAdmin</a></h3><h4 id=configuring-external-user-management><a href=#configuring-external-user-management>Configuring External User Management</a></h4><p>But what if we don't want to manage users? Similar to the previous method of deploying pgAdmin, this new namespace-scoped pgAdmin accepts custom configurations, including LDAP configuration.<p>For instance, if we want to change the configuration to disable Gravatar images for users, we can set the following in the <code>pgadmin</code> spec:<pre><code class=language-yaml>spec:
  config:
    settings:
      SHOW_GRAVATAR_IMAGE: False
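</code></pre><p>One way to apply a settings change like this to the existing <code>rhino</code> pgAdmin is <code>kubectl patch</code> (a sketch, assuming the <code>postgres-operator</code> namespace used in this walkthrough; you could just as well edit your manifest and re-apply it):<pre><code class=language-bash># Merge the new setting into the existing PGAdmin spec
kubectl patch pgadmin rhino -n postgres-operator --type merge -p '{"spec":{"config":{"settings":{"SHOW_GRAVATAR_IMAGE":false}}}}'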
</code></pre><p>Additionally, we can mount files to <code>/etc/pgadmin/conf.d</code> inside the pgAdmin container using <a href=https://kubernetes.io/docs/concepts/storage/projected-volumes/>projected volumes</a>. For example, the following spec mounts the <code>useful.txt</code> file from the <code>mysecret</code> Secret to <code>/etc/pgadmin/conf.d/useful.txt</code>:<pre><code class=language-yaml>spec:
  config:
    files:
      - secret:
          name: mysecret
          items:
            - key: useful.txt
</code></pre><p>For LDAP configuration, we need to set certain configuration options through the <code>spec.config.settings</code> section of the YAML. If we have an LDAP bind password, we would also need to mount a Secret with that password using the special field <code>spec.config.ldapBindPassword</code>. (We use this special field to remind users not to include passwords directly in the YAML.)<p>To illustrate, if we wanted to add LDAP configuration and had a Secret named <code>ldap-pass</code> with the password stored in a field called <code>my-password</code>, our <code>pgadmin</code> spec might look like this:<pre><code class=language-yaml>spec:
  config:
    settings:
      AUTHENTICATION_SOURCES: ['ldap']
      LDAP_AUTO_CREATE_USER: True
      LDAP_SERVER_URI: ldaps://my.ds.example.com
    ldapBindPassword:
      name: ldap-pass
      key: my-password
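</code></pre><p>The <code>ldap-pass</code> Secret referenced above needs to exist before pgAdmin can use it. One way to create it (a sketch, assuming the <code>postgres-operator</code> namespace; the literal value is a placeholder for your real bind password):<pre><code class=language-bash># Create the Secret with the bind password under the key "my-password"
kubectl create secret generic ldap-pass -n postgres-operator --from-literal=my-password='REPLACE_WITH_BIND_PASSWORD'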
</code></pre><h4 id=automated-server-discovery><a href=#automated-server-discovery>Automated Server Discovery</a></h4><p>In our previous architecture, each pgAdmin instance was tied to a PostgresCluster. With the new approach, a pgAdmin instance can automatically register multiple PostgresClusters. How does this work?<p>This is where the <code>serverGroups</code> section of the PgAdmin spec, mentioned earlier, comes into play. In this section, we can define multiple selector groups that the Operator uses to select specific PostgresClusters based on labels.<p>The Operator accomplishes this in a Kubernetes-native manner: it lists all the PostgresClusters in the namespace and then filters them based on the labels or selectors we have defined.<p>For example, if we had a pgAdmin set up with the spec mentioned above, this is the section that defines our filtering:<pre><code class=language-yaml>spec:
  serverGroups:
    - name: demand
      postgresClusterSelector:
        matchLabels:
          owner: logistics
</code></pre><p>This spec instructs the Operator to "list the PostgresClusters in the namespace, but filter that list for clusters with the label <code>owner: logistics</code>."<p>The Operator then creates that list, which the pgAdmin instance registers. In fact, the pgAdmin process registers these Postgres instances as "shared servers" owned by the initial administrator user set up by pgAdmin. A "shared server" is owned by one user (the administrator) but can be viewed in the pgAdmin list by any user.<p>Please note that for namespace-scoped pgAdmin deployments, there is no automatic synchronization between pgAdmin and Postgres users. Even if you can sign in to pgAdmin and see PostgresClusters, you will still need a valid Postgres user and credentials to access the Postgres database.<p>It is possible to omit the <code>serverGroups</code> field or to have no PostgresClusters discovered with the given selectors. Due to how pgAdmin currently functions, when zero servers are discovered, the initial administrator user will be unable to manually add new ServerGroups or Servers.<p>But what if we wanted to register PostgresClusters with different labels? We can handle this scenario by adding multiple <code>serverGroups</code>, which will create separate "Server Groups" in pgAdmin.<p>And what if we wanted to register all the PostgresClusters in a namespace? Since we are using Kubernetes-native idioms for label matching, we can add a serverGroup with an empty <code>postgresClusterSelector</code> like this:<pre><code class=language-yaml>spec:
  serverGroups:
    - name: supply
      postgresClusterSelector: {}
</code></pre><p>This Kubernetes-specific idiom matches <em>all</em> PostgresClusters in the namespace.<p>If you want to deploy one pgAdmin to manage all the PostgresClusters in a namespace and share those servers with all pgAdmin users, you can configure your <code>pgadmin</code> deployment to automatically register all those PostgresClusters, avoiding the need to manually import them one-by-one! Here’s a sample of databases added from two Postgres clusters, hippo and rhino.<p><img alt="pgAdmin screenshot showing databases from multiple Postgres clusters" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/a2ff1bb8-4570-48da-1f42-8b97b80b9c00/public><h2 id=finishing-up><a href=#finishing-up>Finishing Up</a></h2><p>Upgrading to Crunchy Postgres for Kubernetes <code>v5.5.0</code> is generally a simple single command. If you installed Crunchy Postgres for Kubernetes using the Kustomize installer available in the <a href=https://github.com/CrunchyData/postgres-operator-examples>Postgres Operator examples repository</a>, you would issue: <code>kubectl apply --server-side -k kustomize/install/default</code>. For additional upgrade guidance, please see <a href=https://access.crunchydata.com/documentation/postgres-operator/latest/upgrade>the Crunchy Postgres for Kubernetes Upgrade documentation</a>.<p>The entire team at Crunchy Data is thrilled to introduce this new feature to our customers and the community. As always, we welcome your feedback, so feel free to join the discussion on our <a href=https://discord.gg/ErmzUAmTvy>Discord</a> server. ]]></content:encoded>
<category><![CDATA[ Kubernetes ]]></category>
<author><![CDATA[ Ben.Blattberg@crunchydata.com (Ben Blattberg) ]]></author>
<dc:creator><![CDATA[ Ben Blattberg ]]></dc:creator>
<guid isPermalink="false">77d92e721a612ab94656bf16af6c68696e595ec28cc8cfbb3d80227f6318777b</guid>
<pubDate>Thu, 14 Dec 2023 08:00:00 EST</pubDate>
<dc:date>2023-12-14T13:00:00.000Z</dc:date>
<atom:updated>2023-12-14T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Stateful Postgres Storage Using Kubernetes ]]></title>
<link>https://www.crunchydata.com/blog/stateful-postgres-storage-using-kubernetes</link>
<description><![CDATA[ How can Kubernetes be the foundation of a stateful database? Ben takes us through the basics of Persistent Volumes, Volume Claims, and Postgres storage. ]]></description>
<content:encoded><![CDATA[ <p>Kubernetes was developed originally as an orchestration system for stateless applications. Today, Kubernetes is the backbone of countless full stack applications that, notably, include a database as part of the stack. So, a question we often hear is:<blockquote><p>How can Kubernetes be the foundation of that most stateful application of all, the database?</blockquote><h2 id=kubernetes--storage><a href=#kubernetes--storage>Kubernetes &amp; Storage</a></h2><h3 id=ephemeral-pods><a href=#ephemeral-pods>Ephemeral Pods</a></h3><p>Let’s say you maintain a Postgres database and you’ve been tasked with moving it to Kubernetes. You could just start up a Kubernetes pod running a Postgres image, load the data, and call it a day.<p>As soon as that pod goes away, so will that small but critical database, because the database storage existed as part of that ephemeral pod.<p>When you created that pod, you told the underlying computer to reserve a certain amount of resources for it (a certain amount of compute power, a certain amount of memory), and a certain amount of storage comes with this automatically. But as soon as that pod goes away, all of that is released back into the pool.<p>And pods do go away even when you don’t want them to. Maybe a pod exceeded the resource limits you gave it, or maybe the process hit a fatal exception, or—well, there are a lot of ways a pod can die. Let’s call this the first lesson of Kubernetes: pods are ephemeral—as the old saying goes, consider these cattle, not pets. While this model of deployment is ideal for application services, we need something different to handle the database in Kubernetes.<h3 id=persistent-volumes><a href=#persistent-volumes>Persistent Volumes</a></h3><p>So, how can we create a database on Kubernetes and not worry about the ephemeral pod? 
The answer is “Volumes”, or, more specifically, certain types of volumes that are <em>independent</em> of the pod lifecycle.<p>For an example of a volume type that is <em>not</em> independent, let’s take <code>emptyDir</code>. The Kubernetes doc on <a href=https://kubernetes.io/docs/concepts/storage/volumes/#volume-types>Volume types</a> lets us know that when a pod with an emptyDir is deleted, the data in there will be “deleted permanently”. For backing a Postgres instance, this is not a good idea.<p>What we want here is a volume that won’t go away when the pods are removed. Luckily, Kubernetes has the concept of a Persistent Volume (<code>PersistentVolume</code> or <code>PV</code>): a volume that will persist no matter what happens with the pod (sort of — we’ll get into that later).<p>What’s even better, a Persistent Volume is an abstraction for any kind of storage. So, do you want to put your storage somewhere remote, like on AWS, Azure, Google, etc.? Then you can use a Persistent Volume. But maybe you want to use some of your Kubernetes node’s own storage? Then you can <em>also</em> use a Persistent Volume.<p>As far as Kubernetes knows, a Persistent Volume is just some piece of storage somewhere <em>and</em> its lifecycle is independent of the pod that’s using it. So if the pod goes away — say, it hits a memory limit and is OOMKilled — the storage is still there, ready to be used by your Postgres pod when it is recreated.<p>But first, we have to tell the Postgres pod to use that storage and we have to tell other pods to leave our storage alone. And we do that with a Persistent Volume Claim (<code>PersistentVolumeClaim</code> or <code>PVC</code>).<h3 id=persistent-volume-claim><a href=#persistent-volume-claim>Persistent Volume Claim</a></h3><p>A Persistent Volume Claim is a claim on a persistent volume — and not just any persistent volume. 
After all, it wouldn’t be great for your 20G database if your Postgres pod tried to claim and use 1G of storage.<p>A Persistent Volume Claim lets you request a certain amount of storage. But a Persistent Volume Claim is more than just a specification of a certain amount of storage. You might also want to specify the <code>access mode</code> for this storage or the <code>StorageClass</code>. For example, if you have data that changes quickly, you might want a different storage option than if you are taking long-lasting backups that never change.<h3 id=storage-classes><a href=#storage-classes>Storage Classes</a></h3><p>By now you may have noticed that the Kubernetes developers are pretty good at using descriptive names and that’s true here: a storage class is a class of storage. That is, it is a category that defines certain behaviors and attributes of the storage. Or as the <a href=https://kubernetes.io/docs/concepts/storage/storage-classes/>Kubernetes docs</a> on Storage Class say, "Different classes might map to quality-of-service levels, or to backup policies, or to arbitrary policies determined by the cluster administrators."<p>One of those behaviors is uniquely important and that’s the provisioner. That’s the field on the Storage Class that lets Kubernetes know how to dynamically provision new Persistent Volumes.<p>If you have a Kubernetes cluster that somehow doesn’t have any Storage Classes — or is missing a provisioner — then you won’t be able to dynamically create Volumes. In that case, the Kubernetes admin will have to manually provision those Volumes.<p>The main reason why I’m mentioning Storage Classes to you — the Postgres expert who needs to migrate your database to Kubernetes — is to remind you that where you put your data matters. 
As I said above, Kubernetes doesn’t care if your Persistent Volume is on-prem or in any particular cloud provider — but each of these options has different storage backends and those different storage backends will offer different options.<h3 id=mid-article-kubernetes-summary><a href=#mid-article-kubernetes-summary>Mid-Article Kubernetes Summary</a></h3><p>I’ve thrown a lot at you, so to summarize:<ul><li>Pods die, ephemeral pods can’t do it alone<li>If you want your data to survive a pod death, you need a Persistent Volume<li>In order for a Pod to use a Persistent Volume, you need to wire the Pod to the Persistent Volume through a Persistent Volume Claim<li>Different Storage Classes offer different options</ul><p>Putting it all together:<ul><li>Just like a Kubernetes cluster has compute and memory resources available that you can request for your pod, a Kubernetes cluster may have storage that you can request. The storage exists as a Persistent Volume and you request a certain amount of storage with a Persistent Volume Claim.</ul><h3 id=postgres-on-kubernetes><a href=#postgres-on-kubernetes>Postgres on Kubernetes</a></h3><p>With that in mind, how would you architect a Postgres instance with persistent storage on Kubernetes? Let’s get a napkin:<p><img alt=basic.svg loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/b2bb3d56-6dd0-4e95-27fb-af20b8f90400/public><p>That’s OK from a Kubernetes perspective: we have a pod that runs the database and we have a Persistent Volume that holds the storage. If the pod goes away, the data is still there in the volume.<p>But from a Postgres perspective, we have Postgres saving our data and our WAL files (which are good for backing up) to the same volume. That’s not great for some recovery scenarios. 
So let’s add a little more redundancy to our system by pushing our WAL files to another Persistent Volume.<p><img alt=basic1.svg loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/53cd2ea3-b96b-40a3-b209-64ed329e4100/public><p>That’s better for storage persistence and recovery from backup. But we probably want to add a Postgres replica for high availability. What does that look like with persistent storage?<p><img alt=basic2.svg loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/6d39e9e6-47d7-42c7-e0ce-fee47118c800/public><p>This is a little generic: I’m not getting into what’s pushing the WAL files to the Persistent Volume for backup storage. Theoretically, you might back up in some other ways. But the general lesson here is you probably want to have your primary storage separate from your backup storage. Maybe you want to have it really separate? You could use something like <a href=https://access.crunchydata.com/documentation/pgbackrest/latest/>pgBackRest</a>, which can push files to some remote cloud-based storage.<p><img alt=basic3.svg loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/098e6ad4-7ede-41d1-7aa9-8919b3ebf400/public><p>Again, the general idea here is you likely want to have two separate storage volumes for your database and your recovery material. There are a few ways to do that. I mean, if you wanted to, you could exec into your Postgres pod regularly and run pg_dump, and copy that output somewhere. That's not a production-ready solution though.<h2 id=the-postgres-operator--storage><a href=#the-postgres-operator--storage>The Postgres Operator &amp; Storage</a></h2><p>One of the great things about using an operator is that a lot of the storage handling is solved for you. 
With the <a href=https://github.com/CrunchyData/postgres-operator><dfn>Postgres Operator</dfn> (<abbr>PGO</abbr>)</a>, when you spin up a Postgres instance, PGO can create the pod and the PVC according to your specifications and according to the needs of your Kubernetes cluster.<p>For instance, maybe you already have a Persistent Volume from a previous Postgres backup and you want to use that data to bootstrap a new cluster — we can do that. Or maybe you want to dynamically create a new Persistent Volume using a particular Storage Class or just want to use the default Storage Class — well, we can do that too with PGO.<p>(As a reminder, as I noted above, different commercial Kubernetes services offer different options for Storage Classes; and in general, up-to-date clusters on AWS EKS, Azure AKS, Google GKE, etc., will have a default Storage Class. But you can always — and probably should — check what the Storage Classes are with <code>kubectl get storageclass</code>.)<h3 id=pgo-creates-pods-and-pvcs-for-you><a href=#pgo-creates-pods-and-pvcs-for-you>PGO creates Pods and PVCs for you</a></h3><p>Here’s example yaml for a very basic Postgres instance, with one Postgres pod (no replicas):<pre><code class=language-yaml>apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
  namespace: postgres-operator
spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 1Gi
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
      name: ''
      replicas: 1
  postgresVersion: 14
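</code></pre><p>Assuming this manifest is saved as <code>hippo.yaml</code> (a filename chosen here for illustration), creating the cluster is a single command:<pre><code class=language-shell>$ kubectl apply -f hippo.yaml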
</code></pre><p>Notice that the <code>postgrescluster</code> object has a <code>Volume Claim Spec</code> under <code>Spec.Backups</code> and a <code>Data Volume Claim Spec</code> under <code>Spec.Instances</code>. We have those separate and independent of each other so you could define each differently.<p>Once I create that Postgres instance, I can check on the pods:<pre><code class=language-shell>$ kubectl get pods --namespace postgres-operator
NAME                      READY   STATUS      RESTARTS   AGE
hippo-repo-host-0         2/2     Running     0          3m23s
hippo-00-6wh4-0           4/4     Running     0          3m23s
</code></pre><p>Wait, why do I have two pods if I only have one Postgres instance with no replicas? The <code>hippo-repo-host-0</code> is running <code>pgBackRest</code>, our preferred backup solution, which is connected to its own local <code>PersistentVolume</code>. We can check the <code>PersistentVolumeClaims</code> to see that in action:<pre><code class=language-shell>$ kubectl get persistentvolumeclaims --namespace postgres-operator
NAME                   STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hippo-00-l5gw-pgdata   Bound         pvc-9e19c77f-c111-4891-a1b5-776d23e06c18   1Gi        RWO            local-path     85s
hippo-repo1            Bound         pvc-83a217c0-bffa-4e66-8ff6-15bbd5fadf07   1Gi        RWO            local-path     86s
</code></pre><p>Notice also that those <code>pvc</code>s have a status of <code>Bound</code> and tell us which volume they are bound to. They have a set capacity of 1Gi (as I requested in that original yaml above) and they have a specified access mode <code>RWO</code> (<code>ReadWriteOnce</code>), meaning the volume can be mounted read-write by a single node at a time.<p>And that <code>StorageClass</code> “local-path”? That’s the default <code>StorageClass</code> on this Kubernetes cluster that I’m using:<pre><code class=language-shell>$ kubectl describe storageclass local-path
Name:                  local-path
IsDefaultClass:        Yes
Provisioner:           rancher.io/local-path
</code></pre><p>Because I have a default Storage Class with a provisioner, I don’t have to worry about creating a Persistent Volume by hand — the provisioner takes care of creating those based on the Persistent Volume Claims.<p>But what if you didn’t want to backup to another PV, but wanted to backup to some other location? PGO is built to support many different options and, out of the box, you can push your backups to:<ul><li>Any Kubernetes supported storage class (which is what we’re using here)<li>Amazon S3 (or S3 equivalents like MinIO)<li><dfn>Google Cloud Storage</dfn> (<abbr>GCS</abbr>)<li>Azure Blob Storage</ul><p>You can even push backups to multiple repositories at the same time — so you could take a local backup <em>and</em> push to remote storage of your choice.<p>Now let’s check out the Persistent Volumes:<pre><code class=language-shell>$ kubectl get persistentvolumes
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                    STORAGECLASS   REASON   AGE
pvc-83a217c0-bffa-4e66-8ff6-15bbd5fadf07   1Gi        RWO            Delete           Bound    postgres-operator/hippo-repo1            local-path              12m
pvc-9e19c77f-c111-4891-a1b5-776d23e06c18   1Gi        RWO            Delete           Bound    postgres-operator/hippo-00-l5gw-pgdata   local-path              12m
</code></pre><p>What’s interesting here? Notice that the <code>Capacity</code> and <code>Access Mode</code> match the <code>PersistentVolumeClaim</code>'s. It’s also very nice that each <code>PersistentVolume</code> points to the <code>PVC</code> that has claimed it, just like each <code>PVC</code> points to the <code>PersistentVolume</code> it has claimed.<p>But what’s really interesting here is the <code>Reclaim Policy</code>. Remember when I said that the lifecycle of the <code>PersistentVolume</code> was independent of the <code>Pod</code> and added “sort of”? This is that “sort of.”<p>A Persistent Volume is independent of the Pod's lifecycle, but not independent of the Persistent Volume Claim's lifecycle. When the PVC is deleted, Kubernetes will handle the PV according to the Reclaim Policy.<p>So what do you do if you want to delete your <code>postgrescluster</code> but want to keep storage around to use for something later? You can accomplish this by changing the Reclaim Policy of those Persistent Volumes to <code>Retain</code> (for example, with <code>kubectl patch</code>). If you do that and then delete your postgres cluster, your persistent volumes will, well, persist.<h3 id=summary><a href=#summary>Summary</a></h3><p>Kubernetes was created first with stateless applications in mind, but the project has grown to embrace databases, with Kubernetes-native architecture that perfectly fits the needs of persisting data.<p>This is just an introduction to the ideas behind persistent storage on Kubernetes and the many options available to you running a Postgres instance on Kubernetes.<p>If all this is something you don’t want to handle yourself, that doesn’t mean you can’t run Postgres in Kubernetes. Our <a href=https://www.crunchydata.com/products/crunchy-postgresql-for-kubernetes>Postgres Operator</a> has been supporting customers with stateful apps for over five years. Try our Operator today with our <a href=https://access.crunchydata.com/documentation/postgres-operator/latest/quickstart/>quickstart</a>. 
]]></content:encoded>
<category><![CDATA[ Kubernetes ]]></category>
<author><![CDATA[ Ben.Blattberg@crunchydata.com (Ben Blattberg) ]]></author>
<dc:creator><![CDATA[ Ben Blattberg ]]></dc:creator>
<guid isPermalink="false">28f28f591ac87fa15ce3d7ac543e1b990a33ef4695d15a5db32c59216eb38181</guid>
<pubDate>Wed, 25 Jan 2023 10:00:00 EST</pubDate>
<dc:date>2023-01-25T15:00:00.000Z</dc:date>
<atom:updated>2023-01-25T15:00:00.000Z</atom:updated></item></channel></rss>