Skip to content

Commit

Permalink
Commence tutorial
Browse files Browse the repository at this point in the history
This adds a tutorial section to the documentation, which is a
relative rehash of other sections but aims to tell an improved
story on how to use the Operator. This is still a work in progress
and more effort will be put into it prior to the 4.5.0 GA, but
it is good enough to get started.
  • Loading branch information
Jonathan S. Katz committed Sep 29, 2020
1 parent 686ecab commit 42fe8d5
Show file tree
Hide file tree
Showing 6 changed files with 355 additions and 2 deletions.
3 changes: 2 additions & 1 deletion docs/content/installation/prerequisites.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,8 @@ The PostgreSQL Operator is tested in the following environments:

* Kubernetes v1.13+
* Red Hat OpenShift v3.11+
* Red Hat OpenShift v4.3+
* Red Hat OpenShift v4.4+
* Amazon EKS
* VMWare Enterprise PKS 1.3+
* IBM Cloud Pak Data

Expand Down
2 changes: 1 addition & 1 deletion docs/content/quickstart/_index.md
Original file line number Diff line number Diff line change
Expand Up @@ -243,7 +243,7 @@ PGPASSWORD=datalake psql -h localhost -p 5432 -U testuser hippo
You should then be greeted with the PostgreSQL prompt:

```
psql (12.4)
psql ({{< param postgresVersion >}})
Type "help" for help.
hippo=>
Expand Down
13 changes: 13 additions & 0 deletions docs/content/tutorial/_index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
---
title: "Tutorial"
draft: false
weight: 15
---

The PostgreSQL Operator provides functionality that lets you run your own database-as-a-service: from deploying PostgreSQL clusters with [high availability]({{< relref "architecture/high-availability/_index.md" >}}), to a [full stack monitoring]({{< relref "architecture/high-availability/_index.md" >}}) solution, essential [disaster recovery and backup tools]({{< relref "architecture/disaster-recovery.md" >}}), the ability to secure your cluster with TLS, and much more!

What's more, you can manage your PostgreSQL clusters with the convenient [`pgo` client]({{< relref "pgo-client/_index.md" >}}) or by interfacing directly with the PostgreSQL Operator [custom resources]({{< relref "custom-resources/_index.md" >}}).

Given the robustness of the PostgreSQL Operator, we think it's helpful to break down the functionality in this step-by-step tutorial. The tutorial covers the essential functions the PostgreSQL Operator can perform and covers many common basic and advanced use cases.

So what are you waiting for? Let's [get started]({{< relref "tutorial/getting-started.md" >}})!
149 changes: 149 additions & 0 deletions docs/content/tutorial/connect-cluster.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,149 @@
---
title: "Connect to a Postgres Cluster"
draft: false
weight: 120
---

Naturally, once the [PostgreSQL cluster is created]({{< relref "tutorial/create-cluster.md" >}}), you may want to connect to it. You can get the credentials of the users of the cluster using the [`pgo show user`]({{< relref "pgo-client/reference/pgo_show_user.md" >}}) command, i.e.:

```
pgo show user hippo
```

yields output similar to:

```
CLUSTER USERNAME PASSWORD EXPIRES STATUS ERROR
------- -------- -------------------------------- ------- ------ -----
hippo testuser securerandomlygeneratedpassword never ok
```

If you need to get the password of one of the system or privileged accounts, you will need to use the `--show-system-accounts` flag, i.e.:

```
pgo show user hippo --show-system-accounts
```

```
CLUSTER USERNAME PASSWORD EXPIRES STATUS ERROR
------- ----------- -------------------------------- ------- ------ -----
hippo postgres B>xy}9+7wTVp)gkntf}X|H@N never ok
hippo primaryuser ^zULckQy-\KPws:2UoC+szXl never ok
hippo testuser securerandomlygeneratedpassword never ok
```

Let's look at three different ways we can connect to the PostgreSQL cluster.

## Connecting via `psql`

Let's see how we can connect to `hippo` using [`psql`](https://www.postgresql.org/docs/current/app-psql.html), the command-line tool for accessing PostgreSQL. Ensure you have [installed the `psql` client](https://www.crunchydata.com/developers/download-postgres/binaries/postgresql12).

The PostgreSQL Operator creates a service with the same name as the cluster. See for yourself! Get a list of all of the Services available in the `pgo` namespace:

```
kubectl -n pgo get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hippo ClusterIP 10.96.218.63 <none> 2022/TCP,5432/TCP 59m
hippo-backrest-shared-repo ClusterIP 10.96.75.175 <none> 2022/TCP 59m
postgres-operator ClusterIP 10.96.121.246 <none> 8443/TCP,4171/TCP,4150/TCP 71m
```

Let's connect the `hippo` cluster. First, in a different console window, set up a port forward to the `hippo` service:

```
kubectl -n pgo port-forward svc/hippo 5432:5432
```

You can connect to the database with the following command, substituting `datalake` for your actual password:

```
PGPASSWORD=datalake psql -h localhost -p 5432 -U testuser hippo
```

You should then be greeted with the PostgreSQL prompt:

```
psql ({{< param postgresVersion >}})
Type "help" for help.
hippo=>
```

## Connecting via [pgAdmin 4]({{< relref "architecture/pgadmin4.md" >}})

[pgAdmin 4]({{< relref "architecture/pgadmin4.md" >}}) is a graphical tool that can be used to manage and query a PostgreSQL database from a web browser. The PostgreSQL Operator provides a convenient integration with pgAdmin 4 for managing how users can log into the database.

To add pgAdmin 4 to `hippo`, you can execute the following command:

```
pgo create pgadmin -n pgo hippo
```

It will take a few moments to create the pgAdmin 4 instance. The PostgreSQL Operator also creates a pgAdmin 4 service. See for yourself! Get a list of all of the Services available in the `pgo` namespace:

```
kubectl -n pgo get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hippo ClusterIP 10.96.218.63 <none> 2022/TCP,5432/TCP 59m
hippo-backrest-shared-repo ClusterIP 10.96.75.175 <none> 2022/TCP 59m
hippo-pgadmin ClusterIP 10.96.165.27 <none> 5050/TCP 5m1s
postgres-operator ClusterIP 10.96.121.246 <none> 8443/TCP,4171/TCP,4150/TCP 71m
```

Let's connect to our `hippo` cluster via pgAdmin 4! In a different terminal, set up a port forward to pgAdmin 4:

```
kubectl -n pgo port-forward svc/hippo-pgadmin 5050:5050
```

Navigate your browser to http://localhost:5050 and use your database username (`testuser`) and password (e.g. `datalake`) to log in. Though the prompt says “email address”, using your PostgreSQL username will work:

![pgAdmin 4 Login Page](/images/pgadmin4-login2.png)

(There are occasions where the initial credentials do not properly get set in pgAdmin 4. If you have trouble logging in, try running the command `pgo update user -n pgo hippo --username=testuser --password=datalake`).

Once logged into pgAdmin 4, you will be automatically connected to your database. Explore pgAdmin 4 and run some queries!

## Connecting from a Kubernetes Application

### Within a Kubernetes Cluster

Connecting a Kubernetes application that is within the same cluster that your PostgreSQL cluster is deployed in is as simple as understanding the default [Kubernetes DNS system](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#what-things-get-dns-names). A cluster created by the PostgreSQL Operator automatically creates a Service of the same name (e.g. `hippo`).

Following the example we've created, the hostname for our PostgreSQL cluster is `hippo.pgo` (or `hippo.pgo.svc.cluster.local`). To get your exact [DNS resolution rules](https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/), you may need to consult with your Kubernetes administrator.

Knowing this, we can construct a [Postgres URI](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING) that contains all of the connection info:

`postgres://testuser:[email protected]:5432/hippo`

which breaks down as such:

- `postgres`: the scheme, i.e. a Postgres URI
- `testuser`: the name of the PostgreSQL user
- `securerandomlygeneratedpassword`: the password for `testuser`
- `hippo.jkatz.svc.cluster.local`: the hostname
- `5432`: the port
- `hippo`: the database you want to connect to

If your application or connection driver cannot use the Postgres URI, the above should allow for you to break down the connection string into its appropriate components.

### Outside a Kubernetes Cluster

To connect to a database from an application that is outside a Kubernetes cluster, you will need to set one of the following:

- A Service type of [`LoadBalancer`](https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer) or [`NodePort`](https://kubernetes.io/docs/concepts/services-networking/service/#nodeport)
- An [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/). The PostgreSQL Operator does not provide any management for Ingress types.

To have the PostgreSQL Operator create a Service that is of type `LoadBalancer` or `NodePort`, you can use the `--service-type` flag as part of creating a PostgreSQL cluster, e.g.:

```
pgo create cluster hippo --service-type=LoadBalancer
```

You can also set the `ServiceType` attribute of the [PostgreSQL Operator configuration]({{< relref "configuration/pgo-yaml-configuration.md" >}}) to provide a default Service type for all PostgreSQL clusters that are created.

## Next Steps

We've created a cluster and we've connected to it! Now, let's learn what customizations we can make as part of the cluster creation process.
115 changes: 115 additions & 0 deletions docs/content/tutorial/create-cluster.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
---
title: "Create a Postgres Cluster"
draft: false
weight: 110
---

If you came here through the [quickstart]({{< relref "quickstart/_index.md" >}}), you may have already [created a cluster]({{< relref "quickstart/_index.md" >}}#create-a-postgresql-cluster), in which case, feel free to skip ahead, or read onward for a more in depth look into cluster creation!

## Create a PostgreSQL Cluster

Creating a cluster is simple with the [`pgo create cluster`]({{< relref "pgo-client/reference/pgo_create_cluster.md" >}}) command:

```
pgo create cluster hippo
```

with output similar to:

```
created cluster: hippo
workflow id: 25c870a0-5d27-42c2-be00-92f0ba8768e7
database name: hippo
users:
username: testuser password: securerandomlygeneratedpassword
```

This creates a new PostgreSQL cluster named `hippo` with a database in it named `hippo`. This operation may take a few moments to complete. Note the name of the database user (`testuser`) and password (`securerandomlygeneratedpassword`) for when we connect to the PostgreSQL cluster.

To make it easier to copy and paste statements used throughout this tutorial, you can set the password of `testuser` as part of creating the PostgreSQL cluster:

```
pgo create cluster hippo --password=securerandomlygeneratedpassword
```

You can check on the status of the cluster creation using the [`pgo test`]({{< relref "pgo-client/reference/pgo_test.md" >}}) command. The `pgo test` command checks to see if the Kubernetes Services and the Pods that comprise the PostgreSQL cluster are available to receive connections. This includes:

- Testing that the Kubernetes Endpoints are available and able to route requests to healthy Pods.
- Testing that each PostgreSQL instance is available and ready to accept client connections by performing a connectivity check similar to the one performed by [`pg_isready`](https://www.postgresql.org/docs/current/app-pg-isready.html).

For example, when the `hippo` cluster is ready,

```
pgo test hippo
```

will yield output similar to:

```
cluster : hippo
Services
primary (10.96.179.126:5432): UP
Instances
primary (hippo-57675d4f8f-wwx64): UP
```


### The Create Cluster Process

So what just happened? Let's break down what occurs during the create cluster process.

1. First, `pgo` client creates an entry in the PostgreSQL Operator [pgcluster custom resource definition]({{< relref "custom-resources/_index.md" >}}) with the attributes desired to create the cluster. In the case above, this fills in the name of the cluster (`hippo`) and leverages a lot of defaults from the [PostgreSQL Operator configuration]({{< relref "configuration/pgo-yaml-configuration.md" >}}). We'll discuss more about the PostgreSQL Operator configuration later in the tutorial.

2. Once the custom resource is added, the PostgreSQL Operator begins provisioning the PostgreSQL instace and a pgBackRest repository which is used to store backups. The following actions occur as part of this process:

- Creating [persistent volume claims](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) (PVCs) for the PostgreSQL instance and the pgBackRest repository.
- Creating [services](https://kubernetes.io/docs/concepts/services-networking/service/) that provide a stable network interface for connecting to the PostgreSQL instance and pgBackRest repository.
- Creating [deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) that house each PostgreSQL instance and pgBackRest repository. Each of these is responsible for one Pod.
- The PostgreSQL Pod, when it is started, provisions a PostgreSQL database and performs other bootstrapping functions, such as creating `testuer`.
- The pgBackRest Pod, when it is started, initializes a pgBackRest repository. Note that the pgBackRest repository is not yet ready to start taking backups, but will be after the next step!

3. When the PostgreSQL Operator detects that the PostgreSQL and pgBackRest deployments are up and running, it creates a Kubenretes Job to create a pgBackRest stanza. This is necessary as part of intializing the pgBackRest repository to accept backups from our PostgreSQL cluster.

4. When the PostgreSQL Operator detects that the stanza creation is completed, it will take an initial backup of the cluster.

In order for a PostgreSQL cluster to be considered successfully created, all of these steps need to succeed. You can connect to the PostgreSQL cluster after step two completes, but note for the cluster to be considered "healthy", you need for pgBackRest to finish initializig.

You may ask yourself, "wait, why do I need for the pgBackRest repository to be initialized for a cluster to be successfully created?" That is a good question! The reason is that pgBackRest plays a fundamental role in both the [disaster recovery]({{< relref "architecture/disaster-recovery.md" >}}) AND [high availability]({{< relref "architecture/high-availability/_index.md" >}}) system with the PostgreSQL Operator, particularly around self-healing.

### What Is Created?

There are several Kubernetes objects that are created as part of the `pgo create cluster` command, including:

- A Deployment representing the primary PostgreSQL instance
- A PVC that persists the data of this instance
- A Service that can connect to this instance
- A Deployment representing the pgBackRest repository
- A PVC that persists the data of this repository
- A Service that can connect to this repository
- [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/) representing the following three user accounts:
- `postgres`: the database superuser for the PostgreSQL cluster. This is in a secret called `hippo-postgres-secret`.
- `primaryuser`: the replication user. This is used for copying data between PostgreSQL instance. You should not need to login as this user. This is in a secret called `hippo-primaryuser-secret`.
- `testuser`: the regular user account. This user has access to log into the `hippo` database that is created. This is the account you want to give out to your user / application. In a later section, we will see how we can change the default user that is created. This is in a secret called `hippo-testuser-secret`, where `testuser` can be substituted for the name of the user account.
- [ConfigMaps](https://kubernetes.io/docs/concepts/configuration/configmap/), including:
- `hippo-pgha-config`, which allows you to [customize the configuration of your PostgreSQL cluster]({{< relref "advanced/custom-configuration.md">}}). We will cover more about this topic in later sections.
- `hippo-config` and `hippo-leader`, which are used by the high availability system. You should not modify these ConfigMaps.

Each deployment contains a single Pod. **Do not scale the deployments!**: further into the tutorial, we will cover some commands that let you scale your PostgreSQL cluster.

Some Job artifacts may be left around after the cluster creation process completes, including the stanza creation job (`hippo-stanza-create`) and initial backup job (`backrest-backup-hippo`). If the jobs completed successfully, you can safely delete these objects.

## Troubleshooting

### PostgreSQL / pgBackRest Pods Stuck in `Pending` Phase

The most common occurrence of this is due to PVCs not being bound. Ensure that you have configure your [storage options]({{< relref "installation/configuration.md" >}}#storage-settings) correctly for your Kubernetes environment, if for some reason you cannot use your default storage class or it is unavailable.

Also ensure that you have enough persistent volumes available: your Kubernetes administrator may need to provision more.

### `stanza-create` Job Never Finishes

The most common occurrence of this is due to the Kubernetes network blocking SSH connections between Pods. Ensure that your Kubernetes networking layer allows for SSH connections over port 2022 in the Namespace that you are deploying your PostgreSQL clusters into.

## Next Steps

Once your cluster is created, the next step is to [connect to your PostgreSQL cluster]({{< relref "tutorial/connect-cluster.md" >}}).
Loading

0 comments on commit 42fe8d5

Please sign in to comment.