Running ownCloud in Kubernetes with Rook Ceph Storage - Part 2

How to run ownCloud in Kubernetes, using Rook to provide a Ceph cluster.

This is a cross post of a post I wrote for the ownCloud blog; the original can be found here: Running ownCloud in Kubernetes With Rook Ceph Storage – Step by Step.

Thanks to them for allowing me to write and publish the post on their blog!


Preparations

Let's prepare for the Kubernetes madness!

Kubernetes Cluster Access

As written in the first part, it is expected that you have (admin) access to a Kubernetes cluster already.

If you don't have a Kubernetes cluster yet, you can try projects such as xetys/hetzner-kube on GitHub, Kubespray and others (see the Kubernetes documentation).

minikube is not enough when started with its default resources; be sure to give it extra resources, otherwise you will run into problems! Add the following flags to the minikube start command: --memory=4096 --cpus=3 --disk-size=40g.
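For example:

minikube start --memory=4096 --cpus=3 --disk-size=40g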

You should have cluster-admin access to the Kubernetes cluster! Other access levels can also work, but due to the kinds of objects created along the way it is easier with cluster-admin access.

Kubernetes Cluster

Ingress Controller

WARNING Only follow this section if your Kubernetes cluster does not have an Ingress controller yet.

We are going to install the Kubernetes NGINX Ingress Controller.

# Taken from https://github.com/kubernetes/ingress-nginx/blob/master/deploy/static/mandatory.yaml
kubectl apply -f ingress-nginx/

The instructions shown here are for an environment without LoadBalancer Service type support (e.g., bare metal or a "normal" VM provider, not a cloud); for installation instructions for other environments, check out the Installation Guide - NGINX Ingress Controller.

# Taken from https://github.com/kubernetes/ingress-nginx/blob/master/deploy/static/provider/baremetal/service-nodeport.yaml
kubectl apply -f ingress-nginx/service-nodeport.yaml

As these are bare metal installation instructions, the NGINX Ingress controller will be available through a Service of type NodePort. This Service type exposes one or more ports on all Nodes in the Kubernetes cluster.

To get that port run:

$ kubectl get -n ingress-nginx service ingress-nginx
NAME            TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx   NodePort   10.108.254.160   <none>        80:30512/TCP,443:30243/TCP   23m

In that output you can see the NodePorts for HTTP and HTTPS on which you can connect to the NGINX Ingress controller and ownCloud later.

As mentioned, you probably want to look into a more "solid" way to expose the NGINX Ingress controller(s). On bare metal, where there is no Kubernetes LoadBalancer integration, one option is the hostNetwork mode; see Bare-metal considerations - NGINX Ingress Controller.
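As a rough sketch, enabling that means setting hostNetwork in the Pod template of the ingress-nginx controller Deployment or DaemonSet (illustrative excerpt, not the full manifest; see the linked considerations for the trade-offs):

# Excerpt of the controller's Pod template (illustrative)
spec:
  template:
    spec:
      hostNetwork: true                     # bind ports 80/443 directly on the Node
      dnsPolicy: ClusterFirstWithHostNet    # keep cluster DNS resolution working with hostNetwork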

Namespaces

Throughout the whole installation we will work with the following three Namespaces:

  • rook-ceph - For the Rook-run Ceph cluster and the Rook Ceph operator (created in the Rook Ceph Storage section).
  • owncloud - For ownCloud and the other operators, such as Zalando's Postgres Operator and KubeDB for Redis.
  • ingress-nginx - Used for the NGINX Ingress controller if you don't have an Ingress controller running yet (it was already created in the Ingress Controller steps).

kubectl create -f namespaces.yaml
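If you are curious, namespaces.yaml essentially just contains a Namespace object for owncloud; the rook-ceph and ingress-nginx Namespaces come with their own manifests as noted above. A rough sketch (the file in the repository is authoritative):

apiVersion: v1
kind: Namespace
metadata:
  name: owncloud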

Rook Ceph Storage

Now on to running Ceph in Kubernetes, using the Rook.io project.

In the following sections, make sure to use the available -test suffixed files if you have fewer than 3 Nodes available to application Pods (e.g., depending on your cluster, the masters are not available for Pods). You can change that by digging into the CephCluster object's spec.placement.tolerations and the operator environment variables for the discover and agent daemons; running application Pods on the masters is not recommended though. A sketch of such a placement section follows below.
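As a hedged sketch, such a placement section in the CephCluster spec could look like this, assuming the standard master taint (illustrative only, not part of the provided manifests):

# Illustrative: allow Rook daemons onto tainted master Nodes (not recommended)
spec:
  placement:
    all:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule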

Operator

The operator will take care of starting the Ceph components one by one, as well as preparing the disks and health checking.

kubectl create -f rook-ceph/common.yaml
kubectl create -f rook-ceph/operator.yaml

You can check on the Pods to see how it looks:

$ kubectl get -n rook-ceph pod
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-agent-cbrgv                 1/1     Running   0          90s
rook-ceph-agent-wfznr                 1/1     Running   0          90s
rook-ceph-agent-zhgg7                 1/1     Running   0          90s
rook-ceph-operator-6897f5c696-j724m   1/1     Running   0          2m18s
rook-discover-jg798                   1/1     Running   0          90s
rook-discover-kfxc8                   1/1     Running   0          90s
rook-discover-qbhfs                   1/1     Running   0          90s

There is one rook-discover-* Pod on each Node of your Kubernetes cluster; they discover the Nodes' disks so the operator can plan the actions for a given CephCluster object, which comes up next.

Ceph Cluster

This is the definition of the Ceph cluster that will be created in Kubernetes. It contains the lists and options defining which disks to use and on which Nodes.

If you want to see some example CephCluster objects and what is possible, be sure to check out the Rook v1.0 Documentation - CephCluster CRD.
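For orientation, a stripped-down CephCluster object looks roughly like this; the values are illustrative and the cluster.yaml in the repository is the authoritative version:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v14.2       # use the exact image tag from the repository's cluster.yaml
  dataDirHostPath: /var/lib/rook # where Ceph stores its config/data on each Node
  mon:
    count: 3
  storage:
    useAllNodes: true            # let Rook use all Nodes...
    useAllDevices: true          # ...and all empty disks it discovers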

INFO Use the cluster-test.yaml when your Kubernetes cluster has less than 3 schedulable Nodes (e.g., minikube)! When using the cluster-test.yaml only one mon is started. If that mon is down for whatever reason, the Ceph Cluster will come to a halt to prevent any data "corruption".

$ kubectl create -f rook-ceph/cluster.yaml

This will now cause the operator to start the Ceph cluster according to the specifications in the CephCluster object.

To see which Pods have already been created by the operator, you can run (output example from a three node cluster):

$ kubectl get -n rook-ceph pod
NAME                                                     READY   STATUS      RESTARTS   AGE
rook-ceph-agent-cbrgv                                    1/1     Running     0          11m
rook-ceph-agent-wfznr                                    1/1     Running     0          11m
rook-ceph-agent-zhgg7                                    1/1     Running     0          11m
rook-ceph-mgr-a-77fc54c489-66mpd                         1/1     Running     0          6m45s
rook-ceph-mon-a-68b94cd66-m48lm                          1/1     Running     0          8m6s
rook-ceph-mon-b-7b679476f-mc7wj                          1/1     Running     0          8m
rook-ceph-mon-c-b5c468c94-f8knt                          1/1     Running     0          7m54s
rook-ceph-operator-6897f5c696-j724m                      1/1     Running     0          11m
rook-ceph-osd-0-5c8d8fcdd-m4gl7                          1/1     Running     0          5m55s
rook-ceph-osd-1-67bfb7d647-vzmpv                         1/1     Running     0          5m56s
rook-ceph-osd-2-c8c55548f-ws8sl                          1/1     Running     0          5m11s
rook-ceph-osd-prepare-owncloudrookceph-worker-01-svvz9   0/2     Completed   0          6m7s
rook-ceph-osd-prepare-owncloudrookceph-worker-02-mhvf2   0/2     Completed   0          6m7s
rook-ceph-osd-prepare-owncloudrookceph-worker-03-nt2gs   0/2     Completed   0          6m7s
rook-discover-jg798                                      1/1     Running     0          11m
rook-discover-kfxc8                                      1/1     Running     0          11m
rook-discover-qbhfs                                      1/1     Running     0          11m

Block storage (RBD)

Before creating the CephFS filesystem, let's create a block storage pool with a StorageClass. The StorageClass will be used by PostgreSQL and, if you want, by the Redis cluster as well.

INFO Use the storageclass-test.yaml when your Kubernetes cluster has less than 3 schedulable Nodes!

kubectl create -f rook-ceph/storageclass.yaml
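Roughly, storageclass.yaml defines a replicated CephBlockPool plus a StorageClass that provisions RBD volumes from it; a trimmed-down sketch (the file in the repository is authoritative):

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  replicated:
    size: 3                      # keep 3 replicas of each object
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: ceph.rook.io/block  # Rook's (flex) block storage provisioner
parameters:
  blockPool: replicapool
  clusterNamespace: rook-ceph
  fstype: ext4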

In the case of a block storage Pool, no additional Pods will be started; we'll verify that the block storage Pool has been created in the Toolbox section.

One more thing to do is to set the created StorageClass as the default in the Kubernetes cluster by running the following command:

kubectl patch storageclass rook-ceph-block -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

Now you are ready to move on to the storage for the actual data to be stored in ownCloud!

CephFS

CephFS is the filesystem that Ceph offers; with its POSIX compliance it is a perfect fit for ownCloud.

INFO Use the filesystem-test.yaml when your Kubernetes cluster has less than 3 schedulable Nodes!

kubectl create -f rook-ceph/filesystem.yaml
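For reference, a CephFilesystem object along these lines is what creates the filesystem named myfs (illustrative sketch; the repository's filesystem.yaml is authoritative):

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
  - replicated:
      size: 3
  metadataServer:
    activeCount: 1               # one active MDS...
    activeStandby: true          # ...plus a standby MDS (the two Pods below)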

Creating the CephFS will cause so-called MDS daemons (MDS Pods) to be started.

$ kubectl get -n rook-ceph pod
NAME                                    READY   STATUS      RESTARTS   AGE
[...]
rook-ceph-mds-myfs-a-747b75bdc7-9nzwx                    1/1     Running     0          11s
rook-ceph-mds-myfs-b-76b9fcc8cc-md8bz                    1/1     Running     0          10s
[...]

Toolbox

This will create a Pod which allows us to run Ceph commands. It will be used to quickly check the Ceph cluster's status.

$ kubectl create -f rook-ceph/toolbox.yaml
# Wait for the Pod to be `Running`
$ kubectl get -n rook-ceph pod -l "app=rook-ceph-tools"
NAME                                    READY   STATUS      RESTARTS   AGE
[...]
rook-ceph-tools-5966446d7b-nrw5n                         1/1     Running     0          10s
[...]

Now use kubectl exec to enter the Rook Ceph Toolbox Pod:

kubectl exec -n rook-ceph -it $(kubectl get -n rook-ceph pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

In the Rook Ceph Toolbox Pod, run the following command to get the Ceph cluster health status (example output from a 7 Node Kubernetes Rook Ceph cluster):

$ ceph -s
 cluster:
   id:     f8492cd9-3d14-432c-b681-6f73425d6851
   health: HEALTH_OK

 services:
   mon: 3 daemons, quorum c,b,a
   mgr: a(active)
   mds: repl-2-1-2/2/2 up  {0=repl-2-1-c=up:active,1=repl-2-1-b=up:active}, 2 up:standby-replay
   osd: 7 osds: 7 up, 7 in

 data:
   pools:   3 pools, 300 pgs
   objects: 1.41 M objects, 4.0 TiB
    usage:   8.2 TiB used, 17 TiB / 25 TiB avail
   pgs:     300 active+clean

 io:
   client:   6.2 KiB/s rd, 1.5 MiB/s wr, 4 op/s rd, 140 op/s wr

You can also get it by using kubectl:

$ kubectl get -n rook-ceph cephcluster rook-ceph
NAME        DATADIRHOSTPATH   MONCOUNT   AGE   STATE     HEALTH
rook-ceph   /mnt/sda1/rook    3          14m   Created   HEALTH_OK

That even shows you some additional information directly through kubectl instead of having to read the ceph -s output.

Summary

This is how it should now look, Pod-wise, in your rook-ceph Namespace (example output from a 3 Node Kubernetes cluster):

$ kubectl get -n rook-ceph pod
NAME                                                     READY   STATUS      RESTARTS   AGE
rook-ceph-agent-cbrgv                                    1/1     Running     0          15m
rook-ceph-agent-wfznr                                    1/1     Running     0          15m
rook-ceph-agent-zhgg7                                    1/1     Running     0          15m
rook-ceph-mds-myfs-a-747b75bdc7-9nzwx                    1/1     Running     0          42s
rook-ceph-mds-myfs-b-76b9fcc8cc-md8bz                    1/1     Running     0          41s
rook-ceph-mgr-a-77fc54c489-66mpd                         1/1     Running     0          11m
rook-ceph-mon-a-68b94cd66-m48lm                          1/1     Running     0          12m
rook-ceph-mon-b-7b679476f-mc7wj                          1/1     Running     0          2m22s
rook-ceph-mon-c-b5c468c94-f8knt                          1/1     Running     0          2m6s
rook-ceph-operator-6897f5c696-j724m                      1/1     Running     0          16m
rook-ceph-osd-0-5c8d8fcdd-m4gl7                          1/1     Running     0          10m
rook-ceph-osd-1-67bfb7d647-vzmpv                         1/1     Running     0          10m
rook-ceph-osd-2-c8c55548f-ws8sl                          1/1     Running     0          9m48s
rook-ceph-osd-prepare-owncloudrookceph-worker-01-5xpqk   0/2     Completed   0          73s
rook-ceph-osd-prepare-owncloudrookceph-worker-02-xnl8p   0/2     Completed   0          70s
rook-ceph-osd-prepare-owncloudrookceph-worker-03-2qggs   0/2     Completed   0          68s
rook-ceph-tools-5966446d7b-nrw5n                         1/1     Running     0          8s
rook-discover-jg798                                      1/1     Running     0          15m
rook-discover-kfxc8                                      1/1     Running     0          15m
rook-discover-qbhfs                                      1/1     Running     0          15m

The important thing is that the ceph -s output or the kubectl get cephcluster output shows that the health is HEALTH_OK and that you have OSD Pods running (ceph -s output line: osd: 3 osds: 3 up, 3 in, where 3 is basically the number of OSD Pods).

Should you not have any OSD Pods, make sure all your Nodes are Ready and schedulable (e.g., no taints preventing "normal" Pods from running) and check the logs of the rook-ceph-osd-prepare-* and, if they exist, the rook-ceph-osd-[0-9]* Pods. If you don't have any rook-ceph-osd-* related Pods at all, look into the rook-ceph-operator-* logs for error messages; be sure to go over each line so you don't miss one.

PostgreSQL

Moving on to the PostgreSQL for ownCloud.

Zalando's PostgreSQL operator does a great job of running PostgreSQL in Kubernetes.

The first thing to create is the PostgreSQL operator, which brings the CustomResourceDefinitions (remember, the custom Kubernetes objects) with it. Using the Ceph block storage (RBD), we are going to create a redundant PostgreSQL instance for ownCloud to use.

$ kubectl create -n owncloud -f postgres/postgres-operator.yaml
# Check for the PostgreSQL operator Pod to be created and running
$ kubectl get -n owncloud pod
NAME                                 READY   STATUS    RESTARTS   AGE
postgres-operator-6464fc9c48-6twrd   1/1     Running   0          5m23s

That is the operator created; moving on to the PostgreSQL custom resource object, which will cause the operator to create a PostgreSQL instance for use in Kubernetes:

# Make sure the CustomResourceDefinition of the PostgreSQL has been created
$ kubectl get customresourcedefinitions.apiextensions.k8s.io postgresqls.acid.zalan.do
NAME                        CREATED AT
postgresqls.acid.zalan.do   2019-08-04T10:27:59Z

The CustomResourceDefinition exists? Perfect, continue with the creation:

kubectl create -n owncloud -f postgres/postgres.yaml
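For reference, the postgresql custom resource for the Zalando operator typically looks something like this; the names and sizes here are illustrative, the postgres/postgres.yaml in the repository is authoritative:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: owncloud-postgres
spec:
  teamId: owncloud
  numberOfInstances: 2           # gives the two owncloud-postgres Pods shown below
  volume:
    size: 10Gi                   # provisioned via the default rook-ceph-block StorageClass
  postgresql:
    version: "11"
  users:
    owncloud: []
  databases:
    owncloud: owncloud           # database name: owner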

It will take a bit for the two PostgreSQL Pods to appear, but in the end you should have two owncloud-postgres Pods:

$ kubectl get -n owncloud pod
NAME                                 READY   STATUS    RESTARTS   AGE
owncloud-postgres-0                  1/1     Running   0          92s
owncloud-postgres-1                  1/1     Running   0          64s
postgres-operator-6464fc9c48-6twrd   1/1     Running   0          7m

owncloud-postgres-0 and owncloud-postgres-1 in Running status? That looks good.

Now that the database is running, let's continue to the Redis.

Redis

To run a Redis cluster we need the KubeDB operator; installing it can be done using a bash script or Helm.

To keep it quick'n'easy we'll use their bash script for that:

curl -fsSL https://raw.githubusercontent.com/kubedb/cli/0.12.0/hack/deploy/kubedb.sh -o kubedb.sh
# Take a look at the script using, e.g., `cat kubedb.sh`
#
# If you are fine with it, run it:
chmod +x kubedb.sh
./kubedb.sh
# It will install the KubeDB operator to the cluster in the `kube-system` Namespace

(You can remove the script afterwards: rm kubedb.sh)

For more information on the bash script and/or the Helm installation, check out KubeDB.

Moving on to creating the Redis cluster, run:

kubectl create -n owncloud -f redis.yaml
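For reference, the Redis object for KubeDB looks roughly like this; the version string and storage size are illustrative (pick a version supported by your KubeDB release), the redis.yaml in the repository is authoritative:

apiVersion: kubedb.com/v1alpha1
kind: Redis
metadata:
  name: redis-owncloud
spec:
  version: "4.0-v1"              # pick a Redis version supported by your KubeDB release
  mode: Cluster
  cluster:
    master: 3                    # 3 shards with 1 replica each = the 6 Pods shown below
    replicas: 1
  storage:
    storageClassName: rook-ceph-block
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 1Gi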

It will take a few seconds for the first Redis Pod(s) to be started; to check that it worked, look for Pods with redis-owncloud- in their name:

$ kubectl get -n owncloud pods
NAME                                 READY   STATUS    RESTARTS   AGE
owncloud-postgres-0                  1/1     Running   0          6m41s
owncloud-postgres-1                  1/1     Running   0          6m13s
postgres-operator-6464fc9c48-6twrd   1/1     Running   0          12m
redis-owncloud-shard0-0              1/1     Running   0          49s
redis-owncloud-shard0-1              1/1     Running   0          40s
redis-owncloud-shard1-0              1/1     Running   0          29s
redis-owncloud-shard1-1              1/1     Running   0          19s
redis-owncloud-shard2-0              1/1     Running   0          14s
redis-owncloud-shard2-1              1/1     Running   0          10s

That is how it should look now.

ownCloud

Now the final "piece", ownCloud.

The folder owncloud/ contains all the manifests we need.

  • ConfigMap and Secret for the basic configuration of ownCloud.
  • Deployment to get ownCloud Pods running in Kubernetes.
  • Service and Ingress to expose ownCloud to the internet.
  • CronJob to run the ownCloud cron task execution (e.g., cleanup and others), instead of having the cron run per instance.

The ownCloud Deployment currently uses a custom built image (galexrt/owncloud-server:latest) which has a fix for a clustered Redis configuration issue (pull request has been opened https://github.com/owncloud-docker/base/pull/95).

kubectl create -n owncloud -f owncloud/
# Now we wait for ownCloud to finish installing the database before scaling ownCloud up to `2` replicas (or more if you want)

The admin username is myowncloudadmin and can be changed in the owncloud/owncloud-configmap.yaml file. Be sure to restart both ownCloud Pods after changing values in the ConfigMaps and Secrets.

If you want to change the admin password, edit the owncloud/owncloud-secret.yaml file line OWNCLOUD_ADMIN_PASSWORD. The values in a Kubernetes Secret object are base64 encoded (e.g., echo -n YOUR_PASSWORD | base64 -w0)!
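To illustrate, the relevant entries look roughly like this; the ConfigMap/Secret names and the extra keys are assumptions, the actual files in owncloud/ are authoritative:

# owncloud/owncloud-configmap.yaml (excerpt, illustrative)
apiVersion: v1
kind: ConfigMap
metadata:
  name: owncloud
data:
  OWNCLOUD_ADMIN_USERNAME: "myowncloudadmin"
  OWNCLOUD_DOMAIN: "owncloud.example.com"
---
# owncloud/owncloud-secret.yaml (excerpt, illustrative)
apiVersion: v1
kind: Secret
metadata:
  name: owncloud
data:
  # base64 of the admin password, e.g. `echo -n YOUR_PASSWORD | base64 -w0`
  OWNCLOUD_ADMIN_PASSWORD: "bXlzZWNyZXRwYXNzd29yZA=="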

To know when your ownCloud is up'n'running check the logs, e.g.:

$ kubectl logs -n owncloud -f owncloud-856fcc4947-crscn
Creating volume folders...
Creating hook folders...
Waiting for PostgreSQL...
wait-for-it: waiting 180 seconds for owncloud-postgres:5432
wait-for-it: owncloud-postgres:5432 is available after 1 seconds
Removing custom folder...
Linking custom folder...
Removing config folder...
Linking config folder...
Writing config file...
Fixing base perms...
Fixing data perms...
Fixing hook perms...
Installing server database...
ownCloud was successfully installed
ownCloud is already latest version
Writing objectstore config...
Writing php config...
Updating htaccess config...
.htaccess has been updated
Writing apache config...
Enabling webcron background...
Set mode for background jobs to 'webcron'
Touching cron configs...
Starting cron daemon...
Starting apache daemon...
[Sun Aug 04 13:26:18.986407 2019] [mpm_prefork:notice] [pid 190] AH00163: Apache/2.4.29 (Ubuntu) configured -- resuming normal operations
[Sun Aug 04 13:26:18.986558 2019] [core:notice] [pid 190] AH00094: Command line: '/usr/sbin/apache2 -f /etc/apache2/apache2.conf -D FOREGROUND'

The Installing server database... step will take some time depending on your network, storage and other factors.

Once the [Sun Aug 04 13:26:18.986558 2019] [core:notice] [pid 190] AH00094: Command line: '/usr/sbin/apache2 -f /etc/apache2/apache2.conf -D FOREGROUND' line has appeared, you should be able to reach your ownCloud instance through the NodePort Service port (over HTTP) or through the Ingress (default address owncloud.example.com). If you are using the Ingress from the example files, be sure to edit it to use a (sub-)domain pointing to the Ingress controllers in your Kubernetes cluster.

You now have an ownCloud instance running.

Further points

HTTPS

To further improve the experience of running ownCloud in Kubernetes, you will probably want to check out Jetstack's cert-manager project on GitHub to get Let's Encrypt certificates for your Ingress. cert-manager allows you to request Let's Encrypt certificates easily through Kubernetes custom objects and keeps them up to date.

ownCloud will then be reachable via HTTPS, which, combined with the ownCloud encryption, makes it pretty secure.

For more information on using TLS with Kubernetes Ingress, checkout Ingress - Kubernetes.

Pod Health Checks

In owncloud/owncloud-deployment.yaml there are a readinessProbe and a livenessProbe in the Deployment spec, but commented out. After ownCloud has been installed and you have verified it is running, you can go ahead, uncomment those lines and use kubectl apply / kubectl replace (don't forget to specify the Namespace: -n owncloud).
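As a rough idea of what such probes usually look like for ownCloud, an HTTP check against the status endpoint (the commented-out lines in the repository file are authoritative, including the port):

# Illustrative probes for the ownCloud container
readinessProbe:
  httpGet:
    path: /status.php
    port: 8080                   # adjust to the containerPort used by the Deployment
  initialDelaySeconds: 30
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /status.php
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30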

Upload Filesize

When changing the upload filesize on the ownCloud instance itself through the environment variables, be sure to also update the Ingress controller with the "max upload file size".
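For the NGINX Ingress controller this is done with an annotation on the ownCloud Ingress object, for example (the value is illustrative and should match the limit configured for ownCloud):

metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "1024m"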

Other configuration options

If you want to change configuration options, you need to provide them through environment variables. The environment variables are passed to the ownCloud Deployment via owncloud/owncloud-configmap.yaml.

A list of all available environment variables can be found here:

Updating ownCloud in Kubernetes

It is the same procedure as with running ownCloud with, e.g., docker-compose.

To update ownCloud, scale the Deployment down to 1 replica, then update the image, wait for the single Pod to come up again and finish the upgrade, and then scale the ownCloud Deployment back up to, e.g., 2 or more. A rough sketch of that with kubectl follows below.
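The Deployment and container names in this sketch are assumed to match the example manifests; adjust them to your setup:

# Scale down to a single replica before updating
kubectl scale -n owncloud deployment owncloud --replicas=1
# Point the Deployment at the new image (container name assumed to be "owncloud")
kubectl set image -n owncloud deployment/owncloud owncloud=galexrt/owncloud-server:NEW_TAG
# Watch the single Pod come up again and finish the upgrade
kubectl get -n owncloud pod -w
# Scale back up once the upgrade is done
kubectl scale -n owncloud deployment owncloud --replicas=2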

Summary

This is the end of the two part series on running ownCloud in Kubernetes.

Have Fun!