Docker & Kubernetes - MongoDB with StatefulSets on GCP Kubernetes Engine


Introduction

In this post, we'll create a MongoDB replica set with Kubernetes StatefulSets, connect to the replica set, and then scale it.

Google Cloud Shell

Google Cloud Shell runs on Google Cloud, comes loaded with development tools, and offers a persistent 5GB home directory. It provides command-line access to our GCP resources. To activate it, click the Open Cloud Shell button on the top-right toolbar of the GCP console.

In the dialog box that opens, click "START CLOUD SHELL".

gcloud is the command-line tool for Google Cloud Platform. It comes pre-installed on Cloud Shell and supports tab-completion.

Set our zone:

$ gcloud config set compute/zone us-central1-f
Updated property [compute/zone].

Run the following command to create a Kubernetes cluster:

$ gcloud container clusters create hello-world
...
kubeconfig entry generated for hello-world.
NAME         LOCATION       MASTER_VERSION  MASTER_IP      MACHINE_TYPE   NODE_VERSION  NUM_NODES  STATUS
hello-world  us-central1-f  1.11.6-gke.2    35.222.37.132  n1-standard-1  1.11.6-gke.2  3          RUNNING
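The "kubeconfig entry generated" line means kubectl in this Cloud Shell session already points at hello-world. As a quick sanity check, here is a minimal sketch that counts the Ready nodes; count_ready_nodes is a hypothetical helper name of ours, and we'd expect it to print 3 for the default cluster created above.

```shell
# Sketch: count the cluster nodes whose STATUS column is exactly "Ready".
# count_ready_nodes is a hypothetical helper name used only in this post.
# In a new Cloud Shell session, fetch credentials first with:
#   gcloud container clusters get-credentials hello-world --zone us-central1-f
count_ready_nodes() {
  # --no-headers drops the NAME/STATUS/... header row; $2 is STATUS.
  kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage: count_ready_nodes
```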

Setup MongoDB Replica Set and Instantiate a StatefulSet

Now that we have our Kubernetes cluster, let's set up MongoDB.

We will be using a replica set so that our data is highly available and redundant. To get that set up, we need to do the following:

Download the MongoDB replica set sidecar example.
Instantiate a StorageClass.
Instantiate a headless service.
Instantiate a StatefulSet.

Run the following command to clone the mongo-k8s-sidecar repository from GitHub; it contains the MongoDB replica set example for Kubernetes:

$ git clone https://github.com/thesandlord/mongo-k8s-sidecar.git
Cloning into 'mongo-k8s-sidecar'...
remote: Enumerating objects: 306, done.
remote: Total 306 (delta 0), reused 0 (delta 0), pack-reused 306
Receiving objects: 100% (306/306), 328.29 KiB | 0 bytes/s, done.
Resolving deltas: 100% (155/155), done.

Create the StorageClass

Navigate to the StatefulSet example directory. This is where we'll create a Kubernetes StorageClass, which tells Kubernetes what kind of storage we want to use for the database nodes.

$ cd ./mongo-k8s-sidecar/example/StatefulSet/

$ ls
azure_hdd.yaml  azure_ssd.yaml  googlecloud_hdd.yaml  googlecloud_ssd.yaml  mongo-statefulset.yaml  README.md

On the Google Cloud Platform, we have a couple of storage choices: SSDs and hard disks.

Let's take a look at the googlecloud_ssd.yaml file:

kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd

The configuration creates a new StorageClass called "fast" that is backed by SSD volumes. Run the following command to deploy the StorageClass:

$ kubectl apply -f googlecloud_ssd.yaml
storageclass.storage.k8s.io "fast" created

With the StorageClass configured, our StatefulSet can now request volumes, and Kubernetes will create them automatically.
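The googlecloud_ssd.yaml file uses apiVersion storage.k8s.io/v1beta1, which the 1.11 cluster in this post still serves but which Kubernetes 1.22 and later no longer serve for StorageClass. Here is a sketch of the same "fast" StorageClass against the GA storage.k8s.io/v1 API, applied from a heredoc. apply_fast_storageclass is a hypothetical helper name, and on current GKE versions the CSI provisioner pd.csi.storage.gke.io may be needed in place of the in-tree kubernetes.io/gce-pd provisioner kept here from the original file.

```shell
# Sketch: the "fast" StorageClass from googlecloud_ssd.yaml, rewritten for
# the GA storage.k8s.io/v1 API. The in-tree kubernetes.io/gce-pd provisioner
# is kept from the original file; newer GKE clusters may need
# pd.csi.storage.gke.io instead (an assumption to check per cluster).
apply_fast_storageclass() {
  kubectl apply -f - <<'EOF'
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
EOF
}

# Usage: apply_fast_storageclass && kubectl get storageclass fast
```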

Deploying the Headless Service and StatefulSet

Let's open up the configuration file, mongo-statefulset.yaml, which contains both the headless service and the StatefulSet.

# Headless Service configuration
apiVersion: v1
kind: Service
metadata:
  name: mongo
  labels:
    name: mongo
spec:
  ports:
  - port: 27017
    targetPort: 27017
  clusterIP: None
  selector:
    role: mongo
---
# StatefulSet configuration
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: "mongo"
  replicas: 3
  template:
    metadata:
      labels:
        role: mongo
        environment: test
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: mongo
          image: mongo
          command:
            - mongod
            - "--replSet"
            - rs0
            - "--smallfiles"
            - "--noprealloc"
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-persistent-storage
              mountPath: /data/db
        - name: mongo-sidecar
          image: cvallance/mongo-k8s-sidecar
          env:
            - name: MONGO_SIDECAR_POD_LABELS
              value: "role=mongo,environment=test"
  volumeClaimTemplates:
  - metadata:
      name: mongo-persistent-storage
      annotations:
        volume.beta.kubernetes.io/storage-class: "fast"
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 100Gi

Headless service:

The first section of mongo-statefulset.yaml defines a headless service. In Kubernetes terms, a service describes the policies or rules for accessing a specific set of pods. A headless service is one that has no cluster IP, so Kubernetes does no load balancing for it. Combined with a StatefulSet, it gives each pod its own DNS name, and in turn a way to connect to each of our MongoDB nodes individually. In the yaml file, we can confirm that the service is headless by checking that the clusterIP field is set to None.
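Once the service has been deployed, the same clusterIP check can be run against the live object. A minimal sketch; is_headless is a hypothetical helper name:

```shell
# Sketch: succeed only if the named Service is headless, i.e. its
# spec.clusterIP is the literal string "None".
# is_headless is a hypothetical helper name.
is_headless() {
  [ "$(kubectl get service "$1" -o jsonpath='{.spec.clusterIP}')" = "None" ]
}

# Usage: is_headless mongo && echo "mongo is headless"
```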

StatefulSet:

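The StatefulSet configuration is the second section of mongo-statefulset.yaml. This is the bread and butter of the application: it's the workload that runs MongoDB and orchestrates our Kubernetes resources. Its first lines declare the StatefulSet object and name it mongo. The spec section then names the governing headless service (serviceName: "mongo"), sets the number of replicas to 3, and defines the pod template, whose role: mongo label matches the service's selector. (The file uses apps/v1beta1, which the 1.11 cluster here still serves; Kubernetes 1.16 and later only serve apps/v1 for StatefulSets, which also requires an explicit spec.selector matching the template labels.)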

Next comes the pod spec. terminationGracePeriodSeconds gives each pod 10 seconds to shut down gracefully, for example when we scale down the number of replicas. Then come the configurations for the two containers. The first one runs mongod: its --replSet rs0 flag sets the replica set name, while --smallfiles and --noprealloc are options of the old MMAPv1 storage engine, which MongoDB removed in version 4.2, so a newer untagged mongo image will likely refuse to start with them. The mongo container also mounts the persistent storage volume at /data/db, the location where MongoDB saves its data. The second container runs the sidecar, which configures the MongoDB replica set automatically and uses MONGO_SIDECAR_POD_LABELS to find the MongoDB pods. A "sidecar" is a helper container that helps the main container do its jobs and tasks.
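Because every pod runs two containers, kubectl needs -c to know which one we mean. The sidecar's log is the natural place to look when checking whether it is adding members to the replica set. A small sketch, assuming the container names from mongo-statefulset.yaml; sidecar_logs is a hypothetical helper name:

```shell
# Sketch: tail the log of the "mongo-sidecar" container (name taken from
# mongo-statefulset.yaml) in a given pod, defaulting to mongo-0 and 20 lines.
# sidecar_logs is a hypothetical helper name.
sidecar_logs() {
  kubectl logs "${1:-mongo-0}" -c mongo-sidecar --tail="${2:-20}"
}

# Usage: sidecar_logs mongo-1 50
```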

Finally, there is the volumeClaimTemplates section. Through its volume.beta.kubernetes.io/storage-class annotation it refers to the "fast" StorageClass we created earlier, so Kubernetes dynamically provisions a separate 100Gi SSD persistent disk for each MongoDB replica.
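The claims created from volumeClaimTemplates follow a fixed naming pattern, template name, then StatefulSet name, then pod ordinal, which is why the cleanup step at the end of this post deletes claims named mongo-persistent-storage-mongo-0 and so on. A sketch that predicts those names; pvc_names is a hypothetical helper, not a kubectl feature:

```shell
# Sketch: a StatefulSet names each claim
# <volumeClaimTemplate name>-<StatefulSet name>-<ordinal>.
# pvc_names is a hypothetical helper, not a kubectl feature.
pvc_names() {
  local template="$1" sts="$2" replicas="$3" i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${template}-${sts}-${i}"
    i=$((i + 1))
  done
}

# Usage: pvc_names mongo-persistent-storage mongo 3
```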

Now that we have a basic understanding of what a headless service and a StatefulSet are, let's deploy them. Since both are defined in mongo-statefulset.yaml, a single command deploys them in one shot:

$ kubectl apply -f mongo-statefulset.yaml
service "mongo" created
statefulset.apps "mongo" created

Connecting to the MongoDB Replica Set

Now that we have a cluster running and our replica set deployed, it's time to connect to it.

A Kubernetes StatefulSet deploys its pods sequentially: it waits for each MongoDB replica set member's pod to be up and ready, with its backing disk provisioned, before starting the next one. Run the following command to view and confirm that all three members are up:

$ kubectl get statefulset
NAME      DESIRED   CURRENT   AGE
mongo     3         3         2m
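Rather than re-running kubectl get statefulset by hand, we can poll the StatefulSet's status until it reports enough ready replicas. A minimal sketch, assuming a 5-second interval and 60 tries (about five minutes); wait_for_ready_replicas is a hypothetical helper name:

```shell
# Sketch: poll until a StatefulSet reports at least the wanted number of
# ready replicas, checking every 5 seconds up to a maximum number of tries.
# wait_for_ready_replicas is a hypothetical helper name.
wait_for_ready_replicas() {
  local sts="$1" want="$2" tries="${3:-60}" ready i=1
  while [ "$i" -le "$tries" ]; do
    ready=$(kubectl get statefulset "$sts" -o jsonpath='{.status.readyReplicas}')
    # readyReplicas may be absent (e.g. while it is zero), so default to 0.
    if [ "${ready:-0}" -ge "$want" ]; then
      echo "$sts: ${ready}/${want} ready"
      return 0
    fi
    sleep 5
    i=$((i + 1))
  done
  echo "timed out waiting for $sts" >&2
  return 1
}

# Usage: wait_for_ready_replicas mongo 3
```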

At this point, we should have three pods created in our cluster. These correspond to the three nodes in our MongoDB replica set. To view them:

$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
mongo-0   2/2       Running   0          3m
mongo-1   2/2       Running   0          2m
mongo-2   2/2       Running   0          2m

Wait for all three members to be created before moving on. Connect to the first replica set member:

$ kubectl exec -ti mongo-0 -- mongo
Defaulting container name to mongo.
Use 'kubectl describe pod/mongo-0 -n default' to see all of the containers in this pod.
MongoDB shell version v4.0.6
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("1c19e64b-0e5f-478e-b22c-aa06365a93c4") }
MongoDB server version: 4.0.6
Welcome to the MongoDB shell.
...
---
Enable MongoDB's free cloud-based monitoring service, which will then receive and display
metrics about your deployment (disk utilization, CPU, operation statistics, etc).

The monitoring data will be available on a MongoDB website with a unique URL accessible to you
and anyone you share the URL with. MongoDB may use this information to make product
improvements and to suggest MongoDB products and deployment options to you.

To enable free monitoring, run the following command: db.enableFreeMonitoring()
To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
---

>

We now have a REPL connected to MongoDB. Let's initiate the replica set with a default configuration by running the rs.initiate() command:

> rs.initiate()
{
        "info2" : "no configuration specified. Using a default configuration for the set",
        "me" : "localhost:27017",
        "ok" : 1,
        "operationTime" : Timestamp(1550555064, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1550555064, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
rs0:OTHER>

To print the replica set configuration, run the rs.conf() command.

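This prints the configuration of replica set rs0. In this post it lists only one member, localhost:27017: rs.initiate() with no arguments creates a single-member configuration (note the "me" field in its output), and the other pods show up here only once they have been added to the configuration, either by the sidecar or by hand with rs.add().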

rs0:PRIMARY> rs.conf()
{
        "_id" : "rs0",
        "version" : 1,
        "protocolVersion" : NumberLong(1),
        "writeConcernMajorityJournalDefault" : true,
        "members" : [
                {
                        "_id" : 0,
                        "host" : "localhost:27017",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {
                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                }
        ],
        "settings" : {
                "chainingAllowed" : true,
                "heartbeatIntervalMillis" : 2000,
                "heartbeatTimeoutSecs" : 10,
                "electionTimeoutMillis" : 10000,
                "catchUpTimeoutMillis" : -1,
                "catchUpTakeoverDelayMillis" : 30000,
                "getLastErrorModes" : {
                },
                "getLastErrorDefaults" : {
                        "w" : 1,
                        "wtimeout" : 0
                },
                "replicaSetId" : ObjectId("5c6b97b8e729cff1da837701")
        }
}
rs0:PRIMARY>
rs0:PRIMARY> exit
bye
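rs.conf() shows only what is in the configuration. To see which members have actually joined and what state each is in, rs.status() can be run non-interactively through kubectl exec. A sketch, assuming the legacy mongo shell from the MongoDB 4.0.6 image this post ran (newer mongo images ship mongosh instead); rs_members is a hypothetical helper name:

```shell
# Sketch: print "host state" for every replica set member, using the legacy
# mongo shell inside the "mongo" container of the given pod (default mongo-0).
# rs_members is a hypothetical helper name.
rs_members() {
  kubectl exec "${1:-mongo-0}" -c mongo -- mongo --quiet --eval \
    'rs.status().members.forEach(function (m) { print(m.name + " " + m.stateStr); })'
}

# Usage: rs_members mongo-0
```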

Scaling the MongoDB replica set

A big advantage of Kubernetes and StatefulSets is that we can scale the number of MongoDB replicas up and down with a single command. To scale up the number of replica set members from 3 to 5, run this command:

$ kubectl scale --replicas=5 statefulset mongo
statefulset.apps "mongo" scaled

In a few minutes, there will be 5 MongoDB pods. Run this command to view them:

$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
mongo-0   2/2       Running   0          16m
mongo-1   2/2       Running   0          16m
mongo-2   2/2       Running   0          15m
mongo-3   2/2       Running   0          1m
mongo-4   2/2       Running   0          34s
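Scaling from 3 to 5 pods only pays off once the new pods are voting members of rs0, and the arithmetic behind replica set elections explains why odd sizes are the usual choice: a primary needs votes from a strict majority of voting members. A sketch of that calculation, assuming every member votes; rs_fault_tolerance is a hypothetical helper name:

```shell
# Sketch: electing a primary needs a strict majority of voting members, so
# an n-member set survives n - (n/2 + 1) member failures.
# Assumes every member votes (MongoDB allows at most 7 voting members).
# rs_fault_tolerance is a hypothetical helper name.
rs_fault_tolerance() {
  local n="$1" majority
  majority=$((n / 2 + 1))
  echo "members=$n majority=$majority tolerates=$((n - majority))"
}

# Usage: rs_fault_tolerance 5
```

For example, 4 members tolerate the same single failure as 3, while 5 tolerate two.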

To scale down the number of replica set members from 5 back to 3, run this command:

$ kubectl scale --replicas=3 statefulset mongo
statefulset.apps "mongo" scaled

$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
mongo-0   2/2       Running   0          18m
mongo-1   2/2       Running   0          17m
mongo-2   2/2       Running   0          16m
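Scaling down removes pods from the highest ordinal downward (mongo-4 first, then mongo-3), but by default a StatefulSet leaves their PersistentVolumeClaims in place. That is why the cleanup step below deletes five claims, and why mongo-3 and mongo-4 would reattach to their old data if we scaled back up. A sketch for listing the claims, using the same role=mongo label selector as the cleanup step; list_mongo_pvcs is a hypothetical helper name:

```shell
# Sketch: list the claims created for the mongo pods. After scaling from 5
# down to 3 we'd expect all five claims here, since a StatefulSet does not
# delete PVCs by default when it removes pods.
# list_mongo_pvcs is a hypothetical helper name.
list_mongo_pvcs() {
  kubectl get pvc -l role=mongo -o name
}

# Usage: list_mongo_pvcs
```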

Using the MongoDB replica set

Each pod in a StatefulSet backed by a headless service gets a stable DNS name. Within the same namespace, the name follows the format <pod-name>.<service-name>

This means the DNS names for the MongoDB replica set are:

mongo-0.mongo
mongo-1.mongo
mongo-2.mongo
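Those short names resolve from pods in the same namespace; clients in other namespaces need the fully qualified form, pod.service.namespace.svc followed by the cluster domain. A sketch that prints them, assuming the default namespace and the default cluster.local cluster domain; pod_dns_names is a hypothetical helper name:

```shell
# Sketch: print the fully qualified DNS name of each StatefulSet pod.
# The "default" namespace and "cluster.local" domain are assumptions.
# pod_dns_names is a hypothetical helper name.
pod_dns_names() {
  local sts="$1" svc="$2" replicas="$3" ns="${4:-default}" i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${sts}-${i}.${svc}.${ns}.svc.cluster.local"
    i=$((i + 1))
  done
}

# Usage: pod_dns_names mongo mongo 3
```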

We can use these names directly in the connection string URI of our app.

Using the database from an application is outside the scope of this post; for this setup, however, the connection string URI would be:

"mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017/dbname_?"

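The URI above does not name the replica set. A sketch that builds the fuller form, with an explicit port on every host and the replicaSet option telling the driver which set to expect. The database name is a placeholder, rs0 comes from the --replSet flag in mongo-statefulset.yaml, and mongo_uri is a hypothetical helper name:

```shell
# Sketch: build a replica set connection string from the pod DNS names.
# The database name is a placeholder; the replica set name must match the
# --replSet value (rs0) in mongo-statefulset.yaml.
# mongo_uri is a hypothetical helper name.
mongo_uri() {
  local sts="$1" svc="$2" replicas="$3" db="$4" rs="$5" hosts="" i=0
  while [ "$i" -lt "$replicas" ]; do
    # ${hosts:+,} adds a comma separator only after the first host.
    hosts="${hosts}${hosts:+,}${sts}-${i}.${svc}:27017"
    i=$((i + 1))
  done
  echo "mongodb://${hosts}/${db}?replicaSet=${rs}"
}

# Usage (mydb is a hypothetical database name): mongo_uri mongo mongo 3 mydb rs0
```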
Clean up

$ kubectl delete statefulset mongo
statefulset.apps "mongo" deleted

$ kubectl delete svc mongo
service "mongo" deleted

$ kubectl delete pvc -l role=mongo
persistentvolumeclaim "mongo-persistent-storage-mongo-0" deleted
persistentvolumeclaim "mongo-persistent-storage-mongo-1" deleted
persistentvolumeclaim "mongo-persistent-storage-mongo-2" deleted
persistentvolumeclaim "mongo-persistent-storage-mongo-3" deleted
persistentvolumeclaim "mongo-persistent-storage-mongo-4" deleted
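Deleting the claims also releases the underlying PersistentVolumes and Compute Engine disks, because dynamically provisioned volumes inherit the StorageClass reclaim policy, which defaults to Delete. That deletion happens asynchronously, so a quick check may still show volumes for a short while. A sketch; leftover_mongo_pvs is a hypothetical helper name:

```shell
# Sketch: print the name of any PersistentVolume still bound to one of the
# mongo claims; no output means nothing is left.
# leftover_mongo_pvs is a hypothetical helper name.
leftover_mongo_pvs() {
  kubectl get pv -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.claimRef.name}{"\n"}{end}' |
    awk '$2 ~ /^mongo-persistent-storage-mongo-/ { print $1 }'
}

# Usage: leftover_mongo_pvs
```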

$ gcloud container clusters delete "hello-world"
The following clusters will be deleted.
 - [hello-world] in [us-central1-f]
Do you want to continue (Y/n)?  y
Deleting cluster hello-world...done.
Deleted [https://container.googleapis.com/v1/projects/qwiklabs-gcp-af2d261ece8f1f40/zones/us-central1-f/clusters/hello-world].

Reference: Running a MongoDB Database in Kubernetes with StatefulSets