Docker & Kubernetes - MongoDB with StatefulSets on GCP Kubernetes Engine

In this post, we'll create a MongoDB replica set with Kubernetes StatefulSets, connect to it, and then scale it.

Google Cloud Shell is loaded with development tools, offers a persistent 5GB home directory, and runs on Google Cloud. It provides command-line access to our GCP resources. To activate the shell, click the Open Cloud Shell button on the top-right toolbar of the GCP console.

In the dialog box that opens, click "START CLOUD SHELL".
gcloud is the command-line tool for Google Cloud Platform. It comes pre-installed on Cloud Shell and supports tab-completion.
Set our zone:
$ gcloud config set compute/zone us-central1-f
Updated property [compute/zone].
Run the following command to create a Kubernetes cluster:
$ gcloud container clusters create hello-world
...
kubeconfig entry generated for hello-world.
NAME         LOCATION       MASTER_VERSION  MASTER_IP  MACHINE_TYPE   NODE_VERSION  NUM_NODES  STATUS
hello-world  us-central1-f  1.11.6-gke.2               n1-standard-1  1.11.6-gke.2  3          RUNNING
Now that we have our Kubernetes cluster, let's set up MongoDB.
We will be using a replica set so that our data is highly available and redundant. To get that set up, we need to do the following:
- Download the MongoDB replica set/sidecar.
- Instantiate a StorageClass.
- Instantiate a headless service.
- Instantiate a StatefulSet.
Run the following command to clone the MongoDB/Kubernetes replica set sidecar from its GitHub repository:
$ git clone
Cloning into 'mongo-k8s-sidecar'...
remote: Enumerating objects: 306, done.
remote: Total 306 (delta 0), reused 0 (delta 0), pack-reused 306
Receiving objects: 100% (306/306), 328.29 KiB | 0 bytes/s, done.
Resolving deltas: 100% (155/155), done.
Navigate to the StatefulSet directory. From there we'll create a Kubernetes StorageClass, which tells Kubernetes what kind of storage we want to use for the database nodes.
$ cd ./mongo-k8s-sidecar/example/StatefulSet/
$ ls
azure_hdd.yaml  azure_ssd.yaml  googlecloud_hdd.yaml  googlecloud_ssd.yaml  mongo-statefulset.yaml
On the Google Cloud Platform, we have a couple of storage choices: SSDs and hard disks.
Let's take a look at the googlecloud_ssd.yaml file:
kind: StorageClass
apiVersion:
metadata:
  name: fast
provisioner:
parameters:
  type: pd-ssd
The configuration creates a new StorageClass called "fast" that is backed by SSD volumes. Run the following command to deploy the StorageClass:
$ kubectl apply -f googlecloud_ssd.yaml
"fast" created
With the StorageClass configured, our StatefulSet can request a volume that will be created automatically.
Let's open the configuration file (mongo-statefulset.yaml), which contains both the headless service and the StatefulSet.
apiVersion: v1                # Headless Service configuration
kind: Service
metadata:
  name: mongo
  labels:
    name: mongo
spec:
  ports:
  - port: 27017
    targetPort: 27017
  clusterIP: None
  selector:
    role: mongo
---
apiVersion: apps/v1beta1      # StatefulSet configuration
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: "mongo"
  replicas: 3
  template:
    metadata:
      labels:
        role: mongo
        environment: test
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: mongo
          image: mongo
          command:
            - mongod
            - "--replSet"
            - rs0
            - "--smallfiles"
            - "--noprealloc"
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-persistent-storage
              mountPath: /data/db
        - name: mongo-sidecar
          image: cvallance/mongo-k8s-sidecar
          env:
            - name: MONGO_SIDECAR_POD_LABELS
              value: "role=mongo,environment=test"
  volumeClaimTemplates:
  - metadata:
      name: mongo-persistent-storage
      annotations: "fast"
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 100Gi
Headless service:
The first section of mongo-statefulset.yaml defines a headless service. In Kubernetes terms, a service describes policies or rules for accessing specific pods. In brief, a headless service is one that doesn't prescribe load balancing. When combined with a StatefulSet, it gives each pod its own DNS name, and in turn a way to connect to each of our MongoDB nodes individually. In the yaml file, we can confirm that the service is headless by verifying that the clusterIP field is set to None.
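As a minimal sketch of what those per-pod DNS names look like, the helper below builds the fully qualified name for one pod. The function name pod_fqdn is hypothetical, and the suffix assumes the cluster uses the common default DNS domain cluster.local; the namespace defaults to "default".

```shell
# Build the stable DNS name that a headless service gives one StatefulSet pod.
# Assumption: the cluster DNS domain is the common default "cluster.local".
pod_fqdn() {
  # $1 = pod name, $2 = headless service name, $3 = namespace (defaults to "default")
  printf '%s.%s.%s.svc.cluster.local\n' "$1" "$2" "${3:-default}"
}

pod_fqdn mongo-0 mongo   # prints mongo-0.mongo.default.svc.cluster.local
```

Inside the same namespace, the shorter form mongo-0.mongo resolves to the same pod.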

The StatefulSet configuration is the second section of mongo-statefulset.yaml. This is the bread and butter of the application: it's the workload that runs MongoDB and orchestrates our Kubernetes resources. Referencing the yaml file, we see that the first lines declare the StatefulSet object. Then come the metadata and the top of the spec, where the name, the service name, and the number of replicas are specified.
Next comes the pod spec. The terminationGracePeriodSeconds field is used to shut the pod down gracefully when we scale down the number of replicas. Then the configurations for the two containers are shown. The first one runs MongoDB with command-line flags that configure the replica set name. It also mounts the persistent storage volume at /data/db, the location where MongoDB saves its data. The second container runs the sidecar, a helper container that assists the main container with its jobs and tasks. Here, the sidecar configures the MongoDB replica set automatically.
Finally, there is the volumeClaimTemplates section. This is what talks to the StorageClass we created before to provision the volume. It requests a 100Gi disk for each MongoDB replica.
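To make the per-replica provisioning concrete, here is a small sketch that lists the claim names a StatefulSet generates and the total storage requested. Kubernetes names each claim as claim template name, StatefulSet name, and pod ordinal joined by hyphens; the function names pvc_names and total_storage_gi are hypothetical helpers for illustration.

```shell
# List the PersistentVolumeClaim names a StatefulSet creates:
#   <claim-template-name>-<statefulset-name>-<ordinal>
pvc_names() {
  # $1 = volumeClaimTemplates name, $2 = StatefulSet name, $3 = replica count
  i=0
  while [ "$i" -lt "$3" ]; do
    printf '%s-%s-%s\n' "$1" "$2" "$i"
    i=$((i + 1))
  done
}

# Total requested storage in Gi: one claim of $2 Gi per replica.
total_storage_gi() {
  # $1 = replica count, $2 = Gi requested per claim
  echo $(( $1 * $2 ))
}

pvc_names mongo-persistent-storage mongo 3
total_storage_gi 3 100   # prints 300
```

With three replicas this requests 300Gi in total, so the replica count directly drives the disk bill.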
Now that we have a basic understanding of what a headless service and a StatefulSet are, let's go ahead and deploy them. Since the two are packaged in one file, mongo-statefulset.yaml, we can create both in one shot with the following command:
$ kubectl apply -f mongo-statefulset.yaml
service "mongo" created
statefulset.apps "mongo" created
Now that we have a cluster running and our replica set deployed, it's time to connect to it.
Kubernetes StatefulSets deploy their pods sequentially: each MongoDB replica set member must fully boot up, with its backing disk created, before the next member is started. Run the following command to confirm that all three members are up:
$ kubectl get statefulset
NAME      DESIRED   CURRENT   AGE
mongo     3         3         2m
At this point, we should have three pods created in our cluster. These correspond to the three nodes in our MongoDB replica set. To view them:
$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
mongo-0   2/2       Running   0          3m
mongo-1   2/2       Running   0          2m
mongo-2   2/2       Running   0          2m
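Rather than checking the listing by eye, the check can be scripted. The sketch below counts the pods whose READY column shows all containers ready and whose STATUS is Running. The function name count_ready is hypothetical, and it assumes the default column layout of kubectl get pods shown above.

```shell
# Count pods whose READY column is n/n and whose STATUS is Running.
# Reads the text of `kubectl get pods` on stdin; $1 = pod name prefix.
count_ready() {
  awk -v prefix="$1-" '
    index($1, prefix) == 1 && $3 == "Running" {
      split($2, r, "/")          # READY column, e.g. "2/2"
      if (r[1] == r[2]) n++
    }
    END { print n + 0 }
  '
}

# Sample input copied from the listing above; against a live cluster this
# would be:  kubectl get pods | count_ready mongo
count_ready mongo <<'EOF'
NAME      READY     STATUS    RESTARTS   AGE
mongo-0   2/2       Running   0          3m
mongo-1   2/2       Running   0          2m
mongo-2   2/2       Running   0          2m
EOF
# prints 3
```

A wait loop could poll this value until it equals the desired replica count.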
Wait for all three members to be created before moving on. Connect to the first replica set member:
$ kubectl exec -ti mongo-0 mongo
Defaulting container name to mongo.
Use 'kubectl describe pod/mongo-0 -n default' to see all of the containers in this pod.
MongoDB shell version v4.0.6
connecting to: mongodb://
Implicit session: session { "id" : UUID("1c19e64b-0e5f-478e-b22c-aa06365a93c4") }
MongoDB server version: 4.0.6
Welcome to the MongoDB shell.
...
---
Enable MongoDB's free cloud-based monitoring service, which will then receive and display
metrics about your deployment (disk utilization, CPU, operation statistics, etc).

The monitoring data will be available on a MongoDB website with a unique URL accessible to you
and anyone you share the URL with. MongoDB may use this information to make product
improvements and to suggest MongoDB products and deployment options to you.

To enable free monitoring, run the following command: db.enableFreeMonitoring()
To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
---
>
We now have a REPL environment connected to MongoDB. Let's instantiate the replica set with a default configuration by running rs.initiate():
> rs.initiate()
{
  "info2" : "no configuration specified. Using a default configuration for the set",
  "me" : "localhost:27017",
  "ok" : 1,
  "operationTime" : Timestamp(1550555064, 1),
  "$clusterTime" : {
    "clusterTime" : Timestamp(1550555064, 1),
    "signature" : {
      "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      "keyId" : NumberLong(0)
    }
  }
}
rs0:OTHER>
Print the replica set configuration by running rs.conf().
This outputs the details of the members currently configured in replica set rs0. In this post, we see only one member, the one we are connected to. To get details of all members we need to expose the replica set through additional services such as NodePort or LoadBalancer.
rs0:PRIMARY> rs.conf()
{
  "_id" : "rs0",
  "version" : 1,
  "protocolVersion" : NumberLong(1),
  "writeConcernMajorityJournalDefault" : true,
  "members" : [
    {
      "_id" : 0,
      "host" : "localhost:27017",
      "arbiterOnly" : false,
      "buildIndexes" : true,
      "hidden" : false,
      "priority" : 1,
      "tags" : {

      },
      "slaveDelay" : NumberLong(0),
      "votes" : 1
    }
  ],
  "settings" : {
    "chainingAllowed" : true,
    "heartbeatIntervalMillis" : 2000,
    "heartbeatTimeoutSecs" : 10,
    "electionTimeoutMillis" : 10000,
    "catchUpTimeoutMillis" : -1,
    "catchUpTakeoverDelayMillis" : 30000,
    "getLastErrorModes" : {

    },
    "getLastErrorDefaults" : {
      "w" : 1,
      "wtimeout" : 0
    },
    "replicaSetId" : ObjectId("5c6b97b8e729cff1da837701")
  }
}
rs0:PRIMARY>
rs0:PRIMARY> exit
bye
A big advantage of Kubernetes and StatefulSets is that we can scale the number of MongoDB replicas up and down with a single command. To scale up the number of replica set members from 3 to 5, run this command:
$ kubectl scale --replicas=5 statefulset mongo
statefulset.apps "mongo" scaled
In a few minutes, there will be 5 MongoDB pods. Run this command to view them:
$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
mongo-0   2/2       Running   0          16m
mongo-1   2/2       Running   0          16m
mongo-2   2/2       Running   0          15m
mongo-3   2/2       Running   0          1m
mongo-4   2/2       Running   0          34s
To scale down the number of replica set members from 5 back to 3, run this command:
$ kubectl scale --replicas=3 statefulset mongo
statefulset.apps "mongo" scaled
$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
mongo-0   2/2       Running   0          18m
mongo-1   2/2       Running   0          17m
mongo-2   2/2       Running   0          16m
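A StatefulSet scales down in reverse ordinal order, which is why mongo-3 and mongo-4 are the pods that disappear. The sketch below prints the pods that would be removed for a given change in replica count; pods_removed is a hypothetical helper name, and it assumes the default ordered pod management policy.

```shell
# Print the pods a StatefulSet removes when scaled down, in removal order
# (highest ordinal first). Prints nothing when no pods would be removed.
pods_removed() {
  # $1 = StatefulSet name, $2 = current replicas, $3 = target replicas
  i=$(( $2 - 1 ))
  while [ "$i" -ge "$3" ]; do
    echo "$1-$i"
    i=$((i - 1))
  done
}

pods_removed mongo 5 3
# prints:
# mongo-4
# mongo-3
```

Scaling down removes the pods but not their persistent volume claims, so the disks for mongo-3 and mongo-4 remain until the claims are deleted.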
Each pod in a StatefulSet backed by a Headless Service will have a stable DNS name. The template follows this format: <pod-name>.<service-name>
This means the DNS names for the MongoDB replica set are:
mongo-0.mongo
mongo-1.mongo
mongo-2.mongo
We can use these names directly in the connection string URI of our app.
Using a database is outside the scope of this post. For this case, however, the connection string URI would be:
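As a hedged sketch, a URI of the standard MongoDB form can be assembled from the pod DNS names listed above. The helper name mongo_uri and the database name mydb are hypothetical; the port 27017 and the replica set name rs0 come from mongo-statefulset.yaml.

```shell
# Assemble a replica set connection string from StatefulSet pod DNS names.
mongo_uri() {
  # $1 = StatefulSet name, $2 = service name, $3 = replica count,
  # $4 = replica set name, $5 = database name (hypothetical placeholder)
  hosts=""
  i=0
  while [ "$i" -lt "$3" ]; do
    # Append "<pod>.<service>:27017", comma-separated after the first host.
    hosts="${hosts:+$hosts,}$1-$i.$2:27017"
    i=$((i + 1))
  done
  printf 'mongodb://%s/%s?replicaSet=%s\n' "$hosts" "$5" "$4"
}

mongo_uri mongo mongo 3 rs0 mydb
# prints mongodb://mongo-0.mongo:27017,mongo-1.mongo:27017,mongo-2.mongo:27017/mydb?replicaSet=rs0
```

Listing every member lets a driver find the replica set even when one of the pods is down.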
To clean up, delete the StatefulSet, the headless service, the persistent volume claims, and finally the cluster:
$ kubectl delete statefulset mongo
statefulset.apps "mongo" deleted
$ kubectl delete svc mongo
service "mongo" deleted
$ kubectl delete pvc -l role=mongo
persistentvolumeclaim "mongo-persistent-storage-mongo-0" deleted
persistentvolumeclaim "mongo-persistent-storage-mongo-1" deleted
persistentvolumeclaim "mongo-persistent-storage-mongo-2" deleted
persistentvolumeclaim "mongo-persistent-storage-mongo-3" deleted
persistentvolumeclaim "mongo-persistent-storage-mongo-4" deleted
$ gcloud container clusters delete "hello-world"
The following clusters will be deleted.
 - [hello-world] in [us-central1-f]
Do you want to continue (Y/n)?  y
Deleting cluster hello-world...done.
Deleted [].
Reference: Running a MongoDB Database in Kubernetes with StatefulSets
