Docker & Kubernetes : nodeSelector, nodeAffinity, taints/tolerations, pod affinity and anti-affinity - Assigning Pods to Nodes
Here are the basic concepts:
- nodeSelector: deploys a pod to a specific node by matching node labels - an affinity set.
- taints and tolerations: we apply a taint to tell the scheduler to repel Pods from a node. Only Pods that have a toleration for the taint can be deployed onto the node with that taint - an anti-affinity set.
Note that a pod with a toleration is not guaranteed to be deployed to a node with taints.
With taints/tolerations we can create nodes that are reserved (dedicated) for specific pods. For example, pods which require that most of the resources of the node be available to them in order to operate flawlessly should be scheduled to nodes that are reserved for them.
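The dedicated-node pattern described above could be sketched like this (the taint/label key `dedicated=ml` and the pod name are hypothetical, not part of this walkthrough):

```yaml
# First taint and label the node (for reference):
#   kubectl taint nodes minikube-m02 dedicated=ml:NoSchedule
#   kubectl label nodes minikube-m02 dedicated=ml
apiVersion: v1
kind: Pod
metadata:
  name: ml-worker            # hypothetical pod name
spec:
  containers:
  - name: worker
    image: nginx
  # the toleration allows the pod onto the tainted node ...
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "ml"
    effect: "NoSchedule"
  # ... and the nodeSelector makes sure the pod only goes there
  nodeSelector:
    dedicated: ml
```

Note that the toleration alone only permits scheduling onto the tainted node; combining it with a nodeSelector (or node affinity) on the same label is what actually pins the pod to the dedicated node.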
Let's start our minikube with 3 nodes:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:23:04Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}

$ minikube start --nodes 3
minikube v1.13.0 on Darwin 10.13.3
KUBECONFIG=/Users/kihyuckhong/.kube/config
Automatically selected the docker driver. Other choices: hyperkit, virtualbox
Starting control plane node minikube in cluster minikube
Pulling base image ...
Creating docker container (CPUs=2, Memory=2200MB) ...
Preparing Kubernetes v1.19.0 on Docker 19.03.8 ...
Verifying Kubernetes components...
Enabled addons: default-storageclass, storage-provisioner
Multi-node clusters are currently experimental and might exhibit unintended behavior.
To track progress on multi-node clusters, see https://github.com/kubernetes/minikube/issues/7538.
Starting node minikube-m02 in cluster minikube
docker "minikube-m02" container is missing, will recreate.
Creating docker container (CPUs=2, Memory=2200MB) ...
Found network options:
NO_PROXY=172.17.0.3
Preparing Kubernetes v1.19.0 on Docker 19.03.8 ...
env NO_PROXY=172.17.0.3
Verifying Kubernetes components...
Starting node minikube-m03 in cluster minikube
docker "minikube-m03" container is missing, will recreate.
Creating docker container (CPUs=2, Memory=2200MB) ...
Found network options:
NO_PROXY=172.17.0.3,172.17.0.4
Preparing Kubernetes v1.19.0 on Docker 19.03.8 ...
env NO_PROXY=172.17.0.3
env NO_PROXY=172.17.0.3,172.17.0.4
Verifying Kubernetes components...
Done! kubectl is now configured to use "minikube" by default
Check the nodes:
$ kubectl get nodes
NAME           STATUS   ROLES    AGE     VERSION
minikube       Ready    master   6m41s   v1.19.0
minikube-m02   Ready    <none>   5m5s    v1.19.0
minikube-m03   Ready    <none>   3m38s   v1.19.0
Note: since nodeAffinity encompasses everything that can be expressed with a nodeSelector, node affinity is expected to eventually supersede nodeSelector, though nodeSelector remains supported.
This section follows the instructions from Assigning Pods to Nodes.
Sometimes, we may want to control which node the pod deploys to.
To do that, we can constrain a Pod so that it can only run on a particular set of nodes; the simplest recommended approach is to use a nodeSelector as the constraint.
The nodeSelector is a field of PodSpec that specifies a map of key-value pairs. For the pod to be eligible to run on a node, the node must have each of the indicated key-value pairs as labels (it can have additional labels as well). The most common usage is one key-value pair.
We'll pick out a node that we want to add a label to, and then run kubectl label nodes <node-name> <label-key>=<label-value> to add a label to it.
For example, if our node name is minikube-m03 and our desired label is disktype=ssd, we can run:
$ kubectl label nodes minikube-m03 disktype=ssd
node/minikube-m03 labeled

$ kubectl get nodes --show-labels
NAME           STATUS   ROLES    AGE    VERSION   LABELS
minikube       Ready    master   116m   v1.19.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=minikube,kubernetes.io/os=linux,minikube.k8s.io/commit=0c5e9de4ca6f9c55147ae7f90af97eff5befef5f-dirty,minikube.k8s.io/name=minikube,minikube.k8s.io/updated_at=2021_04_17T13_15_18_0700,minikube.k8s.io/version=v1.13.0,node-role.kubernetes.io/master=
minikube-m02   Ready    <none>   114m   v1.19.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=minikube-m02,kubernetes.io/os=linux
minikube-m03   Ready    <none>   113m   v1.19.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=minikube-m03,kubernetes.io/os=linux
Now, we need to add a nodeSelector field to our pod configuration (pod-nginx.yaml):
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
$ kubectl apply -f pod-nginx.yaml
pod/nginx created
We can check if the pod has really been deployed to the "minikube-m03" among the 3 nodes by checking the NODE column from the following output:
$ kubectl get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          57s   172.18.0.2   minikube-m03   <none>           <none>
To unlabel the node:
$ kubectl label nodes minikube-m03 disktype-
node/minikube-m03 labeled

$ kubectl get nodes --show-labels | grep disktype
$
Node affinity is similar to nodeSelector. It allows us to constrain which nodes our pod is eligible to be scheduled on, based on labels on the node.
There are currently two types of node affinity:
- requiredDuringSchedulingIgnoredDuringExecution (must)
- preferredDuringSchedulingIgnoredDuringExecution (not guaranteed)
The "IgnoredDuringExecution" part of the names means that the pod continues to run on the node even if labels on the node change at runtime such that the affinity rules on the pod are no longer met.
First, we'll use the required... affinity with pod-nginx-required-affinity.yaml manifest:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
As we can see, the operator In is used in the manifest. The node affinity syntax supports the following operators: In, NotIn, Exists, DoesNotExist, Gt, Lt.
We can use NotIn and DoesNotExist to achieve node anti-affinity behavior, or use node taints to repel pods from specific nodes.
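As a sketch of that anti-affinity behavior (reusing the disktype label from this section; the pod name is hypothetical), NotIn keeps the pod off any node labeled disktype=hdd:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-not-hdd        # hypothetical pod name
spec:
  containers:
  - name: nginx
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # only schedule onto nodes whose disktype label is NOT hdd;
          # nodes without the disktype label at all also satisfy NotIn
          - key: disktype
            operator: NotIn
            values:
            - hdd
```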
$ kubectl label nodes minikube-m02 disktype=ssd
node/minikube-m02 labeled

$ kubectl get nodes --show-labels | grep ssd
minikube-m02   Ready   <none>   8h   v1.19.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=minikube-m02,kubernetes.io/os=linux
Let's apply the manifest to create a Pod that is scheduled onto our chosen node (minikube-m02):
$ kubectl apply -f pod-nginx-required-affinity.yaml
pod/nginx created

$ kubectl get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          70s   172.18.0.2   minikube-m02   <none>           <none>
Before moving on, let's delete the pod:
$ kubectl delete pod nginx
pod "nginx" deleted

$ kubectl get pods
No resources found in default namespace.
Next, we'll use the preferred... affinity with the pod-nginx-preferred-affinity.yaml manifest:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
Apply the manifest to create a Pod that is scheduled onto our chosen node (minikube-m02):
$ kubectl apply -f pod-nginx-preferred-affinity.yaml
pod/nginx created

$ kubectl get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          27s   172.18.0.2   minikube-m02   <none>           <none>
Though the nodeAffinity was set to "preferred...", the pod was still attracted to the "minikube-m02" node. But this is not guaranteed!
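Required and preferred rules can also be combined in one pod: the required term acts as a hard filter, and the weighted preferences then rank the nodes that pass it. A sketch, reusing this section's disktype label (the pod name and the zone label are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-combined       # hypothetical pod name
spec:
  containers:
  - name: nginx
    image: nginx
  affinity:
    nodeAffinity:
      # hard requirement: the node must have disktype=ssd
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      # soft preference among the ssd nodes: favor a hypothetical zone label
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: zone          # hypothetical label, for illustration only
            operator: In
            values:
            - zone-a
```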
Node affinity is a property of Pods that attracts them to a set of nodes. Taints are the opposite: they allow a node to repel a set of pods.
Tolerations are applied to pods, and allow the pods to schedule onto nodes with matching taints.
Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes.
We add a taint to a node using kubectl taint:
$ kubectl taint nodes minikube-m03 key1=value1:NoSchedule
node/minikube-m03 tainted
That places a taint on node "minikube-m03". The taint has key "key1", value "value1", and taint effect "NoSchedule". This means that no pod will be able to schedule onto minikube-m03 unless it has a matching toleration.
To remove the taint added by the command above, we can run the same command with "-" at the end:
$ kubectl taint nodes minikube-m03 key1=value1:NoSchedule-
To check if the node has the taint:
$ kubectl describe node minikube-m03
Name:               minikube-m03
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=minikube-m03
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 17 Apr 2021 13:18:16 -0700
Taints:             key1=value1:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  minikube-m03
  AcquireTime:     <unset>
  RenewTime:       Sun, 18 Apr 2021 00:05:08 -0700
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Sun, 18 Apr 2021 00:03:19 -0700   Sat, 17 Apr 2021 13:18:16 -0700   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sun, 18 Apr 2021 00:03:19 -0700   Sat, 17 Apr 2021 13:18:16 -0700   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Sun, 18 Apr 2021 00:03:19 -0700   Sat, 17 Apr 2021 13:18:16 -0700   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sun, 18 Apr 2021 00:03:19 -0700   Sat, 17 Apr 2021 13:18:28 -0700   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  172.17.0.5
  Hostname:    minikube-m03
Capacity:
  cpu:                2
  ephemeral-storage:  24638800Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             2552288Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  24638800Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             2552288Ki
  pods:               110
System Info:
  Machine ID:                 080fdad3064c467faaa850bb7391ed58
  System UUID:                da11759b-1c63-4abd-80a5-c8ba2ce841e9
  Boot ID:                    85a4a995-7d97-429e-833a-073fca8ee957
  Kernel Version:             4.19.76-linuxkit
  OS Image:                   Ubuntu 20.04 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.8
  Kubelet Version:            v1.19.0
  Kube-Proxy Version:         v1.19.0
Non-terminated Pods:          (2 in total)
  Namespace                   Name                CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                ------------  ----------  ---------------  -------------  ---
  kube-system                 kindnet-pr4bn       100m (5%)     100m (5%)   50Mi (2%)        50Mi (2%)      10h
  kube-system                 kube-proxy-tgpdq    0 (0%)        0 (0%)      0 (0%)           0 (0%)         10h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                100m (5%)   100m (5%)
  memory             50Mi (2%)   50Mi (2%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>
Create a pod with the following manifest (pod-with-toleration.yaml):
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
The default value for operator is "Equal".
Note that we specify a toleration for a pod in the PodSpec.
The toleration "matches" the taint created by the kubectl taint command on the node "minikube-m03", and thus this pod can be scheduled onto "minikube-m03":
$ kubectl apply -f pod-with-toleration.yaml
pod/nginx created

$ kubectl get pods nginx -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          93s   172.18.0.2   minikube-m03   <none>           <none>
Though the pod with the toleration was deployed to the tainted node, this is not guaranteed. The purpose of taints/tolerations is to keep pods off inappropriate nodes; in other words, taints/tolerations repel, they do not attract.
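Besides Equal, a toleration can use the operator Exists, which matches any value of the taint key, so no value field is given. A sketch of this variant, reusing the section's key1 taint:

```yaml
tolerations:
- key: "key1"
  operator: "Exists"    # tolerates key1 taints with any value
  effect: "NoSchedule"
```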
The taint effects:
- NoSchedule: tells the Kubernetes scheduler not to schedule any new pods onto the node unless the pod tolerates the taint.
- PreferNoSchedule: a "soft" version of NoSchedule; the scheduler tries to avoid placing intolerant pods on the node, but it is not guaranteed.
- NoExecute: additionally evicts pods already running on the node that do not tolerate the taint.
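For the NoExecute effect, a toleration can also set tolerationSeconds, which lets an already-running pod stay on the tainted node for a grace period before being evicted. A sketch reusing the section's example key and value with the NoExecute effect:

```yaml
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600   # evicted after 1 hour instead of immediately
```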
Finally, inter-pod affinity and anti-affinity constrain scheduling based on the labels of pods already running on a node, rather than on node labels:
- pod affinity: in an application that consists of multiple services, some services may need to be co-located on the same node for performance reasons.
- pod anti-affinity: replicas of a critical service should not be placed onto the same node, to avoid losing them all in the event of a node failure.
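A minimal sketch of pod anti-affinity (the pod name and the app=web label are hypothetical): each replica carrying app=web repels other app=web pods on the same hostname, spreading the replicas across nodes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-1                # hypothetical pod name
  labels:
    app: web                 # hypothetical label shared by all replicas
spec:
  containers:
  - name: nginx
    image: nginx
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - web
        # "same node" is defined by this topology key
        topologyKey: kubernetes.io/hostname
```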
Ref: Taints and tolerations, pod and node affinities demystified