Apache Spark Streaming with Kafka and Cassandra I
We need to make sure Java is installed:
$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
Eclipse with Maven is needed as well.
Also, note that we're using Spark 2.0 and Scala 2.11 to avoid version mismatch:
- Scala 2.11.6
- Kafka 0.10.1.0
- Spark 2.0.2
- Spark Cassandra Connector 2.0.0-M3
- Cassandra 3.0.2
We'll work on Ubuntu 16.04.
$ echo "deb http://www.apache.org/dist/cassandra/debian 36x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.list $ gpg --keyserver pgp.mit.edu --recv-keys 749D6EEC0353B12C $ gpg --export --armor 749D6EEC0353B12C | sudo apt-key add - $ sudo apt-get update $ sudo apt-get install cassandra $ sudo service cassandra start
To verify the Cassandra cluster:
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  102.68 KiB  256     100.0%            726f8c94-dc2a-428f-8070-1b6bcb99ebf5  rack1
Cassandra is Up and running Normally!
Connect to the Cassandra cluster using its command line interface, cqlsh (Cassandra Query Language shell):
$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
To fix the issue, we need to install the Python driver, then define and export the environment variable CQLSH_NO_BUNDLED:
$ sudo pip install cassandra-driver
$ export CQLSH_NO_BUNDLED=true
We install the latest Python Cassandra driver and tell cqlsh (which is a Python program) to use the external Cassandra Python driver, not the one bundled with the distribution.
$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.6 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
CQL is Cassandra's version of SQL. Let's try it:
$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.6 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 };
cqlsh> USE "test";
cqlsh:test> CREATE TABLE my_table(key text PRIMARY KEY, value int);
cqlsh:test> INSERT INTO my_table(key, value) VALUES ('key1', 1);
cqlsh:test> INSERT INTO my_table(key, value) VALUES ('key2', 2);
cqlsh:test> SELECT * from my_table;

 key  | value
------+-------
 key1 |     1
 key2 |     2
In the code, we created a keyspace "test" and a table ("my_table") in that keyspace. Then we stored (key, value) pairs and displayed them.
We may want to add the DataStax community repository:
$ echo "deb http://debian.datastax.com/community stable main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list $ curl -L http://debian.datastax.com/debian/repo_key | sudo apt-key add -
Then, install it:
$ sudo apt-get update
$ sudo apt-get install dsc30=3.0.2-1 cassandra=3.0.2
Because the Debian packages start the Cassandra service automatically, we must stop the server and clear the data. Doing the following removes the default cluster_name (Test Cluster) from the system table. All nodes must use the same cluster name.
$ sudo service cassandra stop
$ sudo rm -rf /var/lib/cassandra/data/system/*
We can use cqlsh now:
$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.0.2 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh>
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns  Host ID                               Rack
UN  127.0.0.1  230.76 KB  256     ?     926eafc7-9aca-4dea-ba46-6cdee3b6ac2d  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
This is not really an error but an informative message that appears when 'nodetool status' is used without specifying a keyspace. It shows up because we created the "test" keyspace in the earlier section.
$ nodetool status test
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  230.76 KB  256     100.0%            926eafc7-9aca-4dea-ba46-6cdee3b6ac2d  rack1
So, we may want to drop it:
cqlsh> drop keyspace test;
Download the latest pre-built Apache Spark version for Hadoop 2.6:
$ wget d3kbcqa49mib13.cloudfront.net/spark-2.0.2-bin-hadoop2.6.tgz
$ sudo tar xvzf spark-2.0.2-bin-hadoop2.6.tgz -C /usr/local
Let's modify ~/.bashrc:
export SPARK_HOME=/usr/local/spark-2.0.2-bin-hadoop2.6
export PATH=$SPARK_HOME/bin:$PATH
Now we're ready to use Spark. Let's test it out by opening a Spark shell:
$ $SPARK_HOME/bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
...
Spark context Web UI available at http://192.168.200.180:4040
Spark context available as 'sc' (master = local[*], app id = local-1482501175177).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
Now let's get Spark to do a calculation at the Scala prompt:
scala> sc.parallelize( 1 to 100 ).sum()
res0: Double = 5050.0
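The same API composes; for example, a filter before the aggregation (just another sanity check, nothing Cassandra-specific yet). Summing the even numbers from 2 to 100 should give 2550:

scala> sc.parallelize( 1 to 100 ).filter(_ % 2 == 0).sum()
res1: Double = 2550.0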
We may skip this step if we want to use the Scala provided by the Debian package; if so, go to the next section.
sbt is an open source build tool for Scala and Java projects, similar to Java's Maven or Ant. Let's install sbt:
$ echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list $ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2EE0EA64E40A89B84B2DF73499E82A75642AC823 $ sudo apt-get update $ sudo apt-get install sbt
Let's install scala:
$ sudo apt-get install scala
$ scala -version
Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL
$ which sbt
/usr/bin/sbt
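As an aside, if we later build a standalone project (rather than working in the shell), a minimal build.sbt could declare the versions listed at the top of this article. This is just a sketch; the artifact coordinates below are the standard ones published by the projects, but verify them against Maven Central:

// build.sbt -- minimal sketch for a standalone Spark/Cassandra project
name := "spark-kafka-cassandra-demo"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // "provided" because the Spark runtime supplies these when we spark-submit
  "org.apache.spark" %% "spark-core" % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.0.2" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0-M3"
)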
Spark doesn't natively know how to talk to Cassandra, but its functionality can be extended with connectors. To connect Spark to a Cassandra cluster, the Cassandra Connector needs to be added to the Spark project. DataStax provides its own Cassandra Connector, which we can clone from GitHub:
$ git clone https://github.com/datastax/spark-cassandra-connector.git
Once it's cloned, we'll need to build it using the sbt that comes with the connector:
$ cd spark-cassandra-connector
$ sbt assembly -Dscala-2.11=true
When the build is finished, there will be a jar file in the target directory:
$ ls ~/spark-cassandra-connector/spark-cassandra-connector/target/full/scala-2.10
classes  spark-cassandra-connector-assembly-2.0.0-M3-104-g7c8c546.jar  test-classes
Just for now, let's copy the jar into home (~):
$ cp ~/spark-cassandra-connector/spark-cassandra-connector/target/full/scala-2.10/spark-cassandra-connector-assembly-2.0.0-M3-104-g7c8c546.jar ~
Then, start the Spark shell again, passing in the jar:
$ $SPARK_HOME/bin/spark-shell --jars ~/spark-cassandra-connector-assembly-2.0.0-M3-104-g7c8c546.jar
...
Spark context Web UI available at http://192.168.200.180:4040
Spark context available as 'sc' (master = local[*], app id = local-1482560890957).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_111)

scala>
Before connecting the Spark Context to the Cassandra cluster, let's stop the default context:
scala> sc.stop
Import the necessary classes:
scala> import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf
import com.datastax.spark.connector._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
Make a new SparkConf with the Cassandra connection details:
scala> val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@3ef2b8e5
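Other connector settings can be passed the same way; for example, if Cassandra listened on a non-default native protocol port (9042 is the default, so this line is redundant here and shown only for illustration):

scala> conf.set("spark.cassandra.connection.port", "9042")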
Create a new Spark Context:
scala> val sc = new SparkContext(conf)
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@35010a6b
Now we have a new SparkContext which is connected to our Cassandra cluster!
Since we deleted the keyspace and table, we need to create them again to test the Cassandra cluster.
$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.6 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 };
cqlsh> USE "test";
cqlsh:test> CREATE TABLE my_table(key text PRIMARY KEY, value int);
cqlsh:test> INSERT INTO my_table(key, value) VALUES ('key1', 1);
cqlsh:test> INSERT INTO my_table(key, value) VALUES ('key2', 2);
cqlsh:test> SELECT * from my_table;

 key  | value
------+-------
 key1 |     1
 key2 |     2

(2 rows)
cqlsh:test>
Now we can use the keyspace called "test" and a table called "my_table". To read data from Cassandra, we create an RDD (Resilient Distributed DataSet) from a specific table. The RDD is a fundamental data structure of Spark. Each dataset in RDD is divided into logical partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes.
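To make the partitioning idea concrete, here is a small illustration at the Scala prompt. The partition count of a Cassandra-backed RDD depends on the cluster; here we set it explicitly on a local collection, so this is just an illustration, not part of the Cassandra flow:

scala> val rdd = sc.parallelize(1 to 10, 4)   // explicitly request 4 partitions
scala> rdd.getNumPartitions                   // => 4
scala> rdd.map(_ * 2).reduce(_ + _)           // transformations are lazy; reduce triggers the job => 110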
At the Scala prompt of spark-shell:
scala> val test_spark_rdd = sc.cassandraTable("test", "my_table")
Let's check the first element in this RDD (here the behavior of first() is identical to take(1)):
scala> test_spark_rdd.first
res1: com.datastax.spark.connector.CassandraRow = CassandraRow{key: key1, value: 1}
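The connector can write as well as read. A minimal sketch using its saveToCassandra method (SomeColumns comes from the com.datastax.spark.connector._ import we already did); the tuple fields are mapped positionally onto the listed columns:

scala> val more = sc.parallelize(Seq(("key3", 3), ("key4", 4)))
scala> more.saveToCassandra("test", "my_table", SomeColumns("key", "value"))
scala> sc.cassandraTable("test", "my_table").count   // should now report 4 rows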
Let's install Kafka. Download:
$ wget http://apache.mirror.cdnetworks.com/kafka/0.10.1.0/kafka_2.11-0.10.1.0.tgz
$ tar -xzf kafka_2.11-0.10.1.0.tgz
$ cd kafka_2.11-0.10.1.0
Kafka uses ZooKeeper, so we need to first start a ZooKeeper server if we don't already have one. We can use the convenience script packaged with Kafka to get a quick-and-dirty single-node ZooKeeper instance:
$ bin/zookeeper-server-start.sh config/zookeeper.properties
Now start the Kafka server:
$ bin/kafka-server-start.sh config/server.properties
To test Kafka, let's create a sample topic named "testing" with a single partition and only one replica:
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic testing
We should see the following output:
Created topic "testing"
We can ask ZooKeeper to list the available topics on Apache Kafka by running the following command:
$ bin/kafka-topics.sh --list --zookeeper localhost:2181
testing
Now, publish some sample messages to the Apache Kafka topic called "testing" by using the following producer command:
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic testing
After running the above command, enter a message like "Spooky action at a distance?", press Enter, then enter another message like "Quantum entanglement":
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic testing
Spooky action at a distance?
Quantum entanglement
Type Ctrl-D to finish the message.
Now, use the consumer command to retrieve the messages on the Apache Kafka topic called "testing"; we should see the messages we typed in earlier played back to us:
$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic testing --from-beginning
Spooky action at a distance?
Quantum entanglement
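As a preview of where this series is heading, here is a hedged sketch of consuming the "testing" topic from Spark Streaming. It assumes the spark-streaming-kafka-0-10 artifact (e.g. spark-streaming-kafka-0-10_2.11:2.0.2) is on the spark-shell classpath, which we have not set up above, and the group.id value is just a placeholder:

// Sketch: consuming the "testing" topic from Spark Streaming.
// Assumes spark-streaming-kafka-0-10_2.11:2.0.2 is on the classpath (e.g. via --packages).
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Kafka consumer configuration; "spark-demo" is a placeholder consumer group name.
val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "localhost:9092",
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "spark-demo",
  "auto.offset.reset"  -> "earliest"
)

// 5-second micro-batches on top of the existing SparkContext 'sc'.
val ssc = new StreamingContext(sc, Seconds(5))
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("testing"), kafkaParams))

stream.map(_.value).print()   // print each message's value per batch
ssc.start()
// ssc.awaitTermination()     // in a standalone app; in the shell, stop with ssc.stop(false)

Wiring this stream into the Cassandra table is the subject of the next part of this tutorial.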