AWS: Kinesis Data Firehose with Lambda and ElasticSearch
In this post, we'll learn how Kinesis Data Firehose captures streaming data, transforms it, and then sends it to the Amazon Elasticsearch Service.
We'll do the following:
- Generate streaming data containing stock quote information
- Send the data to a Kinesis Firehose delivery stream
- Kinesis Firehose will then call a Lambda function to transform the data
- Kinesis Firehose will then collect the data into batches and send the batches to an Elasticsearch Service cluster
- We will use Kibana to visualize the streaming data stored in the Elasticsearch cluster
Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data stores and analytics tools. It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk, enabling near real-time analytics with existing business intelligence tools and dashboards you’re already using today. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, transform, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security. - from Amazon Kinesis Data Firehose
With Firehose, we do not need to write any applications or manage any resources. We just configure our data producers to send data to Firehose and it automatically delivers the data to the specified destination.
As mentioned earlier, in this post, we will be sending streaming data to Kinesis Firehose, which will capture, transform and batch the data before sending it to Elasticsearch:
Data producers will send records to our stream, which we will transform using a Lambda function created in this section. After that, the transformed records will be sent to the Elasticsearch Service via Kinesis Firehose.
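Although we will use the console's built-in demo-data generator later in this post, a producer can also push records itself. Below is a minimal sketch using boto3, assuming the "stocks-stream" delivery stream created later in this post already exists and that AWS credentials and a default region are configured:

import json
import boto3

# Minimal producer sketch: send one stock quote record to the delivery stream.
# Assumes the "stocks-stream" delivery stream already exists and that
# credentials/region are configured in the environment.
firehose = boto3.client('firehose')

record = {"ticker_symbol": "QXZ", "sector": "HEALTHCARE", "change": -0.05, "price": 84.51}

firehose.put_record(
    DeliveryStreamName='stocks-stream',
    Record={'Data': (json.dumps(record) + '\n').encode('utf-8')}
)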
Here, we'll create a Lambda function that will transform the incoming stock data into a format suitable for visualization.
It is quite common for data to require modification prior to processing, such as adding/removing fields, combining fields, converting formats and dropping irrelevant records. AWS Lambda enables data transformation on-the-fly when the streaming data arrives for processing in Amazon Kinesis Firehose.
In this section, incoming stock data will be in JSON format, such as:
{"ticker_symbol":"QXZ", "sector":"HEALTHCARE", "change":-0.05, "price":84.51}
To visualize this data, we need to add a timestamp identifying when each record was received. We will configure an AWS Lambda function that will transform the data into:
{'timestamp': '2019-02-26T04:33:19.581522', 'ticker_symbol': 'QXZ', 'price': 84.51}
Let's create the function:
where the role has the following policy:
{ "Version": "2012-10-17", "Statement": [ { "Action": [ "cloudwatch:*" ], "Resource": [ "*" ], "Effect": "Allow" } ] }
The role gives execution permissions to our Lambda function so it can send log output to Amazon CloudWatch Logs.
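Note that the policy above only governs what the function may do at run time; the role also needs a trust policy that lets the Lambda service assume it. The standard trust relationship (created automatically when Lambda is chosen as the trusted service) looks like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}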
lambda_function.lambda_handler:
import base64
import json
from datetime import datetime


def lambda_handler(event, context):
    output = []
    now = datetime.utcnow().isoformat()

    # Loop through the records in the incoming event
    for record in event['records']:
        # Extract the message (Firehose delivers it base64-encoded)
        message = json.loads(base64.b64decode(record['data']))

        # Construct the output: timestamp plus only the fields needed for visualization
        data_field = {
            'timestamp': now,
            'ticker_symbol': message['ticker_symbol'],
            'price': message['price']
        }
        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(json.dumps(data_field).encode('utf-8')).decode('utf-8')
        }
        output.append(output_record)

    return {'records': output}
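To sanity-check the transformation before wiring it into Firehose, the handler can be exercised locally with a hand-built event. The snippet below is only a sketch: the recordId is a made-up placeholder (Firehose supplies real record IDs), and the payload is the sample stock quote from above, base64-encoded the way Firehose hands it to Lambda.

import base64
import json

# Hypothetical local test event; 'test-record-1' is a placeholder recordId.
sample_event = {
    'records': [
        {
            'recordId': 'test-record-1',
            'data': base64.b64encode(
                json.dumps({"ticker_symbol": "QXZ", "sector": "HEALTHCARE",
                            "change": -0.05, "price": 84.51}).encode('utf-8')
            ).decode('utf-8')
        }
    ]
}

# Call the handler defined above and print the transformed records.
print(lambda_handler(sample_event, None))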
In this task, we will create a Kinesis Firehose delivery stream. It will transform incoming data by using the Lambda function we just created and will then send the output to Elasticsearch. Here, we assume we've already created an Elasticsearch Service cluster.
On the Kinesis service, click "Get started" and "Create delivery stream".
For Delivery stream name, enter "stocks-stream".
The information on the screen explains options for accepting incoming streaming data.
Scroll to the bottom of the screen, then click "Next".
The Transform source records page allows a Lambda function to be specified for transforming incoming data. We will be transforming the content of the incoming data to add a timestamp.
For Record transformation, click Enabled. For Lambda function, select Add-Stock-Timestamp.
Then, scroll to the bottom of the screen, click "Next"
For Destination, select Amazon Elasticsearch Service, then configure: Domain: stocks, Index: "stock", Type: "stock".
An Amazon S3 bucket should have already been created. It will be used to store any records that fail to be processed. In the S3 backup section, for Backup S3 bucket, select the bucket, and click "Next".
The Configure settings page will be displayed. The Elasticsearch buffer conditions can be used to specify when Kinesis Firehose should send data to the Elasticsearch cluster. We will configure data to be sent whenever there is 1 MB of data or when 60 seconds has passed.
We will now specify the permissions assigned to our Firehose delivery stream. It will be given permission to use Amazon S3, AWS Lambda, Amazon Elasticsearch Service and Amazon CloudWatch Logs.
Scroll down to IAM role and then click "Create new", then configure: IAM Role: Demo-Firehose-Role, Policy Name: Demo-Firehose-Policy
where the Demo-Firehose-Policy attached to Demo-Firehose-Role is:
{ "Version": "2012-10-17", "Statement": [ { "Action": [ "s3:Get*", "s3:List*", "s3:PutObject", "lambda:*", "es:*", "logs:PutLogEvents" ], "Resource": [ "*" ], "Effect": "Allow", "Sid": "S3" } ] }
Click "Create delivery stream".
On the Elasticsearch Service, from the list of My Elasticsearch domains, click stocks.
Examine the Domain status. We will need to wait for the status to be Active, then Click the Kibana link.
A new tab opens with Kibana. Kibana is an open-source data visualization and exploration tool that has tight integration with Elasticsearch, which makes it the default choice for visualizing data stored in Elasticsearch.
We are now ready to send data to the Firehose delivery stream.
We will use a built-in testing function. In the Kinesis service, click the name of our "stocks-stream". Expand Test with demo data section at the top of the page, and click "Start sending demo data"
This will now start sending random stock data to the delivery stream, such as:
{"ticker_symbol":"QXZ", "sector":"HEALTHCARE", "change":-0.05, "price":84.51}
The Lambda function will then transform the data by adding a timestamp and only including fields necessary for visualization, such as:
{'timestamp': '2017-07-30T04:33:19.581522', 'ticker_symbol': 'QXZ', 'price': 84.51}
The data will then be sent to the Elasticsearch cluster, where it will be available for visualization in Kibana.
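If we want to confirm documents are actually landing in the "stock" index before building anything in Kibana, we can also query the domain endpoint directly. The endpoint below is a placeholder, and this sketch assumes the domain's access policy permits the request from our client:

import requests

# Sketch: fetch one document from the "stock" index.
# Replace the placeholder endpoint with the domain's actual endpoint.
es_endpoint = 'https://search-stocks-xxxxxxxxxx.us-east-1.es.amazonaws.com'
resp = requests.get(es_endpoint + '/stock/_search', params={'size': 1})
print(resp.json())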
Keep the Kinesis Firehose tab open so that it continues to send data.
Switch back to the Kibana tab in our web browser. For Index name or pattern, replace logstash-* with "stock". In the Time-field name pull-down, select timestamp.
Click "Create", then a page showing the stock configuration should appear, in the left navigation pane, click Visualize, and click "Create a visualization". For Select visualization type, click Line chart, click stock.
We will now configure a chart to show stock prices over time. Under the metrics heading, click the arrow beside Y-Axis, then configure: Aggregation: Average, Field: price. Under buckets, click X-Axis, then configure: Aggregation: Date Histogram, Field: timestamp.
At the bottom of the screen, click "Add sub-buckets", click Split Lines, then configure: Sub Aggregation: Terms, Field: ticker_symbol.keyword.
Click the play button at the top of the screen to view the chart. The chart will now display stock prices over time. This data is being received in real-time from Amazon Kinesis Firehose. Click Refresh above the chart every 30 seconds to update the display.
You can monitor information about the Firehose delivery stream by viewing metrics captured by Amazon CloudWatch. Scroll to the bottom of the Kinesis console, then click the Monitoring tab:
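The same metrics are available programmatically from the AWS/Firehose CloudWatch namespace. A sketch, assuming boto3 and credentials are configured:

import boto3
from datetime import datetime, timedelta

# Sketch: pull the IncomingRecords metric for the delivery stream over the last hour.
cloudwatch = boto3.client('cloudwatch')
stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/Firehose',
    MetricName='IncomingRecords',
    Dimensions=[{'Name': 'DeliveryStreamName', 'Value': 'stocks-stream'}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=['Sum']
)
print(stats['Datapoints'])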