Datadog v.5 - Monitoring with PagerDuty/HipChat and APM

bogotobogo.com site search:

Note

In this article, we'll install datadog agent on 3 hosts and will integrate the services such as PagerDuty, HipChat, AWS, nginx, apache, mysql, redis, and elasticsearch.

Here is the preview of screen shots after all done:

Getting the package url

We'll use trial version of Datadog v.5.12.3.

Depending on the agent, it will give us url that we can use for our installation:

Install Datadog Agent

The Datadog Agent collects metrics and events from our systems and apps. Install at least one Agent anywhere, even on our workstation.

Let's install it using the url provided and following the instructions for a given OS:

$ DD_API_KEY=d9c0b1fa9ef2b1402b08323d8b235529 bash -c "$(curl -L https://raw.githubusercontent.com/DataDog/dd-agent/master/packaging/datadog-agent/source/install_agent.sh)"
...
* Installing apt-transport-https
...
* Installing APT package sources for Datadog
...
* Installing the Datadog Agent package
...
Enabling service datadog-agent
Creating dd-agent group
Creating dd-agent user
...
* Adding your API key to the Agent configuration: /etc/dd-agent/datadog.conf

* Starting the Agent...


Your Agent has started up for the first time. We're currently verifying that
data is being submitted. You should see your Agent show up in Datadog shortly
at:

    https://app.datadoghq.com/infrastructure

Waiting for metrics.................................

Your Agent is running and functioning properly. It will continue to run in the
background and submit metrics to Datadog.

If you ever want to stop the Agent, run:

    sudo /etc/init.d/datadog-agent stop

And to run it again run:

    sudo /etc/init.d/datadog-agent start

We installed the Datadog on Ubuntu desktop. Now, we want to install it on other server nodes running CentOS and Ubuntu. For these, we'll prepend "DD_INSTALL_ONLY=true" to the command since we don't want it to start automatically after the installation:

$ DD_INSTALL_ONLY=true DD_API_KEY=d9c0b1fa9ef2b1402b08323d8b235529 bash -c "$(curl -L https://raw.githubusercontent.com/DataDog/dd-agent/master/packaging/datadog-agent/source/install_agent.sh)"
...
* DD_INSTALL_ONLY environment variable set: the newly installed version of the agent
will not start by itself. You will have to do it manually using the following
command:

    sudo /etc/init.d/datadog-agent restart

Let's start it on CentOS node:

$ sudo /etc/init.d/datadog-agent restart
...
Restarting datadog-agent (via systemctl):                  [  OK  ]

On another Ubuntu node:

$ DD_INSTALL_ONLY=true DD_API_KEY=d9c0b1fa9ef2b1402b08323d8b235529 bash -c "$(curl -L https://raw.githubusercontent.com/DataDog/dd-agent/master/packaging/datadog-agent/source/install_agent.sh)"
...* Adding your API key to the Agent configuration: /etc/dd-agent/datadog.conf


* DD_INSTALL_ONLY environment variable set: the newly installed version of the agent
will not start by itself. You will have to do it manually using the following
command:

    sudo invoke-rc.d datadog-agent restart

$ sudo /etc/init.d/datadog-agent restart
[ ok ] Restarting datadog-agent (via systemctl): datadog-agent.service.

Accessing the Web UI

Type https://app.datadoghq.com/infrastructure into a browser. We can see the initial dashboard for our three nodes, and it shows us our "Infrastructure List" under "Infrastructure" menu:

We can select "Host Map" instead of "List":

$YourInfractureMap.png$

Metrics Explorer:

HipChat integration

HipChat is hosted group chat and IM for companies and teams. We want to integrate with HipChat to:

be notified when someone posts on our stream
be notified when a metric alert is triggered

Here is the integration page:

On the page, type in the token from HipCHat and set the name of the room. Then, select "install integration".

Here is the the chat room:

We can post a new comment on the Event pane:

We'll get popup notice for the post, and we can check if from our HipChat room.

Pagerduty integration

PagerDuty adds Phone and SMS alerting to our existing monitoring tools.

Integrate with PagerDuty to:

Trigger and resolve incidents from our stream by mentioning @pagerduty in our post
See incidents and escalations in our stream as they occur
Get a daily reminder of who's on-call

Check Datadog Integration Guide

After giving email/password, select "Authorize Integration"

Click "Finish Integration"

We also need to fill in the PagerDuty service:

Datadog will use these services to connect to PagerDuty.

Click "Update Configuration"

To test the integration, simply make a post with @pagerduty (i.e. @pagerduty Yay, this is my first Datadog alert.) in our newsfeed:

This will create an incident in PagerDuty and will show up in our Datadog newsfeed:

AWS integration

Provide AWS Access Key and AWS Secret Key, then click "Install Integration"

Setting up the Datadog integration with Amazon Web Services requires configuring role delegation using AWS IAM and it is the recommended way of configuration.

Although we could give Datadog access to an IAM user and its long-term credentials in our AWS account, we should choose instead to go with the highly recommended best practice of using an IAM role. An IAM role provides a mechanism to allow a third party to access our AWS resources without needing to share long-term credentials (for example, an IAM user's access key).

We can use an IAM role to establish a trusted relationship between our AWS account and the account belonging to Datadog. After this relationship is established, one of Datadog's IAM users or applications in the trusted AWS account can call the AWS STS AssumeRole API to obtain temporary security credentials that can then be used to access AWS resources in our account.

So, let's use IAM user/role (Role Delegation) as shown below (follow the instruction from http://docs.datadoghq.com/integrations/aws/):

The steps were:

Open the AWS Integration tile.
Select the Role Delegation tab.
Enter our AWS Account ID which can be found in the ARN of the newly created role. Then enter the name of the role we just created. Finally enter the External ID we specified above.
Choose the services we want to collect metrics for on the left side of the dialog. We can optionally add tags to all hosts and metrics. Also if we want to only monitor a subset of EC2 instances on AWS, tag them and specify the tag in the limit textbox here.

After setting up AWS Account, click "Install Integration".

Elasticsearch integration

ssh into the instance where the Elasticsearch is installed:

$ /etc/dd-agent/conf.d
$ sudo cp elastic.yaml.example elastic.yaml

Make sure the following line is not commented:

- url: http://localhost:9200

Then, restart dd agent:

$ sudo /etc/init.d/datadog-agent restart

Click "Install Integration"

So far, we integrated the following:

MySQL integration

First, copy the sample:

$ sudo cp mysql.yaml.example mysql.yaml

/etc/dd-agent/conf.d/mysql.yaml:

instances:
  - server: 127.0.0.1
    user: datadog
    pass: mypassword

Note that we may want to use 127.0.0.1 instead of localhost.

On mysql console:

mysql> create user 'datadog'@'127.0.0.1' identified by 'password';
mysql> grant all privileges on * . * to 'datadog'@'127.0.0.1';
mysql> flush privileges;

mysql> select user, host from mysql.user;
+----------------+-----------+
| user           | host      |
+----------------+-----------+
| datadog        | 127.0.0.1 |

Then, restart dd-agent:

$ sudo /etc/init.d/datadog-agent restart

Nginx integration

Ref: http://docs.datadoghq.com/integrations/nginx/

The default agent checks require the "NGINX stub status module", which is not compiled by default. In debian/ubuntu, this module is enabled in the nginx-extras package. To check if the version of nginx has the stub status module support compiled in, we can run:

$ nginx -V |& grep http_stub_status_module
configure arguments: --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong 
-Wformat -Werror=format-security -
...
--with-http_stub_status_module 
 --with-threads

/etc/dd-agent/conf.d/nginx.yaml

- nginx_status_url: http://localhost/nginx_status/

If we see some output with configure arguments: and lots of options, then we have it enabled. Once we have a status-enabled version of nginx, we can set up a URL for the status module:

server {
  listen 80;
  server_name localhost;

  access_log off;
  allow 127.0.0.1;
  deny all;

  location /nginx_status {
    stub_status on;
  }
}

If we're using name based virtualhost, we need to modify all conf files for each domain!

Then, restart datadog agent:

$ sudo /etc/init.d/datadog-agent restart

Apache integration

Ref: http://docs.datadoghq.com/integrations/apache/ and How to collect Apache performance metrics.

In order to collect metrics from Apache, we need to enable the status module and make sure that ExtendedStatus is on.

Let's check if mod_status is installed on our Apache server:

$ grep -irn status_module /etc/httpd
/etc/httpd/conf.modules.d/00-base.conf:55:LoadModule status_module modules/mod_status.so

Apache conf file:

<Location /server-status>
    SetHandler server-status
    Require local
    Require ip 45.79.90.218
</Location>

After we're done making changes, save and exit. We can check our configuration file for errors with the following command:

$ apachectl configtest

Perform a graceful restart to apply the changes without interrupting live connections (apachectl -k graceful or service apache2 graceful):

$ sudo service httpd graceful

/etc/dd-agent/conf.d/apache.yaml

instances:
  - apache_status_url: http://localhost/server-status?auto

Then, restart datadog agent:

$ sudo /etc/init.d/datadog-agent restart
Restarting datadog-agent (via systemctl):                  [  OK  ]

We can validate the integration by running the agent info command:

$ sudo /etc/init.d/datadog-agent info

APM - Reporting traces

We can enable APM by modifying the Datadog Agent configuration file /etc/dd-agent/datadog.conf (see Report your first traces and Datadog Trace Client):

[Main]
# Enable the trace agent.
apm_enabled: true

Then, we need to restart the Agent:

$ sudo /etc/init.d/datadog-agent restart
[ ok ] Restarting datadog-agent (via systemctl): datadog-agent.service.

Depending on the language, we may have to install different tracing client.

Python client: ddtrace is Datadog's Python tracing client. It is used to trace requests as they flow across web servers, databases and microservices so developers have great visibility into bottlenecks and troublesome requests.
```
$ pip install ddtrace
```

Ruby client: Install the tracing client, adding the following gem in Gemfile:
```
source 'https://rubygems.org'
# tracing gem
gem 'ddtrace'
```
If we're not using Bundler to manage our dependencies, we can install ddtrace with:
```
$ gem install ddtrace
```

Now we want to enable integrations with our app, and the easiest way to get started with the tracing client is to instrument our web application. Follow the relevant instructions for the integrations.

APM - Flask app

Here is a sample Flask app (hello.py) doing "Hello World!":

from flask import Flask
import blinker as _

from ddtrace import tracer
from ddtrace.contrib.flask import TraceMiddleware

app = Flask(__name__)

traced_app = TraceMiddleware(app, tracer, service="my-flask-app")

@app.route("/")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    app.run()

Run the application:

$ python hello.py
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [20/Apr/2017 22:05:25] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [20/Apr/2017 22:05:26] "GET /favicon.ico HTTP/1.1" 404 -