AlertManager Integration with Prometheus

One day I got a call from one of my friend and he said to me that he is facing difficulties while setting up AlertManager with Prometheus. Then, I observed that most of the people face such issues while establishing a connection between AlertManager and receiver such as E-mail, Slack etc.

From there, I got motivation for writing this blog so AlertManager setup with Prometheus will be a piece of cake for everyone.

If you are new to AlertManager I would suggest you go through with our Prometheus blog.

What Actually AlertManager Is?

AlertManager is used to handle alerts for client applications (like Prometheus). It also takes care of alerts deduplicating, grouping and then routes them to different receivers such as E-mail, Slack, Pager Duty.

In this blog, we will only discuss on Slack and E-mail receivers.

AlertManager can be configured via command-line flags and configuration file. While command line flags configure system parameters for AlertManager,  the configuration file defines inhibition rules, notification routing, and notification receivers.

Architecture

Here is a basic architecture of AlertManager with Prometheus.

This is how Prometheus architecture works:-

  • If you see in the above picture Prometheus is scraping the metrics from its client application(exporters).
  • When the alert is generated then it pushes it to the AlertManager, later AlertManager validates the alerts groups on the basis of labels.
  • and then forward it to the receivers like Email or Slack.

If you want to use a single AlertManager for multiple Prometheus server you can also do that. Then architecture will look like this:-

Installation

Installation part of AlertManager is not a fancy thing, we just simply need to download the latest binary of AlertManager from here.

$ cd /opt/
$ wget https://github.com/prometheus/alertmanager/releases/download/v0.11.0/alertmanager-0.11.0.linux-amd64.tar.gz

After downloading, let’s extract the files.

$ tar -xvzf alertmanager-0.11.0.linux-amd64.tar.gz

So we can start AlertManager from here as well but it is always a good practice to follow Linux directory structure.

$ mv alertmanager-0.11.0.linux-amd64/alertmanager /usr/local/bin/

 Configuration

Once the tar file is extracted and binary file is placed at the right location then the configuration part will come. Although AlertManager extracted directory contains the configuration file as well but it is not of our use. So we will create our own configuration. Let’s start by creating a directory for configuration.

$ mkdir /etc/alertmanager/

Then the configuration file will take place.

$ vim /etc/alertmanager/alertmanager.yml

The configuration file for Slack will look like this:-

global:


# The directory from which notification templates are read.
templates:
- '/etc/alertmanager/template/*.tmpl'

# The root route on which each incoming alert enters.
route:
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname', 'cluster', 'service']

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification.
  group_wait: 3s

  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 5s

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 1m

  # A default receiver
  receiver: mail-receiver

  # All the above attributes are inherited by all child routes and can
  # overwritten on each.

  # The child route trees.
  routes:
  - match:
      service: node
    receiver: mail-receiver

    routes:
    - match:
        severity: critical
      receiver: critical-mail-receiver

  # This route handles all alerts coming from a database service. If there's
  # no team to handle it, it defaults to the DB team.
  - match:
      service: database
    receiver: mail-receiver
    routes:
    - match:
        severity: critical
      receiver: critical-mail-receiver

receivers:
- name: 'mail-receiver'
  slack_configs:
  - api_url:  https://hooks.slack.com/services/T2AGPFQ9X/B94D2LHHD/jskljaganauheajao2
    channel: '#prom-alert'

   - name: 'critical-mail-receiver'
  slack_configs: 
  
  - api_url:   https://hooks.slack.com/services/T2AGPFQ9X/B94D2LHHD/abhajkaKajKaALALOPaaaJk  channel: '#prom-alert'

You just have to replace the channel name and api_url of the Slack with your information.

The configuration file for E-mail will look something like this:-

global:

templates:
- '/etc/alertmanager/*.tmpl'
# The root route on which each incoming alert enters.
route:
  # default route if none match
  receiver: alert-emailer

  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  # TODO:
  group_by: ['alertname', 'priority']

  # All the above attributes are inherited by all child routes and can
  # overwritten on each.

receivers:
- name: alert-emailer
  email_configs:
  - to: 'receiver@example.com'
    send_resolved: false
    from: 'sender@example.com'
    smarthost: 'smtp.example.com:587'
    auth_username: 'sender@example.com'
    auth_password: 'IamPassword'
    auth_secret: 'sender@example.com'
    auth_identity: 'sender@example.com'

In this configuration file, you need to update the sender and receiver mail details and the authorization password of the sender.

Once the configuration part is done we just have to create a storage directory where AlertManger will store its data.

$ mkdir /var/lib/alertmanager

Then only last piece which will be remaining is my favorite part i.e creating service 🙂

$ vi /etc/systemd/system/alertmanager.service

The service file will look like this:-

[Unit]
Description=AlertManager Server Service
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=Simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file /etc/alertmanager/alertmanager.yml \
    --storage.tsdb.path /var/lib/alertmanager

[Install]
WantedBy=multi-user.target

Then reload the daemon and start the service

$ systemctl daemon-reload
$ systemctl start alertmanager
$ systemctl enable alertmanager

Now you are all set to fire up your monitoring and alerting. So just take a beer and relax until Alert Manager notifies you for alerts. All the best!!!!

Its not you Everytime, sometimes issue might be at AWS End

Today an issue reported to me that website of our client was loading very slow which was hosted on AWS Windows server and the same website was loading fine when accessed from outside AWS network,I just felt like might be a regular issue but it all together took me to an inside out of the network troubleshooting.

Initially, we checked for SSL certificate expiry, which was not the case, so below are the Two steps which we used to troubleshoot the issue:

Troubleshooting through Browser via Web developer Network tool

In browser we checked which part of code was taking time to load using Network option in developer tools:

  • Select web developer tools in firefox
  • Then select network

We identified one of the GET calls was taking time to load.
Then when this thing was reported to AWS support team they provided further analysis of this. We can save the report as (.HAR) file which tells us below things:

  • How long it takes to fetch DNS information
  • How long each object takes to be requested
  • How long it takes to connect to the server
  • How long it takes to transfer assets from the server to the browser of each object.

Troubleshooting using Traceroute

Then we tried to troubleshoot the AWS network flow using “tracert ” with below output:

Tracing route to example.gov [151.x.x.x] over a maximum of 15 hops:

1 <1 ms <1 ms <1 ms 10.x.x.x

2 * * * Request timed out.

3 * * * Request timed out.

4 * * * Request timed out.

5 * * * Request timed out.

6 * * * Request timed out.

7 <1 ms <1 ms <1 ms 100.x.x.x

8 <1 ms <1 ms 1 ms 52.x.x.x

9 * * * Request timed out.

10 2 ms 1 ms 1 ms example.net [67.x.x.x]

11 2 ms 2 ms 2 ms example.net [67.x.x.x]

12 2 ms 2 ms 2 ms example.net [205.x.x.x]

13 3 ms 3 ms 2 ms 63.x.x.x

14 3 ms 3 ms 3 ms 198.x.x.x

15 4 ms 4 ms 4 ms example.net [63.x.x.x]

And when this was reported to AWS team that RTO from 2-6 we were getting was due to connectivity with internal AWS network which needs to be byepass and was not an issue as packet still reached the next server within 1ms.

Traceroute gives an insight to your network problem.

  • The entire path that a packet travels through
  • Names and identity of routers and devices in your path
  • Network Latency or more specifically the time taken to send and receive data to each devices on the path.

Solution provided by AWS Team

After all the Razzle-Dazzle they just refreshed the network from their end and there was no more website latency after that while accessing from AWS internal network.

Tool recommended by AWS Support team for Network troubleshooting if the issue arises in future:

Wireshark along with .har file using network in web-developer tools from browser.

Wireshark is a network packet analyzer. A network packet analyzer will try to capture network packets and tries to display that packet data as detailed as possible.
You could think of a network packet analyzer as a measuring device used to examine what’s going on inside a network cable, just like a voltmeter is used by an electrician to examine what’s going on inside an electric cable (but at a higher level, of course).
In the past, such tools were either very expensive, proprietary, or both. However, with the advent of Wireshark, all that has changed.

Wireshark is perhaps one of the best open source packet analyzers available today.

Features

The following are some of the many features Wireshark provides:

  • Available for UNIX and Windows.
  • Capture live packet data from a network interface.
  • Open files containing packet data captured with tcpdump/WinDump, Wireshark, and a number of other packet capture programs.
  • Import packets from text files containing hex dumps of packet data.
  • Display packets with very detailed protocol information.
  • Save packet data captured.
  • Export some or all packets in a number of capture file formats.
  • Filter packets on many criteria.
  • Search for packets on many criteria.
  • Colorize packet display based on filters.
  • Create various statistics.

… and a lot more!

Best Practices of Ansible Role

I have written about Ansible Roles in my career. But when I talk about the “Best Practice of writing an Ansible Role”, half of them were not following the best practices.
When I started writing this blog, I had only limited knowledge of Ansible Roles and about the practices being followed. But reading more on Ansible roles has helped in enhancing my knowledge.
Without the proper understanding of the Architecture of Ansible Role, I was incapable of enjoying all the functionality for writing an Ansible Role. Earlier, I used “command” and “shell” modules for writing an Ansible Role. Here, in this blog, I’ve discussed the best practices of Ansible Role. Let’s
read these in detail.
 

Continue reading “Best Practices of Ansible Role”

Git Inside Out

Git Inside-Out
Man Wearing Black and White Stripe Shirt Looking at White Printer Papers on the Wall

Git is basically a file-system where you can retrieve your content through addresses. It simply means that you can insert any kind of data into git for which Git will hand you back a unique key you can use later to retrieve that content. We would be learning #gitinsideout through this blog

The Git object model has three types: blobs (for files), trees (for folder) and commits. 

Objects are immutable (they are added but not changed) and every object is identified by its unique SHA-1 hash
A blob is just the contents of a file. By default, every new version of a file gets a new blob, which is a snapshot of the file (not a delta like many other versioning systems).
A tree is a list of references to blobs and trees.
A commit is a reference to a tree, a reference to parent commit(s) and some decoration (message, author).

Then there are branches and tags, which are typically just references to commits.

Git stores the data in our .git/objects directory. After initialising a git repository, it automatically creates .git/objects/pack and .git/objects/info with no regular file. After pushing some files, it would reflect in the .git/objects/ folder

OBJECT Blob

blob stores the content of a file and we can check its content by command

git cat-file -p

or git show

OBJECT Tree

A tree is a simple object that has a bunch of pointers to blobs and other trees – it generally represents the contents of a directory or sub-directory.

We can use git ls-tree to list the content of the given tree object

OBJECT Commit

The “commit” object links a physical state of a tree with a description of how we got there and why.

A commit is defined by tree, parent, author, committer, comment

All three objects ( blob,Tree,Commit) are explained in details with the help of a pictorial diagram.

Often we make changes to our code and push it to SCM. I was doing it once and made multiple changes, I was thinking it would be great if I could see the details of changes through local repository itself instead to go to a remote repository server. That pushed me to explore Git more deeply.

I just created a local remote repository with the help of git bare repository. Made some changes and tracked those changes(type, content, size etc).

Below example will help you understand the concept behind it.

Suppose we have cloned a repository named kunal:

Inside the folder where we have cloned the repository, go to the folder kunal then:

cd kunal/.git/

I have added content(hello) to readme.md and made many changes into the same repository as:

adding README.md

updating Readme.md

adding 2 files modifying one

pull request

commit(adding directory).

Go to the refer folder inside .git and take the SHA value for the master head:

This commit object we can explore further with the help of cat-file which will show the type and content of tree and commit object:

Now we can see a tree object inside the tree object. Further, we can see the details for the tree object which in turn contains a blob object as below:

Below is the pictorial representation for the same:

More elaborated representation for the same :

Below are the commands for checking the content, type and size of objects( blob, tree and commit)

kunal@work:/home/git/test/kunal# cat README.md
hello

We can find the details of objects( size,type,content) with the help of #git cat-file

git-cat-file:- Provide content, type or size information for repository objects

You an verify the content of commit object and its type with git cat-file as below:

kunal@work:/home/git/test/kunal/.git # cat logs/refs/heads/master

Checking the content of a blob object(README.md, kunal and sandy)

As we can see first one is adding read me , so it is giving null parent(00000…000) and its unique SHA-1 is 912a4e85afac3b737797b5a09387a68afad816d6

Below are the details that we can fetch from above SHA-1 with the help of git cat-file :

Consider one example of merge:

Created a test branch and made changes and merged it to master.

Here you can notice we have two parents because of a merge request

You can further see the content, size, type of repository #gitobjects like:

Summary

This is pretty lengthy article but I’ve tried to make it as transparent and clear as possible. Once you work through the article and understand all concepts I showed here you will be able to work with Git more effectively.

This explanation gives the details regarding tree data structure and internal storage of objects. You can check the content (differences/commits)of the files through local .git repository which stores each object with unique  SHA  hash. This would clear basically the internal working of git.
Hopefully, this blog would help you in understanding the git inside out and helps in troubleshooting things related to git.

My stint with Runc vulnerability

Today I was given a task to set up a new QA environment. I said no issue should be done quickly as we use Docker, so I just need to provision VM run the already available QA ready docker image on this newly provisioned VM. So I started and guess what Today was not my day. I got below error while running by App image.

docker: Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused “process_linux.go:293: copying bootstrap data to pipe caused \”write init-p: broken pipe\””: unknown.

I figured out my Valentine’s Day gone for a toss. As usual I took help of Google God to figure out what this issue is all about, after few minutes I found out a blog pretty close to issue that I was facing

https://medium.com/@dirk.avery/docker-error-response-from-daemon-1d46235ff61d

Bang on I got the issue identified. There is a new runc vulnerability identified few days back.

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-5736

The fix of this vulnerability was released by Docker on February 11, but the catch was that this fix makes docker incompatible with 3.13 Kernel version.

While setting up QA environment I installed latest stable version of docker 18.09.2 and since the kernel version was 3.10.0-327.10.1.el7.x86_64 thus docker was not able to function properly.

So as suggested in the blog I upgraded the Kernel version to 4.x.

rpm –import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum repolist
yum –enablerepo=elrepo-kernel install kernel-ml
yum repolist all
awk -F\’ ‘$1==”menuentry ” {print i++ ” : ” $2}’ /etc/grub2.cfg
grub2-set-default 0
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot

And here we go post that everything is working like a charm.

So word of caution to every even
We have a major vulnerability in docker CVE-2019-5736, for more details go through the link

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-5736

As a fix, upgrade your docker to 18.09.2, as well make sure that you have Kernel 4+ as suggested in the blog.

https://docs.docker.com/engine/release-notes/

Now I can go for my Valentine Party 👫