The closer you think you are, the less you’ll actually see

I hope you have seen the movie Now you see me, it has a famous quote The closer you think you are, the less you’ll actually see. Well, this blog is not about this movie but how I got stuck into an issue, because I was not paying attention and looking at the things closely and seeing less hence not able to resolve the issue.

There is a lot happening in today’s DevOps world. And HashiCorp has emerged out to be a big player in this game. Terraform is one of the open source tools to manage infrastructure as code. It plays well with most of the cloud provider. But with all these continuous improvements and enhancements there comes a possibility of issues as well. Below article is about such a scenario. And in case you have found yourself in the same trouble. You are lucky to reach the right page.
I was learning terraform and performing a simple task to launch an Ubuntu EC2 instance in us-east-1 region. For which I required the AMI Id, which I copied from the AWS console as shown in below screenshot.

Once I got the AMI Id, I tried to create the instance using terraform, below is the screenshot of the code

provider “aws” {
  region     = “us-east-1”
  access_key = “XXXXXXXXXXXXXXXXXX”
  secret_key = “XXXXXXXXXXXXXXXXXXX”
}
resource “aws_instance” “sandy” {
        ami = “ami-036ede09922dadc9b
        instance_type = “t2.micro”
        subnet_id = “subnet-0bf4261d26b8dc3fc”
}
I was expecting to see the magic of Terraform but what I got below ugly error.

Terraform was not allowing to spin up the instance. I tried couple of things which didn’t work. As you can see the error message didn’t give too much information. Finally, I thought of giving it a try by  doing same task via AWS web console. I searched for the same ubuntu AMI and selected the image as shown below. Rest of the things, I kept to default. And well, this time it got launched.

And it confused me more. Through console, it was working fine but while using Terraform it says not allowed. After a lot of hair pulling finally, I found the culprit which is a perfect example of how overlooking small things can lead to blunder.

Culprit

While copying the AMI ID from AWS console, I had copied the 64-bit (ARM) AMI ID. Please look carefully, the below screenshot

But while creating it through console I was selecting the default configuration which by is 64-bit(x86). Look at the below screenshot.

To explain it further, I tried to launch the VM with 64-bit (ARM) manually. And while selecting the AMI, I selected the 64-bit (ARM).

And here is the culprit. 64-bit(ARM) only supports a1 instance type

Conclusion

While launching the instance with the terraform, I tried using 64-bit (ARM) AMI ID mistakenly, primarily because for same AMI there are 2 AMI IDs and it is not very visible to eyes unless you pay special attention.

So folks, next time choosing an AMI ID keep it in mind what type of AMI you are selecting. It will save you a lot of time.

Best Practices for Writing a Shell Script

I am a lazy DevOps Engineer. So whenever I came across the same task more than 2 times I automate that. Although now we have many automation tools, still the first thing that hit into our mind for automation is bash or shell script.
After making a lot of mistakes and messy scripts :), I am sharing my experiences for writing a good shell script which not only looks good but also it will reduce the chances of error.

The things that every code should have:-
     – A minimum effort in the modification.
     – Your program should talk in itself, so you don’t have to explain it.
     – Reusability, Of course, I can’t write the same kind of script or program again and again.

I am a firm believer in learning by doing. So let’s create a problem statement for ourselves and then try to solve it via shell scripting with best practices :). I would like to have solutions in the comment section of this blog.


Problem Statement:- Write a shell script to install and uninstall a package(vim) depending on the arguments. The script should tell if the package is already installed. If no argument is passed it should print the help page.

So without wasting time let’s start for writing an awesome shell script. Here is the list of things that should always be taken care of while writing a shell script.

Lifespan of Script

If your script is procedural(each subsequent steps relies on the previous step to complete), do me a favor and add set -e in starting of the script so that the script exists on the first error. For example:-

#!/bin/bash
set -e # Script exists on the first failure
set -x # For debugging purpose

Functions

Ahha, Functions are my most favorite part of programming. There is a saying

Any fool can write code that a computer can understand. Good programmers write code that humans can understand. 

To achieve this always try to use functions and name them properly so that anyone can understand the function just by reading its name. Functions also provide the concept of re-usability. It also removes the duplicating of code, how? let’s see this

#!/bin/bash 
install_package() {
   local PACKAGE_NAME="$1"
   yum install "${PACKAGE_NAME}" -y
}
install_package "vim"

Command Sanity

Usually, scripts call other scripts or binary. When we are dealing with commands there are chances that commands will not be available on all systems. So my suggestion is to check them before proceeding.

#!/bin/bash  
check_package() {
    local PACKAGE_NAME="$1"
    if ! command -v "${PACKAGE_NAME}" > /dev/null 2>&1
    then
           printf "${PACKAGE_NAME} is not installed.\n"
    else
           printf "${PACKAGE_NAME} is already installed.\n"
    fi
}
check_package "vim"

Help Page

If you guys are familiar with Linux, you have certainly noticed that every Linux command has its help page. The same thing can be true for the script as well. It would be really helpful to include –help flag.

#!/bin/bash  
INITIAL_PARAMS="$*"
help_function() {
   {
        printf "Usage:- ./script <option>\n"
        printf "Options:\n"
        printf " -a ==> Install all base softwares\n"
        printf " -r ==> Remove base softwares\n"
    }
}
arg_checker() {
     if [ "${INITIAL_PARAMS}" == "--help" ]; then
            help_function
     fi
}
arg_checker

Logging

Logging is the most critical thing for everyone whether he is a developer, sysadmin or DevOps. Debugging seems to be impossible without logs. As we know most applications generate logs for understanding that what is happening with the application, the same practice can be implemented for shell script as well. For generating logs we have a bash utility called logger.

#!/bin/bash 
DATE=$(date)
declare DATE
check_file() {
     local FILENAME="$1"
     if ! ls "${FILENAME}" > /dev/null 2>&1
     then
            logger -s "${DATE}: ${FILENAME} doesn't exists"
     else
           logger -s "${DATE}: ${FILENAME} found successfuly"
     fi
}
check_file "/etc/passwd"

Variables

I like to name my variables in Capital letters with an underscore, In this way, I will not get confused with the function name and variable name. Never give a,b,c etc. as a variable name instead of that try to give a proper name to a variable as well just like functions.

#!/bin/bash 
# Use declare for declaring global variables
declare GLOBAL_MESSAGE="Hey, I am a global message"
# Use local for declaring local variables inside the function
message_print() {
    local LOCAL_MESSAGE="Hey, I am a local message"
    printf "Global Message:- ${GLOBAL_MESSAGE}\n"
    printf "Local Message:- ${LOCAL_MESSAGE}\n"
}
message_print

Cases

Cases are also a fascinating part of shell script. But the question is when to use this? According to me if your shell program is providing more than one functionality basis on the arguments then you should go for cases. For example:- If your shell utility provides the capability of installing and uninstalling the software.

#!/bin/bash  
print_message() {
    MESSAGE="$1"
    echo "${MESSAGE}"
}
case "$1" in
   -i|--input)
      print_message "Input Message"
      ;;
   -o|--output)
        print_message "Output Message"
        ;;
   --debug)
       print_message "Debug Message"
       ;;
    *)
      print_message "Wrong Input"
      ;;
esac

In this blog, we have covered functions, variables, the lifespan of a script, logging, help page, command sanity. I hope these topics help you in your daily life while using the shell script. If you have any feedback please let me know through comments.
Cheers Till the next Time!!!!

Can you integrate a GitHub Webhook with Privately hosted Jenkins No? Think again

Introduction

Triggering Jenkins builds automatically after every code commit is a core requirement in any continuous integration setup. Jenkins supports automated triggers through repository polling or event based notifications. While polling works, it consumes resources and introduces delays. Push based triggering through webhooks is far more efficient.

The difficulty appears when Jenkins is hosted inside a private network and the version control system is hosted on a cloud platform such as GitLab. In this scenario, GitLab cannot directly reach the Jenkins endpoint, making webhook based triggering difficult without exposing Jenkins publicly.

Webhook Relay solves this problem by acting as a secure bridge between GitLab and a privately hosted Jenkins server. This article explains how GitLab webhooks can trigger Jenkins jobs using Webhook Relay, based on real implementation experience.

Installing the Webhook Relay Agent

The Webhook Relay agent needs to run on the same machine where Jenkins is hosted or where Jenkins is reachable internally.

Below is the installation process shown as step based instructions.

# download the relay binary
curl -sSL https://storage.googleapis.com/webhookrelay/downloads/relay-linux-amd64 > relay
# make the binary executable
chmod +x relay

# move it to a directory in system path
sudo mv relay /usr/local/bin/relay

Webhook Relay service runs on a public endpoint, while this agent runs locally and listens for forwarded webhook events.

Creating a Webhook Relay Account

Create an account on the official Webhook Relay platform using the registration page shown below.

https://my.webhookrelay.com/register

After signing up, access to the Webhook Relay dashboard is provided, where authentication tokens can be generated.

Authenticating the Relay Agent

From the dashboard, create an access token. This generates a key and secret pair.

Use those credentials to authenticate the relay agent.

relay login \
-k
<your_token_key> \
-s <your_token_secret>

A successful login message confirms that the agent is connected and ready.

Creating the GitLab Repository

Create a GitLab repository for testing webhook integration. To keep the setup simple, a public repository can be used.

For reference, assume the repository name is WebhookProject.

Preparing Jenkins for GitLab Webhooks

Install the required Jenkins plugins from the plugin manager.

Navigate through the Jenkins dashboard to install
GitLab Plugin
GitLab Hook Plugin

Once installed, Jenkins becomes capable of receiving GitLab webhook events.

Creating the Jenkins Job

Create a new Jenkins job and configure it to pull source code from the GitLab repository.

Enable the option that allows Jenkins to be triggered by GitLab webhooks.

After enabling this option, Jenkins generates a webhook endpoint associated with the job. It usually follows this pattern.

http://<jenkins-host>:8080/project/<job-name>

Example shown for reference only.

Copy this endpoint, as it will be used in the forwarding configuration.

Forwarding Webhooks Using Webhook Relay

Start webhook forwarding by creating a relay bucket. This bucket acts as a routing channel between GitLab and Jenkins.

relay forward \
--bucket gitlab-jenkins \
http://<jenkins-host>:8080/project/<job-name>

Important note
Do not stop this process. Keep it running in the background.
Open a new terminal tab for further steps.

Once this command starts, the relay agent generates a public forwarding URL.

Configuring GitLab Webhook

Open the GitLab repository settings and navigate to the integrations or webhook section.

Paste the forwarding URL generated by Webhook Relay into the webhook URL field.

For initial testing, SSL verification can be disabled to avoid certificate related issues.

Save the webhook configuration.

Testing the Integration

Clone the GitLab repository locally and push a new commit.

git add .
git commit -m "test webhook trigger"
git push origin main

As soon as the push is completed, GitLab sends a webhook event. Webhook Relay receives it and forwards it to the local agent, which triggers the Jenkins job internally.

You can verify this by checking the Jenkins job build history.

Viewing Logs

GitLab webhook logs can be viewed from
Repository settings
Integrations
Webhook edit section

Webhook Relay logs are available in the Relay Logs section of the Webhook Relay dashboard.

Jenkins build logs confirm successful job execution.

Conclusion

Webhook Relay makes it possible to trigger Jenkins builds through GitLab webhooks even when Jenkins is hosted inside a private network. This approach avoids exposing Jenkins publicly while still enabling real time CI automation.

The same pattern works for GitHub and other webhook enabled platforms. With proper configuration, secure and efficient CI workflows can be achieved in restricted network environments.

AlertManager Integration with Prometheus

One day I got a call from one of my friend and he said to me that he is facing difficulties while setting up AlertManager with Prometheus. Then, I observed that most of the people face such issues while establishing a connection between AlertManager and receiver such as E-mail, Slack etc.

From there, I got motivation for writing this blog so AlertManager setup with Prometheus will be a piece of cake for everyone.

If you are new to AlertManager I would suggest you go through with our Prometheus blog.

What Actually AlertManager Is?

AlertManager is used to handle alerts for client applications (like Prometheus). It also takes care of alerts deduplicating, grouping and then routes them to different receivers such as E-mail, Slack, Pager Duty.

In this blog, we will only discuss on Slack and E-mail receivers.

AlertManager can be configured via command-line flags and configuration file. While command line flags configure system parameters for AlertManager,  the configuration file defines inhibition rules, notification routing, and notification receivers.

Architecture

Here is a basic architecture of AlertManager with Prometheus.

This is how Prometheus architecture works:-

  • If you see in the above picture Prometheus is scraping the metrics from its client application(exporters).
  • When the alert is generated then it pushes it to the AlertManager, later AlertManager validates the alerts groups on the basis of labels.
  • and then forward it to the receivers like Email or Slack.

If you want to use a single AlertManager for multiple Prometheus server you can also do that. Then architecture will look like this:-

Installation

Installation part of AlertManager is not a fancy thing, we just simply need to download the latest binary of AlertManager from here.

$ cd /opt/
$ wget https://github.com/prometheus/alertmanager/releases/download/v0.11.0/alertmanager-0.11.0.linux-amd64.tar.gz

After downloading, let’s extract the files.

$ tar -xvzf alertmanager-0.11.0.linux-amd64.tar.gz

So we can start AlertManager from here as well but it is always a good practice to follow Linux directory structure.

$ mv alertmanager-0.11.0.linux-amd64/alertmanager /usr/local/bin/

 Configuration

Once the tar file is extracted and binary file is placed at the right location then the configuration part will come. Although AlertManager extracted directory contains the configuration file as well but it is not of our use. So we will create our own configuration. Let’s start by creating a directory for configuration.

$ mkdir /etc/alertmanager/

Then the configuration file will take place.

$ vim /etc/alertmanager/alertmanager.yml

The configuration file for Slack will look like this:-

global:


# The directory from which notification templates are read.
templates:
- '/etc/alertmanager/template/*.tmpl'

# The root route on which each incoming alert enters.
route:
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname', 'cluster', 'service']

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification.
  group_wait: 3s

  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 5s

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 1m

  # A default receiver
  receiver: mail-receiver

  # All the above attributes are inherited by all child routes and can
  # overwritten on each.

  # The child route trees.
  routes:
  - match:
      service: node
    receiver: mail-receiver

    routes:
    - match:
        severity: critical
      receiver: critical-mail-receiver

  # This route handles all alerts coming from a database service. If there's
  # no team to handle it, it defaults to the DB team.
  - match:
      service: database
    receiver: mail-receiver
    routes:
    - match:
        severity: critical
      receiver: critical-mail-receiver

receivers:
- name: 'mail-receiver'
  slack_configs:
  - api_url:  https://hooks.slack.com/services/T2AGPFQ9X/B94D2LHHD/jskljaganauheajao2
    channel: '#prom-alert'

   - name: 'critical-mail-receiver'
  slack_configs: 
  
  - api_url:   https://hooks.slack.com/services/T2AGPFQ9X/B94D2LHHD/abhajkaKajKaALALOPaaaJk  channel: '#prom-alert'

You just have to replace the channel name and api_url of the Slack with your information.

The configuration file for E-mail will look something like this:-

global:

templates:
- '/etc/alertmanager/*.tmpl'
# The root route on which each incoming alert enters.
route:
  # default route if none match
  receiver: alert-emailer

  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  # TODO:
  group_by: ['alertname', 'priority']

  # All the above attributes are inherited by all child routes and can
  # overwritten on each.

receivers:
- name: alert-emailer
  email_configs:
  - to: 'receiver@example.com'
    send_resolved: false
    from: 'sender@example.com'
    smarthost: 'smtp.example.com:587'
    auth_username: 'sender@example.com'
    auth_password: 'IamPassword'
    auth_secret: 'sender@example.com'
    auth_identity: 'sender@example.com'

In this configuration file, you need to update the sender and receiver mail details and the authorization password of the sender.

Once the configuration part is done we just have to create a storage directory where AlertManger will store its data.

$ mkdir /var/lib/alertmanager

Then only last piece which will be remaining is my favorite part i.e creating service 🙂

$ vi /etc/systemd/system/alertmanager.service

The service file will look like this:-

[Unit]
Description=AlertManager Server Service
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=Simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file /etc/alertmanager/alertmanager.yml \
    --storage.tsdb.path /var/lib/alertmanager

[Install]
WantedBy=multi-user.target

Then reload the daemon and start the service

$ systemctl daemon-reload
$ systemctl start alertmanager
$ systemctl enable alertmanager

Now you are all set to fire up your monitoring and alerting. So just take a beer and relax until Alert Manager notifies you for alerts. All the best!!!!

Its not you Everytime, sometimes issue might be at AWS End

Today an issue reported to me that website of our client was loading very slow which was hosted on AWS Windows server and the same website was loading fine when accessed from outside AWS network,I just felt like might be a regular issue but it all together took me to an inside out of the network troubleshooting.

Initially, we checked for SSL certificate expiry, which was not the case, so below are the Two steps which we used to troubleshoot the issue:

Troubleshooting through Browser via Web developer Network tool

In browser we checked which part of code was taking time to load using Network option in developer tools:

  • Select web developer tools in firefox
  • Then select network

We identified one of the GET calls was taking time to load.
Then when this thing was reported to AWS support team they provided further analysis of this. We can save the report as (.HAR) file which tells us below things:

  • How long it takes to fetch DNS information
  • How long each object takes to be requested
  • How long it takes to connect to the server
  • How long it takes to transfer assets from the server to the browser of each object.

Troubleshooting using Traceroute

Then we tried to troubleshoot the AWS network flow using “tracert ” with below output:

Tracing route to example.gov [151.x.x.x] over a maximum of 15 hops:

1 <1 ms <1 ms <1 ms 10.x.x.x

2 * * * Request timed out.

3 * * * Request timed out.

4 * * * Request timed out.

5 * * * Request timed out.

6 * * * Request timed out.

7 <1 ms <1 ms <1 ms 100.x.x.x

8 <1 ms <1 ms 1 ms 52.x.x.x

9 * * * Request timed out.

10 2 ms 1 ms 1 ms example.net [67.x.x.x]

11 2 ms 2 ms 2 ms example.net [67.x.x.x]

12 2 ms 2 ms 2 ms example.net [205.x.x.x]

13 3 ms 3 ms 2 ms 63.x.x.x

14 3 ms 3 ms 3 ms 198.x.x.x

15 4 ms 4 ms 4 ms example.net [63.x.x.x]

And when this was reported to AWS team that RTO from 2-6 we were getting was due to connectivity with internal AWS network which needs to be byepass and was not an issue as packet still reached the next server within 1ms.

Traceroute gives an insight to your network problem.

  • The entire path that a packet travels through
  • Names and identity of routers and devices in your path
  • Network Latency or more specifically the time taken to send and receive data to each devices on the path.

Solution provided by AWS Team

After all the Razzle-Dazzle they just refreshed the network from their end and there was no more website latency after that while accessing from AWS internal network.

Tool recommended by AWS Support team for Network troubleshooting if the issue arises in future:

Wireshark along with .har file using network in web-developer tools from browser.

Wireshark is a network packet analyzer. A network packet analyzer will try to capture network packets and tries to display that packet data as detailed as possible.
You could think of a network packet analyzer as a measuring device used to examine what’s going on inside a network cable, just like a voltmeter is used by an electrician to examine what’s going on inside an electric cable (but at a higher level, of course).
In the past, such tools were either very expensive, proprietary, or both. However, with the advent of Wireshark, all that has changed.

Wireshark is perhaps one of the best open source packet analyzers available today.

Features

The following are some of the many features Wireshark provides:

  • Available for UNIX and Windows.
  • Capture live packet data from a network interface.
  • Open files containing packet data captured with tcpdump/WinDump, Wireshark, and a number of other packet capture programs.
  • Import packets from text files containing hex dumps of packet data.
  • Display packets with very detailed protocol information.
  • Save packet data captured.
  • Export some or all packets in a number of capture file formats.
  • Filter packets on many criteria.
  • Search for packets on many criteria.
  • Colorize packet display based on filters.
  • Create various statistics.

… and a lot more!