Kafka within EFK Monitoring

Introduction

Today’s world is completely internet driven. Whether it is shopping, banking, or entertainment, almost everything is available with a single click.

From a DevOps perspective, modern e-commerce and enterprise applications are usually built using a microservices architecture. Instead of running one large monolithic application, the system is divided into smaller, independent services. This approach improves scalability, manageability, and operational efficiency.

However, managing a distributed system also increases complexity. One of the most critical requirements for maintaining microservices is effective monitoring and log management.

A commonly used monitoring and logging stack is the EFK stack, which includes Elasticsearch, Fluentd, and Kibana. In many production environments, Kafka is also introduced into this stack to handle log ingestion more reliably.

Kafka is an open-source event streaming platform and is widely used across organizations for handling high-throughput data streams.

This naturally raises an important question.

Why should Kafka be used along with the EFK stack?

In this blog, we will explore why Kafka is introduced, what benefits it brings, and how it integrates with the EFK stack.

Let’s get started.

Why Kafka Is Needed in the EFK Stack

While traveling, we often see crossroads controlled by traffic lights or traffic police. At a junction where traffic flows from multiple directions, these controls ensure smooth movement by allowing traffic from one direction while holding others temporarily.

In technical terms, traffic is regulated by buffering and controlled flow.

Kafka plays a very similar role in log management.

Imagine hundreds of applications sending logs directly to Elasticsearch. During peak traffic, Elasticsearch may become overwhelmed. Scaling Elasticsearch during heavy ingestion is not always a good solution because frequent scaling and re-sharding can cause instability.

Kafka solves this problem by acting as a buffer layer. Instead of pushing logs directly to Elasticsearch, logs are first sent to Kafka. Kafka then delivers them in controlled, manageable batches to Elasticsearch.

High-Level Architecture Overview

The complete flow consists of the following blocks.

  • Application containers or instances

  • Kafka

  • Fluentd forwarder

  • Elasticsearch

  • Kibana

Each block is explained below with configurations.

Block 1 Application Logs and td-agent Configuration

This block represents application containers or EC2 instances where logs are generated. The td-agent service runs alongside the application to collect logs and forward them to Kafka.

td-agent is a stable distribution of Fluentd packaged by Treasure Data; Fluentd itself is a Cloud Native Computing Foundation project. It is a data collection daemon that gathers logs from various sources and forwards them to destinations such as Kafka or Elasticsearch.
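
The Kafka output used in the next section is provided by the fluent-plugin-kafka gem, which may not be bundled with every td-agent release. Assuming a standard td-agent installation, it can be installed with the bundled gem command:

td-agent-gem install fluent-plugin-kafka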

td-agent Configuration

Use the following configuration inside the td-agent configuration file.

Source Configuration

<source>
@type tail
read_from_head true
path <path_of_log_file>
tag <tag_name>
format json
keep_time_key true
time_format <time_format_of_logs>
pos_file <pos_file_location>
</source>

The source block defines how logs are collected.

  • path specifies the log file location

  • tag is a user-defined identifier for logs

  • format defines the parser for the log format, such as json

  • keep_time_key preserves the original timestamp

  • time_format defines the timestamp pattern

  • pos_file tracks the read position of logs
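
For illustration, this is how the source block might look with the placeholders filled in. The path, tag, and time format below are hypothetical and should be adapted to your application's logs.

<source>
@type tail
read_from_head true
path /var/log/app/app.log
tag app.access
format json
keep_time_key true
time_format %Y-%m-%dT%H:%M:%S%z
pos_file /var/log/td-agent/app.log.pos
</source>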

Match Configuration to Kafka

<match <tag_name>>
@type kafka_buffered
output_include_tag true
brokers <kafka_hostname:port>
default_topic <kafka_topic_name>
output_data_type json
buffer_type file
buffer_path <buffer_path_location>
buffer_chunk_limit 10m
buffer_queue_limit 256
buffer_queue_full_action drop_oldest_chunk
</match>

The match block defines where logs are sent.

  • kafka_buffered ensures reliable delivery

  • brokers defines Kafka host and port

  • default_topic is the Kafka topic for logs

  • buffer settings control local buffering and backpressure
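
As a sketch, a filled-in match block might look like the following; the broker address, topic name, and buffer path are hypothetical. Note that drop_oldest_chunk discards the oldest buffered chunk when the local queue fills up, trading a small amount of log loss for td-agent stability.

<match app.access>
@type kafka_buffered
output_include_tag true
brokers kafka-broker-1:9092
default_topic app-logs
output_data_type json
buffer_type file
buffer_path /var/log/td-agent/buffer/kafka
buffer_chunk_limit 10m
buffer_queue_limit 256
buffer_queue_full_action drop_oldest_chunk
</match>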

Block 2 Kafka Setup

Kafka acts as the central buffering and streaming layer.

Kafka, in the version used here, relies on Zookeeper for broker coordination, controller election, and cluster metadata. In production setups, Zookeeper is usually deployed separately.

Download Kafka

wget http://mirror.fibergrid.in/apache/kafka/0.10.2.0/kafka_2.12-0.10.2.0.tgz

Extract the Package

tar -xzf kafka_2.12-0.10.2.0.tgz

Starting Zookeeper

Zookeeper must be started before Kafka.

Update JVM heap size in the shell profile.

vi .bashrc
export KAFKA_HEAP_OPTS="-Xmx500M -Xms500M"

The heap size should be approximately 50 percent of the available system memory.

Reload the configuration.

source .bashrc

Start Zookeeper in the background.

cd kafka_2.12-0.10.2.0
nohup bin/zookeeper-server-start.sh config/zookeeper.properties > ~/zookeeper-logs &
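
To confirm Zookeeper is healthy, you can send it the ruok four-letter command, assuming the default client port 2181:

echo ruok | nc localhost 2181

A healthy instance responds with imok.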

Starting Kafka

cd kafka_2.12-0.10.2.0
nohup bin/kafka-server-start.sh config/server.properties > ~/kafka-logs &
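
Before wiring td-agent to Kafka, it is worth creating and verifying the log topic. For Kafka 0.10.x the topic tooling talks to Zookeeper, assumed here to be on localhost:2181, while the console consumer talks to the broker on its default port 9092.

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic <kafka_topic_name>
bin/kafka-topics.sh --list --zookeeper localhost:2181
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <kafka_topic_name> --from-beginning

Once td-agent starts shipping logs, the console consumer should print them as JSON records.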

Stopping Services

bin/kafka-server-stop.sh
bin/zookeeper-server-stop.sh

For advanced configurations, always refer to the official Kafka documentation.

Block 3 td-agent as Kafka Consumer and Elasticsearch Forwarder

At this stage, logs are available in Kafka topics. The next step is to pull logs from Kafka and send them to Elasticsearch.

Here, td-agent is configured as a Kafka consumer and forwarder.
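
The plugins used in this block (the kafka_group input, the forest output, and the elasticsearch output) are provided by separate gems. Assuming a standard td-agent installation, they can be installed as follows:

td-agent-gem install fluent-plugin-kafka fluent-plugin-forest fluent-plugin-elasticsearch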

Kafka Source Configuration

<source>
@type kafka_group
brokers <kafka_dns:port>
consumer_group <consumer_group_kafka>
topics <kafka_topic_name>
</source>

The source block consumes log records from the Kafka topic.

  • consumer_group enables distributed consumption across multiple forwarders

  • within a consumer group, each log record is consumed by only one consumer

Match Configuration to Elasticsearch

<match <kafka_topic_name>>
@type forest
subtype elasticsearch
<template>
host <elasticsearch_ip>
port <elasticsearch_port>
user <es_username>
password <es_password>
logstash_prefix <index_prefix>
logstash_format true
include_tag_key true
tag_key tag_name
</template>
</match>

Key concepts used here.

  • forest dynamically creates output instances per tag

  • logstash_prefix defines index naming in Elasticsearch

  • logs become visible in Kibana using this index
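
Putting the two blocks together, a complete forwarder configuration might look like this sketch; the broker, consumer group, topic, Elasticsearch address, and credentials are all hypothetical.

<source>
@type kafka_group
brokers kafka-broker-1:9092
consumer_group efk-forwarder
topics app-logs
</source>

<match app-logs>
@type forest
subtype elasticsearch
<template>
host 10.0.1.15
port 9200
user elastic
password changeme
logstash_prefix app-logs
logstash_format true
include_tag_key true
tag_key tag_name
</template>
</match>

With logstash_format enabled, Elasticsearch creates one index per day, with names such as app-logs-2024.01.15.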

Block 4 Elasticsearch Setup

Elasticsearch acts as the storage and indexing layer.

Follow the official Elasticsearch documentation to install and configure Elasticsearch on Ubuntu or your preferred operating system.
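
Once Elasticsearch is running and the forwarder is consuming from Kafka, a quick way to confirm that log indices are being created is the _cat API. The address and credentials below are hypothetical.

curl -u elastic:changeme 'http://10.0.1.15:9200/_cat/indices?v'
curl -u elastic:changeme 'http://10.0.1.15:9200/_cluster/health?pretty'

You should see indices named after the logstash_prefix configured in the forwarder.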

Block 5 Kibana Setup

Kibana provides visualization and search capabilities on top of Elasticsearch.

Install Kibana using the official documentation.

You can configure Nginx to expose Kibana on port 80 or 443 for easier access.
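
A minimal reverse proxy sketch is shown below, assuming Kibana listens on its default port 5601 and a hypothetical hostname kibana.example.com; for port 443, add a TLS certificate and a matching listen directive.

server {
    listen 80;
    server_name kibana.example.com;

    location / {
        proxy_pass http://localhost:5601;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}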

Final Architecture Summary

With this setup, the complete EFK stack is integrated with Kafka.

  • Applications send logs to td-agent

  • td-agent pushes logs to Kafka

  • Kafka buffers and streams logs

  • td-agent forwarder consumes logs

  • Elasticsearch stores logs

  • Kibana visualizes logs

The same architecture can be used in standalone environments for learning or across multiple servers in production.

On-Premise Setup of Kubernetes Cluster using KubeSpray (Offline Mode) – PART 1

Today, most organizations are moving to managed services like EKS (Elastic Kubernetes Service) and AKS (Azure Kubernetes Service) for easier handling of a Kubernetes cluster. With managed Kubernetes we do not have to take care of the master nodes; the cloud provider is responsible for the control plane and can even manage the worker nodes, freeing up our time. We just need to deploy our microservices on the worker nodes. You can pay extra to achieve an uptime of 99.95%, and node repair keeps the cluster healthy and reduces the chances of downtime. This is good in many cases, but it can become expensive: EKS, for example, costs $0.10 per cluster per hour. You have to install upgrades for the VPC CNI yourself, and also install Calico CNI if you need it. There is no IDE extension for developing EKS code. It also creates a dependency on the particular cloud provider.

To avoid the dependency on any cloud provider, we have to create a vanilla Kubernetes cluster. This means we have to take care of all the components, both the master and worker nodes of the cluster, by ourselves.

We had a scenario in which a client's requirement was to set up a Kubernetes cluster on on-premise servers with no internet connectivity. So I chose to perform the setup of the Kubernetes cluster via Kubespray.

Why Kubespray?

Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS and Kubernetes cluster configuration management tasks. Kubespray provides a highly available cluster, is composable (you can choose the network plugin, for instance), supports the most popular Linux distributions, and runs continuous integration tests.
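
As a preview of what this series builds toward, a typical Kubespray run, assuming the repository is cloned and an inventory has been prepared, looks like this:

ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml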

Continue reading “On-Premise Setup of Kubernetes Cluster using KubeSpray (Offline Mode) – PART 1”

Deploying Prometheus and Grafana on Kubernetes

Monitoring a Kubernetes cluster is the need of the hour for any application following a microservices architecture. There are a number of solutions one can implement to monitor a Kubernetes workload, and one of them is Prometheus with Grafana. This article will help you deploy Prometheus and Grafana in your Kubernetes cluster with the help of the prometheus-operator.

But before setting up these components let’s understand a bit about each of them.

Prometheus

Prometheus is a pull-based, open-source monitoring and alerting tool originally built by SoundCloud. It stores metrics in a time-series database and scrapes them at a given interval from HTTP endpoints. After Kubernetes, Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project.

Alertmanager

The Alertmanager takes care of alerts sent by alerting tools such as the Prometheus server. It handles grouping, silencing, and routing them to the correct receiver integration such as email, PagerDuty, Slack, etc. It also supports the inhibition of alerts.

Grafana

Grafana provides the visual representation of metrics collected by a data source, which in our case is Prometheus. We can create or import Grafana dashboards, which use PromQL to visually represent the metrics Prometheus has collected.
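
For instance, a dashboard panel might chart per-container CPU usage with a PromQL query such as the following; the metric name assumes cAdvisor metrics are being scraped.

rate(container_cpu_usage_seconds_total[5m])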

Continue reading “Deploying Prometheus and Grafana on Kubernetes”

How to Setup Consul through the OSM Ansible Role

Are you searching for service discovery or a service mesh tool for a distributed environment?

Did you find one with an easy installation? Not yet? Think fast… it's just a piece of cake. YES! NO! Calm down, because I've got it!

A few days back we got a requirement where we had to set up multiple services on multiple servers, in cluster mode. So now the questions arise: how will the services be auto-discovered? How will we know the health of each service? And above all, how do we restrict users on different services? After a lot of research, I came across a tool named Consul. But then another stumbling block arose: HOW TO SET IT UP?

Your answer might be to just go ahead and download the binary on every server. If that's what you're thinking, then STOP! Doing it manually on plenty of servers is time-consuming and not an efficient way to work. So I thought of using a configuration management tool, none other than Ansible. There were roles already present in the market, but some had a hard-coded encryption key, some did not generate the bootstrap token, and they were not easy to understand. None of the roles fulfilled the requirement.

So I thought of creating an Ansible role with features like ACL enablement, bootstrap token generation, and encryption key generation, written in easy-to-understand language.

In this blog, I have explained the OT-OSM Consul Ansible role.

Without any delay, let's get started!

Now you might be thinking: what is Consul?

Continue reading “How to Setup Consul through the OSM Ansible Role”

Terraform Version Upgrade

Starting the blog with the question: what is Terraform?

It can be called a magic wand that creates infrastructure based on the code you write.

In HashiCorp's words, "Terraform is an open-source infrastructure as code software tool that enables you to safely and predictably create, change, and improve infrastructure."

Continue reading “Terraform Version Upgrade”