Kafka within EFK Monitoring

Today’s world is entirely internet-driven; whatever the field, we can get a product of our choice with one click.

Looking at e-commerce in DevOps terms, the entire application/website is based on a microservice architecture, i.e. splitting a bulk application into smaller services to improve scalability, manageability, and process control.

Hence, one of the important aspects of maintaining these smaller services is enabling their monitoring.

One commonly known stack for this is the EFK stack (Elasticsearch, Fluentd, Kibana), here used along with Kafka.

Kafka is an open-source event streaming platform and is currently used by many companies.

Question: Why use Kafka within EFK monitoring?  

Answer: This is the first question that strikes many minds, so in this blog we’ll focus on why to use Kafka, what its benefits are, and how to integrate it with the EFK stack.

Interesting, right? 🙂 Let’s begin:

While traveling, we’ve all seen crossroads managed by traffic lights or traffic policemen to streamline the flow, since traffic from four directions meets at a crossroad.

So, what do the traffic lights or policemen do? They streamline the flow by letting traffic from one direction pass while stopping the other directions, which wait for their turn.

In technical terms, the incoming traffic in the above scenario is streamlined by holding it back for some time, in effect creating a small buffer, isn’t it?

Kafka does something similar. Imagine roughly 300 applications sending logs directly to Elasticsearch: that can choke it, and scaling Elasticsearch up or adding more data nodes at peak traffic isn’t a good solution, as the cluster can become unstable due to re-sharding.

Introducing Kafka breaks up this incoming traffic: it acts as a buffer and sends streamlined chunks to Elasticsearch.

Let’s understand this with a block diagram:

Not to worry, I’ll explain each and every block with configurations 🙂

Block 1: This block refers to the containers or instances where the application logs are generated and the td-agent service runs (to export logs from the desired path to Kafka). Td-agent is the stable distribution package of Fluentd, maintained by Treasure Data; Fluentd itself is a Cloud Native Computing Foundation project. It is essentially a data collection daemon: it collects logs from various data sources (in our case, the application) and exports them to a configured destination (here, Kafka).

Installation guide
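The kafka_buffered output used in the configuration below comes from the fluent-plugin-kafka plugin. Assuming a standard td-agent package, it can typically be installed with td-agent-gem (the exact command may vary with your td-agent version):

sudo td-agent-gem install fluent-plugin-kafka   # provides the kafka_buffered output type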

Within the td-agent conf, the below configuration is done:

<source>
   @type tail
   read_from_head true
   path <path_of_log_file>
   tag <tag_name>
   format json
   keep_time_key true
   time_format <time_format_of_logs>
   pos_file <pos_file_location>
</source>

<match <tag_name> >
   @type kafka_buffered
   output_include_tag true
   brokers <kafka_hostname:port>
   default_topic <kafka_topic_name>
   output_data_type json
   buffer_type file
   buffer_path <buffer_path_location>
   buffer_chunk_limit 10m
   buffer_queue_limit 256
   buffer_queue_full_action drop_oldest_chunk
</match>

The <source> block is dedicated to the log source configuration:

path – Path of the log file

tag – Tag name for the logs; user-defined

format – Log format, e.g. json, text, etc.

keep_time_key – Whether to keep the time key in the log record, true or false

time_format – Time format of the logs, e.g. %d/%b/%Y:%H:%M:%S

pos_file – Position file location; user-defined

Similarly, the <match> block is dedicated to the destination, i.e. where to send these logs:

@type – Output plugin type, e.g. kafka_buffered or elasticsearch

output_include_tag – Whether to include the tag (as defined in the source block) in the output record

brokers – Kafka DNS name with port

default_topic – Kafka topic into which logs will be exported

output_data_type – Output log format, e.g. json

buffer_type – Buffer type, e.g. file

buffer_path – Path of the buffer file; user-defined

buffer_chunk_limit – Chunk size limit of the buffer

buffer_queue_limit – Queue length limit of the buffer

buffer_queue_full_action – Action to take when the buffer queue is full, e.g. drop_oldest_chunk

For more config. parameters please refer – link
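On a default td-agent install, this configuration typically lives in /etc/td-agent/td-agent.conf; after updating it, restart the service and watch td-agent’s own log to confirm the tail input and Kafka output start without errors (paths below assume the standard packaging):

sudo systemctl restart td-agent

tail -f /var/log/td-agent/td-agent.log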

Block 2: The Kafka server, where the Kafka service is set up.

Kafka uses ZooKeeper for cluster coordination; depending on your infra, ZooKeeper can run on the same server or a separate one (separate in the case of a production setup).

wget http://mirror.fibergrid.in/apache/kafka/0.10.2.0/kafka_2.12-0.10.2.0.tgz

tar -xzf kafka_2.12-0.10.2.0.tgz
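Before starting the services, it is worth a quick look at config/server.properties. A minimal single-broker sketch could look like the below; the values are only illustrative and should be adjusted to your infra:

broker.id=0
listeners=PLAINTEXT://:9092
# move log.dirs off /tmp for anything beyond a quick test
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181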

Starting ZooKeeper:

ZooKeeper needs to be started first. For this, the Kafka package ships a convenience script to start a single-node standalone ZooKeeper instance; any further configuration goes into the zookeeper.properties file.

vi .bashrc

export KAFKA_HEAP_OPTS="-Xmx500M -Xms500M"

The value needs to be 50% of the total RAM on the instance. 

source .bashrc

Start ZooKeeper with the following command, running it in the background using nohup and diverting its logs to the zookeeper-logs file.

cd kafka_2.12-0.10.2.0

nohup bin/zookeeper-server-start.sh config/zookeeper.properties > ~/zookeeper-logs &

Starting Kafka:

cd kafka_2.12-0.10.2.0

nohup bin/kafka-server-start.sh config/server.properties > ~/kafka-logs &
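With both services running, you can create the topic that td-agent will write to and verify that messages are flowing. The topic name app-logs below is only an example; use whatever you set as default_topic, and adjust partitions/replication to your setup:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic app-logs

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic app-logs --from-beginning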

To stop either of them, use the below commands:

bin/kafka-server-stop.sh

bin/zookeeper-server-stop.sh

Refer to the official documentation

Block 3: td-agent, used as a forwarder

Now we have logs within Kafka topics, but we need a mechanism to pull these logs and export them to Elasticsearch.

Just as td-agent was used to pick up application logs and send them to Kafka, here td-agent will be configured as a forwarder, i.e. to pull logs from Kafka and send them to Elasticsearch.
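The forwarder host needs the Kafka input plugin plus the forest and Elasticsearch output plugins used in the configuration below. Assuming a standard td-agent package, they can typically be installed with:

sudo td-agent-gem install fluent-plugin-kafka fluent-plugin-forest fluent-plugin-elasticsearch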

<source>
  @type kafka_group
  brokers <kafka_dns:port>
  consumer_group <consumer_group_kafka>
  topics <kafka_topic_name>
</source>

<match <kafka_topic_name> >
  @type forest
  subtype elasticsearch
  <template>
    host <ElasticSearch_IP>
    port <Elasticsearch_Port>
    user <ES_username>
    password <ES_password>
    logstash_prefix <prefix_name>
    logstash_format true
    include_tag_key true
    tag_key tag_name
  </template>
</match>

Again, the source and match blocks are populated with values similar to those described before.

This time, the source is configured to pull logs from Kafka, and the match block to forward them to Elasticsearch.

consumer_group – A group of consumers sharing the same group id. When a topic is consumed by consumers in the same group, every record is delivered to only one consumer in that group.

forest – Creates a sub-plugin instance of an output plugin dynamically per tag, from the template configuration.

logstash_prefix – Prefix of the index name under which logs will be stored and can be viewed inside Kibana.
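Once the forwarder is running, a quick sanity check is to list the indices matching your prefix in Elasticsearch; the host, port, credentials, and prefix below are the same placeholders used in the configuration above:

curl -u <ES_username>:<ES_password> "http://<ElasticSearch_IP>:<Elasticsearch_Port>/_cat/indices/<prefix_name>-*?v"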

Block 4: Elasticsearch

Setup can be done by following the official Elasticsearch document below.

Elasticsearch setup over Ubuntu

Block 5: Kibana setup

Kibana setup

You can configure Nginx to make Kibana available over port 80 or 443.

Refer link for Nginx configuration.
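For reference, a minimal Nginx reverse-proxy sketch for Kibana on port 80 could look like the below. It assumes Kibana is listening on its default port 5601 on the same host; the server_name is hypothetical, and TLS/authentication should be added as per your setup:

server {
    listen 80;
    server_name kibana.example.com;   # hypothetical hostname

    location / {
        proxy_pass http://localhost:5601;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}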

So, the entire EFK stack is now set up with Kafka. In the same way, it can be configured in standalone mode (for self-learning) or across different servers for a production setup.

NOTE: The Elasticsearch and Kibana setup remains the same; the td-agent (collector & forwarder) and Kafka configuration is where the magic happens.

Happy Learning …

Blog Pundits: Naveen Verma and Sandeep Rawat

OpsTree is an End-to-End DevOps Solution Provider.
