End-to-End Data Pipeline for Real-Time Stock Market Data!

Transform your data landscape with powerful, flexible data pipelines. Learn the data engineering strategies needed to effectively manage, process, and derive insights from comprehensive datasets. Creating robust, scalable, and fault-tolerant data pipelines is a complex task that requires multiple tools and techniques.

Learn to build real-time stock market data pipelines using Apache Kafka. Follow a detailed step-by-step guide, from setting up Kafka on AWS EC2 to connecting it with AWS Glue and Athena for intuitive data processing and insightful analytics.
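To make the idea concrete before you click through, here is a minimal sketch of the producer side in Python, assuming the kafka-python client and a broker on an EC2 host; the broker address, topic name, and hard-coded quote are illustrative placeholders, not the post's exact code:

```python
# Minimal sketch: publish stock quotes to a Kafka topic.
# Assumes kafka-python and a broker reachable on an EC2 host;
# the address, topic, and quote payload are placeholders.
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="ec2-xx-xx-xx-xx.compute.amazonaws.com:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    # Stand-in for polling a live market feed.
    quote = {"symbol": "AAPL", "price": 189.42, "ts": time.time()}
    producer.send("stock-market-data", value=quote)
    time.sleep(1)  # downstream, S3 + Glue + Athena pick the topic up
```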
Continue reading “End-to-End Data Pipeline for Real-Time Stock Market Data!”

Stream and Analyze PostgreSQL Data from S3 Using Kafka and ksqlDB: Part 2

Introduction

In Part 1, we set up a real-time data pipeline that streams PostgreSQL changes to Amazon S3 using Kafka Connect. Here’s what we accomplished:

  • Configured PostgreSQL for CDC (using logical decoding/WAL)
  • Deployed Kafka Connect with a JDBC Source Connector (to capture PostgreSQL changes; see the registration sketch after this list)
  • Set up an S3 Sink Connector (to persist data in S3 in Avro/Parquet format)
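For orientation, connectors like these are usually registered through the Kafka Connect REST API. Below is a minimal Python sketch; the connector name, database coordinates, credentials, and column names are placeholder assumptions, not the exact configuration from Part 1:

```python
# Sketch: register a JDBC source connector via the Kafka Connect
# REST API (assumed on localhost:8083). All names, hosts, columns,
# and credentials below are placeholders.
import requests

connector = {
    "name": "postgres-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://localhost:5432/mydb",
        "connection.user": "postgres",
        "connection.password": "secret",
        "mode": "timestamp+incrementing",
        "timestamp.column.name": "updated_at",
        "incrementing.column.name": "id",
        "topic.prefix": "pg-",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())
```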

In Part 2 of our journey, we dive deeper into the process of streaming data from PostgreSQL to S3 via Kafka. This time, we explore how to set up connectors, create a sample PostgreSQL table with large datasets, and leverage ksqlDB for real-time data analysis. Additionally, we’ll cover the steps to configure AWS IAM policies for secure S3 access. Whether you’re building a data pipeline or experimenting with Kafka integrations, this guide will help you navigate the essentials with ease.
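As a preview of the ksqlDB portion, here is a hedged sketch of issuing a push query through ksqlDB's REST API from Python; the server address and stream name are illustrative assumptions:

```python
# Sketch: run a ksqlDB push query over its REST API (assumed at
# localhost:8088). The stream name ORDERS_STREAM is a placeholder.
import json

import requests

payload = {
    "ksql": "SELECT * FROM ORDERS_STREAM EMIT CHANGES;",
    "streamsProperties": {"ksql.streams.auto.offset.reset": "earliest"},
}

with requests.post(
    "http://localhost:8088/query",
    headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
    data=json.dumps(payload),
    stream=True,  # push queries emit rows continuously
) as resp:
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))
```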

Continue reading “Stream and Analyze PostgreSQL Data from S3 Using Kafka and ksqlDB: Part 2”

Deploying a Production-Ready Kafka Cluster on Kubernetes with Strimzi

What is Event Streaming?

Event streaming is the digital equivalent of a business's central nervous system, capturing real-time data from multiple sources for further processing and analysis.

What is Event Streaming Used For?

Event streaming is used in a variety of industries for payments, logistics tracking, IoT data analytics, customer communication, healthcare analytics, data integration, and more.

Continue reading “Deploying a Production-Ready Kafka Cluster on Kubernetes with Strimzi”

Kafka within EFK Monitoring

Today’s world is entirely internet-driven; in any field, we can get a product of our choice with one click.

Talking about e-commerce in DevOps terms, the entire application/website is based on a microservice architecture, i.e. splitting a bulk application into smaller services to make it more scalable, manageable, and process-driven.

Hence, one of the important aspects of maintaining these smaller services is enabling their monitoring.

One commonly known stack for this is the EFK stack (Elasticsearch, Fluentd, Kibana), used along with Kafka.

Kafka is an open-source distributed event streaming platform currently used by many companies.

Question: Why use Kafka within EFK monitoring?  

Answer: This is the first question that strikes many minds. Hence, in this blog we’ll focus on why to use Kafka, what its benefits are, and how to integrate it with the EFK stack.

Interesting, right? 🙂 Let’s begin:
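Before the walkthrough, here is a rough Python sketch of the pattern this post builds toward: applications publish logs to a Kafka topic that acts as a buffer, and Fluentd later drains it into Elasticsearch. The broker address and topic name below are assumptions for illustration:

```python
# Sketch: applications write structured logs to Kafka; Fluentd's
# Kafka input plugin later forwards them to Elasticsearch, so the
# topic acts as a buffer. Broker and topic names are placeholders.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def ship_log(level: str, message: str) -> None:
    # Kafka absorbs bursts, insulating Elasticsearch from spikes.
    producer.send("app-logs", value={"level": level, "message": message})

ship_log("INFO", "order-service started")
producer.flush()
```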

Continue reading “Kafka within EFK Monitoring”

Kafka’s Solution: Event Driven Architecture: OTKafkaDiaries

Heroism often results as a response to extreme events.

James Geary

Event Driven Architecture:

Modern digital businesses run on real-time events. Event-driven architecture is a design principle based on loose coupling and message-driven communication. This architecture helps applications publish events/messages that other applications and services can consume, and then perform an action based upon those events.
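To make the publish/consume decoupling concrete, here is a minimal Python sketch of a service reacting to events from a topic; the broker address, topic, and event fields are illustrative assumptions:

```python
# Sketch: a consumer service reacting to published events.
# Broker, topic, and event fields are illustrative placeholders.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "order-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

# The producer knows nothing about this service: loose coupling
# lets any number of subscribers act on the same events.
for event in consumer:
    if event.value.get("type") == "order_placed":
        print(f"reserving stock for order {event.value.get('order_id')}")
```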

Where are we Today?

Back in the days when we started implementing microservices, we were focused more on the service decoupling, communication, and security that we were going to handle in such a system.

Continue reading “Kafka’s Solution: Event Driven Architecture: OTKafkaDiaries”