Unlocking Debezium: Exploring the Fundamentals of Real-Time Change Data Capture with Debezium and Harnessing its Power in Docker Containers

Introduction

In a fast-moving, data-driven environment, applications are expected to respond instantly to changes happening inside databases. Batch-based systems struggle to meet this demand, especially when businesses rely on real-time dashboards, alerts, and event-driven workflows. This is where change data capture becomes an essential architectural component. Debezium provides a reliable way to stream database changes in real time and integrates seamlessly with Apache Kafka.

This article walks through the fundamentals of Debezium and demonstrates how PostgreSQL changes can be streamed to Kafka using Docker. The focus is on practical understanding with a working setup rather than just theory.

Understanding Change Data Capture

Change Data Capture, often referred to as CDC, is a mechanism that tracks inserts, updates, and deletes occurring in a database as they happen. Instead of repeatedly querying tables or running heavy batch jobs, CDC captures only the data that has changed.

This approach allows downstream systems to consume fresh data with minimal delay while keeping database load low. CDC is widely used in analytics platforms, event-driven microservices, and data replication pipelines.

What Debezium Brings to the Table

Debezium is an open source CDC platform developed by Red Hat. It works by reading database transaction logs, which already record every data modification. By leveraging these logs, Debezium captures changes efficiently and reliably.

Debezium supports multiple databases such as PostgreSQL, MySQL, SQL Server, Oracle, and MongoDB. It publishes each change as a structured event into Kafka topics, making the data available for real-time processing.

How Debezium Works Behind the Scenes

Debezium uses a log-based CDC approach. Instead of polling database tables, it connects directly to the database log. Every insert, update, or delete operation is converted into a change event.

Each database has its own Debezium connector that understands how to read its transaction log. These connectors push standardized events to Kafka. Kafka then acts as a durable and scalable streaming backbone.

Each event includes details such as database name, table name, primary key, before and after values, and timestamps. This rich metadata makes the events suitable for analytics, auditing, and synchronization.
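As an illustration, an update to a single row produces an event whose payload resembles the following. The field names follow Debezium's standard change-event envelope; the table and values here are invented for this example:

```json
{
  "before": { "id": 1001, "email": "old@example.com" },
  "after":  { "id": 1001, "email": "new@example.com" },
  "source": { "db": "postgres", "schema": "public", "table": "customers" },
  "op": "u",
  "ts_ms": 1700000000000
}
```

Consumers can use the op field ("c" for create, "u" for update, "d" for delete) to distinguish the kind of change.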

Use Cases for Debezium

  1. Microservices Architecture: Debezium plays a crucial role in event-driven microservices architectures, where each microservice can react to specific changes in the data. By consuming the change events, services can update their local view of data or trigger further actions.

  2. Data Synchronization: Debezium can be used to keep multiple databases in sync by replicating changes from one database to another in real-time. This is especially useful in scenarios where data needs to be replicated across geographically distributed systems or in cases where different databases serve specific purposes within an organization.

  3. Stream Processing and Analytics: Debezium’s real-time change data capture capabilities make it an excellent choice for streaming data processing and analytics. By consuming the change events from Debezium, organizations can perform real-time analysis, monitoring, and aggregations on the data. This can be particularly beneficial for applications such as fraud detection, real-time dashboards, and personalized recommendations.

  4. Data Warehousing and ETL (Extract, Transform, Load): Debezium can play a vital role in populating data warehouses or data lakes by capturing and transforming the change events into the desired format. It eliminates the need for batch processing or periodic data extraction, enabling near real-time data updates in analytical systems.

  5. Data Integration and Replication: Debezium simplifies data integration by providing a reliable and efficient way to replicate data changes across different systems. It allows organizations to easily integrate and synchronize data between legacy systems, modern applications, and cloud-based services. This is particularly valuable in scenarios involving hybrid cloud architectures or when migrating from one database platform to another.

  6. Audit Trail and Compliance: Debezium’s ability to capture every data manipulation operation in a database’s log makes it an ideal solution for generating an audit trail. Organizations can use Debezium to track and record all changes made to critical data, ensuring compliance with regulations and providing a reliable historical record of data modifications.

Setting Up PostgreSQL, Kafka, and Debezium Using Docker

To simplify the setup, Docker and Docker Compose are used. This allows all required services to run together without manual installation.

Before starting, make sure Docker and Docker Compose are available on your system.

Clone the repository that contains the Docker Compose configuration for PostgreSQL, Kafka, ZooKeeper, and Debezium.

 
git clone https://github.com/sunil9837/Debezium-Setup.git

After cloning the repository, navigate into the project directory.

 
cd Debezium-Setup

Bring up all required containers in detached mode using Docker Compose.

 
docker-compose up -d

Once the containers are running, PostgreSQL, Kafka, ZooKeeper, and Kafka Connect will be available inside the Docker network.
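For reference, the repository ships its own docker-compose.yml; the sketch below only illustrates the kind of services it defines, and the image tags, service names, and credentials may differ from the actual file:

```yaml
version: "2"
services:
  zookeeper:
    image: debezium/zookeeper:1.9
    ports: ["2181:2181"]
  kafka:
    image: debezium/kafka:1.9
    ports: ["9092:9092"]
    links: [zookeeper]
  db:
    image: debezium/postgres:14
    ports: ["5432:5432"]
    environment:
      POSTGRES_PASSWORD: postgres
  connect:
    image: debezium/connect:1.9
    ports: ["8083:8083"]
    links: [kafka, db]
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_statuses
```

With Compose v1 naming, a service called db running from a project directory named ubuntu produces a container named ubuntu_db_1, which is why container names of that form appear in the commands used later in this article.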

Creating a Test Table in PostgreSQL

To validate streaming, a simple table is created in PostgreSQL.

First, access the PostgreSQL container shell.

 
docker exec -it ubuntu_db_1 bash

Log in to the PostgreSQL database.

 
psql -U postgres -d postgres

Create a table for testing.

 
CREATE TABLE transaction ( name VARCHAR(100), age INTEGER );

This table will be monitored by Debezium for real-time changes. Note that the table has no primary key; insert events will still be captured, but to identify rows in update and delete events it is generally advisable to define a primary key or an explicit replica identity.

Activating the Debezium Connector

Debezium connectors are created by sending a configuration request to Kafka Connect. The configuration is stored in a JSON file inside the repository.
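The exact contents of debezium.json live in the cloned repository; a typical PostgreSQL connector configuration looks roughly like the sketch below, where the hostname, credentials, and connector name are illustrative and should match your Compose file:

```json
{
  "name": "emp-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "postgres",
    "database.server.name": "emp",
    "table.include.list": "public.transaction"
  }
}
```

With a configuration like this, change events for public.transaction land on topics prefixed with the server name, such as emp.public.transaction.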

The request is sent to the Kafka Connect REST endpoint using an HTTP client command.

 
curl -i -X POST \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  http://localhost:8083/connectors/ \
  --data "@debezium.json"

If the configuration is correct, Kafka Connect responds with an HTTP 201 Created status confirming that the connector has been registered. From this point onward, Debezium starts reading changes from PostgreSQL.

Verifying Kafka Topics

A Kafka topic is created automatically for each monitored table, following the pattern <server name>.<schema>.<table>. To verify this, list all topics in the Kafka cluster.

 
docker exec -it \
  $(docker ps | grep ubuntu_kafka_1 | awk '{print $1}') \
  /kafka/bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 --list

You should see a topic such as emp.public.transaction corresponding to the PostgreSQL table created earlier.

Monitoring Real Time Events Using Kafka Consumer

Kafka provides a console consumer utility that allows you to read messages from a topic in real time. This helps verify whether change events are flowing correctly.

Start the Kafka console consumer for the table topic.

 
docker exec -it \
  $(docker ps | grep ubuntu_kafka_1 | awk '{print $1}') \
  /kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic emp.public.transaction

If you want to read all events from the beginning of the topic, append the --from-beginning option to the command.

Testing the End to End Streaming

Insert a record into the PostgreSQL table.

 
INSERT INTO transaction (name, age) VALUES ('Opstree', 30);

As soon as the record is inserted, a new event appears in the Kafka console consumer. The message includes the column values, the operation type, and metadata such as timestamps.

This confirms that PostgreSQL changes are successfully streaming to Kafka through Debezium.
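Downstream applications usually consume these events programmatically rather than through the console consumer. A minimal sketch in Python, using an abbreviated, illustrative event rather than a verbatim Debezium payload:

```python
import json

# An abbreviated change event, similar to what Debezium publishes to
# emp.public.transaction after the INSERT above (real events also carry
# a schema section and richer source metadata).
raw = """
{
  "payload": {
    "before": null,
    "after": {"name": "Opstree", "age": 30},
    "source": {"db": "postgres", "schema": "public", "table": "transaction"},
    "op": "c",
    "ts_ms": 1700000000000
  }
}
"""

payload = json.loads(raw)["payload"]

# "c" = create (insert), "u" = update, "d" = delete
if payload["op"] == "c":
    print("New row in", payload["source"]["table"], "->", payload["after"])
```

In a real pipeline, the raw string would come from a Kafka consumer client subscribed to the table's topic.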

Conclusion

Debezium provides a robust and production-ready solution for implementing change data capture. By reading database transaction logs and streaming events through Apache Kafka, it enables real-time data pipelines with minimal latency.

This approach is well suited for microservices communication, analytics platforms, data synchronization, and compliance auditing. As organizations continue to adopt event-driven architectures, Debezium remains a key building block for real-time systems.

References:

https://debezium.io/documentation/reference/stable/tutorial.html

https://debezium.io/documentation/reference/stable/architecture.html

https://www.infoq.com/presentations/data-streaming-kafka-debezium/

Blog Pundits: Deepak Gupta, Naveen Verma and Sandeep Rawat

