Introduction
In Part 1, we set up a real-time data pipeline that streams PostgreSQL changes to Amazon S3 using Kafka Connect. Here’s what we accomplished:
- Configured PostgreSQL for CDC (using logical decoding/WAL)
- Deployed Kafka Connect with JDBC Source Connector (to capture PostgreSQL changes)
- Set up an S3 Sink Connector (to persist data in S3 in Avro/Parquet format)
In Part 2 of our journey, we dive deeper into streaming data from PostgreSQL to S3 via Kafka. This time, we explore how to set up connectors, create a sample PostgreSQL table with a large dataset, and leverage ksqlDB for real-time data analysis. We'll also cover how to configure AWS IAM policies for secure S3 access. Whether you're building a data pipeline or experimenting with Kafka integrations, this guide will help you navigate the essentials with ease.
Visual Data Flow Diagram
PostgreSQL → Kafka Connect (JDBC Source Connector) → Kafka → S3 Sink Connector → Amazon S3, with ksqlDB sitting alongside for inspection and real-time queries.

Step-by-Step Implementation
Let’s pick up where we left off in Part 1 and continue building our data pipeline by setting up connectors, creating sample data, and configuring S3 for seamless integration.
Step 1


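The connectors can be registered (or re-checked) through the Kafka Connect REST API. Below is a minimal sketch of registering the JDBC Source Connector from Part 1; the connector name, connection details, and table list are placeholder assumptions, and the postgres- topic prefix is what yields the postgres-employees topic consumed later in Step 5. The S3 Sink Connector from Part 1 is registered against the same endpoint.

# Register the JDBC source connector via the Kafka Connect REST API.
# Assumes Connect is reachable on localhost:8083; every value below is a
# placeholder, so adjust it to match your Part 1 setup.
curl -s -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
        "name": "postgres-source",
        "config": {
          "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
          "connection.url": "jdbc:postgresql://postgres:5432/postgres",
          "connection.user": "postgres",
          "connection.password": "postgres",
          "table.whitelist": "employees",
          "mode": "incrementing",
          "incrementing.column.name": "id",
          "topic.prefix": "postgres-"
        }
      }'

# Confirm that both the source and the S3 sink connector are up.
curl -s http://localhost:8083/connectors
curl -s http://localhost:8083/connectors/postgres-source/status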
Step 2
docker exec --tty --interactive postgres bash -c 'psql -U $POSTGRES_USER $POSTGRES_DB'
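This drops you into a psql session inside the container. Since the pipeline depends on the CDC/WAL configuration from Part 1, a quick optional sanity check is to confirm the logical decoding setting, for example:

# Optional sanity check: wal_level should report "logical" (configured in Part 1).
docker exec postgres bash -c 'psql -U $POSTGRES_USER $POSTGRES_DB -c "SHOW wal_level;"'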


Step 3




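Any table with enough rows will do for the demo; the sketch below creates an employees table (which, with the postgres- topic prefix, becomes the postgres-employees topic used in Step 5) and bulk-loads it with generated rows. The schema and the row count are illustrative assumptions.

# Create a sample employees table and load it with ~1 million generated rows.
docker exec -i postgres bash -c 'psql -U $POSTGRES_USER $POSTGRES_DB' <<'SQL'
CREATE TABLE IF NOT EXISTS employees (
    id         SERIAL PRIMARY KEY,
    name       TEXT NOT NULL,
    department TEXT NOT NULL,
    salary     NUMERIC(10,2),
    updated_at TIMESTAMP DEFAULT now()
);

-- Bulk insert so there is a meaningfully large dataset to stream.
INSERT INTO employees (name, department, salary)
SELECT 'employee_' || g,
       (ARRAY['engineering','sales','hr','finance'])[1 + g % 4],
       30000 + round((random() * 70000)::numeric, 2)
FROM generate_series(1, 1000000) AS g;
SQL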
Step 4
docker exec -it ksqldb ksql http://ksqldb:8088
show connectors;

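Listing connectors confirms the plumbing; the real-time analysis promised in the introduction also happens here. A minimal sketch, assuming the source connector publishes Avro and ksqlDB can reach a Schema Registry (the stream name and the piped-CLI invocation are assumptions; the statements can equally be typed at the ksql> prompt):

# Declare a stream over the CDC topic and run a simple push query on it.
docker exec -i ksqldb ksql http://ksqldb:8088 <<'EOF'
CREATE STREAM employees_stream WITH (
    KAFKA_TOPIC  = 'postgres-employees',
    VALUE_FORMAT = 'AVRO'
);

-- Count records per department as changes stream in (stops after 5 rows).
SELECT department, COUNT(*) AS employees
FROM employees_stream
GROUP BY department
EMIT CHANGES
LIMIT 5;
EOF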
Step 5
kafkacat -b localhost:9092 -t postgres-employees -C -o beginning -f '%o %s\n'
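Here -C consumes the postgres-employees topic from the beginning and -f '%o %s\n' prints each record's offset followed by its value. To confirm the whole table made it into Kafka, you can count the messages instead (-e exits once the end of the topic is reached); the total should roughly match the rows inserted in Step 3:

# Count messages in the topic; one offset is printed per record.
kafkacat -b localhost:9092 -t postgres-employees -C -o beginning -e -q -f '%o\n' | wc -l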

Step 6
On AWS, create a user through the IAM console and attach a policy to that user by creating a new policy.

When defining the policy, specify the S3 actions your pipeline requires and the bucket you want the policy to apply to, so the connector has access only to that bucket.
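If you prefer the AWS CLI over the console, the same setup looks roughly like the sketch below. The user name, policy name, bucket name, and the list of S3 actions are assumptions; keep only the actions your S3 Sink Connector actually needs, and point it at the bucket configured in Part 1.

# Create a dedicated IAM user for the S3 sink and scope its policy to one bucket.
# All names below are placeholders.
BUCKET=my-kafka-sink-bucket

aws iam create-user --user-name kafka-s3-sink

cat > s3-sink-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::${BUCKET}"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    }
  ]
}
EOF

aws iam create-policy --policy-name kafka-s3-sink-policy \
  --policy-document file://s3-sink-policy.json

# Attach the policy (use the ARN returned by create-policy) and generate
# access keys for the connector to authenticate with.
aws iam attach-user-policy --user-name kafka-s3-sink \
  --policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/kafka-s3-sink-policy
aws iam create-access-key --user-name kafka-s3-sink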

Final Step



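With the credentials wired into the S3 Sink Connector, the last check is that objects are actually landing in the bucket. Assuming the connector's default topics/<topic>/partition=<n>/ layout (the bucket name and path here are placeholders to adjust):

# List the Avro/Parquet objects written by the sink connector.
aws s3 ls s3://my-kafka-sink-bucket/topics/postgres-employees/ --recursive --human-readable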
Blog Pundit: Sandeep Rawat
Opstree is an End to End DevOps solution provider.