Data Engineering Archives - Page 3 of 5

Exploring Time Travel Queries in Apache Hudi

Apache Hudi (Hadoop Upserts Deletes and Incrementals) is an advanced data management framework designed to efficiently handle large-scale datasets. One of its standout features is time travel, which allows users to query historical versions of their data. This feature is essential for scenarios where you need to audit changes, recover from data issues, or simply analyze how data has evolved over time. In this blog post, we’ll walk through the process of setting up Hudi for time travel queries, using AWS Glue and PySpark for a hands-on example.
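As a rough illustration of what a time travel read can look like from PySpark, here is a minimal sketch. It assumes an existing SparkSession with the Hudi Spark bundle available and a Hudi table already written at some base path. The helper names `time_travel_options` and `read_as_of` are hypothetical and not taken from the post; the only Hudi-specific piece is the `as.of.instant` read option.

```python
from typing import Dict


def time_travel_options(instant: str) -> Dict[str, str]:
    """Build the Spark read options for a Hudi time travel query.

    `instant` is the point in time to query, for example
    "20210728141108100", "2021-07-28 14:11:08.200", or "2021-07-28".
    """
    # "as.of.instant" asks Hudi for the table state as of that instant.
    return {"as.of.instant": instant}


def read_as_of(spark, base_path: str, instant: str):
    """Return a DataFrame of the Hudi table at `base_path` as of `instant`.

    Assumption: `spark` is an existing SparkSession configured with Hudi.
    """
    reader = spark.read.format("hudi").options(**time_travel_options(instant))
    return reader.load(base_path)
```

A call such as `read_as_of(spark, "s3://example-bucket/hudi/orders", "2021-07-28")` (the bucket and table path are placeholders) would return the table as it stood at that instant, which can then be compared against the current snapshot.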

Getting Started with Streamlit: Build Interactive Data Apps in Python

In this blog, we will explore the Streamlit library, which simplifies the creation of data-driven web applications and requires no prior knowledge of front-end development.

INTRODUCTION

Streamlit is an open-source Python library that simplifies the creation of interactive web apps for data science and machine learning projects. It turns Python scripts into shareable web apps with very little extra code, so developers and data scientists can build interactive, visually appealing applications in Python without writing any front-end code.

Using Apache Flink for Real-time Stream Processing in Data Engineering

Businesses need to process data as it comes in, rather than waiting for it to be collected and analyzed later.

This is called real-time data processing, and it allows companies to make quick decisions based on the latest information.

Apache Flink is a powerful tool for achieving this. It specializes in stream processing, which means it can handle and analyze large amounts of data in real time. With Flink, engineers can build applications that process millions of events every second, allowing them to harness the full potential of their data quickly and efficiently.
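To make the contrast with batch processing concrete, here is a small plain-Python sketch of the streaming idea. It is not Flink's API: the function `running_counts` is a hypothetical illustration that updates a per-key count as each event arrives and emits a result immediately, instead of waiting for the whole dataset to be collected first.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (key, count so far) after every event.

    Results are produced one event at a time, so a consumer sees
    up-to-date counts without waiting for the input to end.
    """
    counts: Counter = Counter()
    for key in events:
        counts[key] += 1          # update state for this key
        yield key, counts[key]    # emit the fresh result right away


if __name__ == "__main__":
    for key, count in running_counts(["login", "click", "login"]):
        print(key, count)
```

A stream processor such as Flink applies the same pattern at scale, keeping this kind of per-key state distributed across a cluster.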
