Apache Hudi (Hadoop Upserts Deletes and Incrementals) is a powerful framework designed for managing large datasets on cloud storage systems, enabling efficient data ingestion, storage, and retrieval. One of the key features of Hudi is its support for two distinct storage types: Copy-On-Write (COW) and Merge-On-Read (MOR). Each of these storage strategies has unique characteristics and serves different use cases. In this blog, we will explore COW and MOR. Continue reading “Understanding COW and MOR in Apache Hudi: Choosing the Right Storage Strategy “
Tag: Hudi
Exploring Time Travel Queries in Apache Hudi
Apache Hudi (Hadoop Upserts Deletes and Incrementals) is an advanced data management framework designed to efficiently handle large-scale datasets. One of its standout features is time travel, which allows users to query historical versions of their data. This feature is essential for scenarios where you need to audit changes, recover from data issues, or simply analyze how data has evolved over time. In this blog post, we’ll walk through the process of setting up Hudi for time travel queries, using AWS Glue and PySpark for a hands-on example. Continue reading “Exploring Time Travel Queries in Apache Hudi”