Azure Event Hubs is designed for massively scalable data ingestion, and the secret behind its performance is partitions.
If you understand partitions, you understand how to design high-throughput pipelines correctly.
What Are Event Hub Partitions?
A partition is essentially an ordered log (like a lane on a highway).
When you create an Event Hub, you choose how many lanes (partitions) you want.
Each partition stores events in the exact order they arrive, and consumers read data partition by partition.
Analogy:
Think of a 4-lane highway.
More lanes → more cars can flow in parallel → higher throughput.
Why Do Partitions Exist?
1. Scaling reads
Each consumer reads from one or more partitions.
More partitions = more consumers can process in parallel.
2. Scaling writes
Producers distribute their outgoing events across partitions.
More partitions = more parallel writes = higher ingestion rates.
3. Ordering guarantee
Within a single partition, event order is preserved.
Between partitions, no ordering is guaranteed.
How Events Get Distributed Across Partitions
When a producer sends an event, Event Hubs decides which partition to place it in.
There are two ways:
Option 1: Automatic Round-Robin (Default)
If you don’t specify anything, Event Hubs assigns events round-robin:
Event 1 → Partition 1
Event 2 → Partition 2
Event 3 → Partition 3
Event 4 → Partition 1
...
Great for random or independent events.
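The round-robin behavior can be sketched in plain Python. This is a simulation of the distribution pattern, not the Event Hubs service itself; `partition_count` and the cycling logic are illustrative assumptions:

```python
from itertools import cycle

# Simulate default round-robin distribution across a fixed partition count.
# This mimics what Event Hubs does when no partition key is supplied.
partition_count = 3
partitions = cycle(range(partition_count))

events = ["event-1", "event-2", "event-3", "event-4"]
assignments = {event: next(partitions) for event in events}

for event, partition in assignments.items():
    print(f"{event} -> partition {partition}")
```

After three events the cycle wraps, so event-4 lands back on the first partition, exactly as in the sequence above (here using 0-based partition IDs).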
Option 2: Using a Partition Key
If related events must stay in order, you use a partition key:
partition_key = "patient123"
Event Hubs ensures all events with the same key go to the same partition.
Examples:
- All events from the same device
- All events from the same user session
- All events from a specific patient record
This is the simplest way to guarantee ordering for related events. (Producers can also target a specific partition ID directly, but then they take on the job of distributing load themselves.)
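The same-key-same-partition guarantee can be sketched with a stable hash. This is an illustrative stand-in: the real Event Hubs service uses its own internal hash, so the actual partition numbers will differ, but the property is the same: equal keys always map to the same partition.

```python
import hashlib

def partition_for_key(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition with a stable hash (illustrative only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# All events carrying the same key land in the same partition,
# so their relative order is preserved for consumers.
p1 = partition_for_key("patient123", 4)
p2 = partition_for_key("patient123", 4)
assert p1 == p2  # same key -> same partition -> ordering preserved
```

Note that a built-in like Python's `hash()` would not work here, because it is randomized per process; partition-key hashing must be stable across producers and restarts.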
How Consumers Read from Partitions
Consumers never read the entire Event Hub at once.
They attach to specific partitions.
If you have 4 partitions and 2 consumer instances:
Consumer 1 → Partition 0, Partition 1
Consumer 2 → Partition 2, Partition 3
If you increase consumers to 4:
Consumer 1 → Partition 0
Consumer 2 → Partition 1
Consumer 3 → Partition 2
Consumer 4 → Partition 3
Within a consumer group, each partition is owned by at most one consumer instance at a time.
That means:
- Running more consumers than partitions in a consumer group gains nothing.
- The extra consumers will sit idle.
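The ownership rule can be sketched as a simple round-robin assignment. This is a simplified model of what the Event Hubs load balancer (e.g. the `EventProcessorClient` in the SDKs) does; the real mechanism also handles ownership leases and rebalancing as consumers come and go:

```python
def assign_partitions(partition_ids, consumer_ids):
    """Spread partitions across consumers as evenly as possible (sketch)."""
    assignments = {consumer: [] for consumer in consumer_ids}
    for i, partition in enumerate(partition_ids):
        owner = consumer_ids[i % len(consumer_ids)]
        assignments[owner].append(partition)
    return assignments

# 4 partitions, 2 consumers -> each consumer owns 2 partitions
print(assign_partitions([0, 1, 2, 3], ["consumer-1", "consumer-2"]))

# 4 partitions, 5 consumers -> the fifth consumer owns nothing and sits idle
print(assign_partitions([0, 1, 2, 3], ["c1", "c2", "c3", "c4", "c5"]))
```

The second call makes the limit concrete: with only 4 partitions, a fifth consumer has nothing to own.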
Retention and Checkpoints
Each partition tracks, for every event:
- Offset (the event's position in the partition log)
- Sequence number
- Enqueued timestamp
Consumers maintain checkpoints, which tell Event Hubs:
“I have processed events up to this point.”
Checkpoints allow consumers to:
- Resume from the exact place after restart
- Avoid reprocessing old events
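The resume-after-restart behavior can be sketched with a toy checkpoint store. Real consumers persist checkpoints in durable storage (e.g. Azure Blob Storage via a checkpoint store); here a dict stands in for that, and list indices stand in for real Event Hubs offsets:

```python
# Toy checkpoint store: last processed offset per partition.
checkpoints = {}

def process(partition: int, events: list) -> None:
    """Process events, resuming just after the last checkpointed offset."""
    start = checkpoints.get(partition, -1) + 1
    for offset in range(start, len(events)):
        # ... handle events[offset] here ...
        checkpoints[partition] = offset  # record progress

events = ["e0", "e1", "e2", "e3"]
process(0, events[:2])   # first run: handles e0 and e1, checkpoint = 1
process(0, events)       # after "restart": resumes at e2, skips e0 and e1
```

The second call starts from offset 2 because the checkpoint says offsets 0 and 1 are done, which is exactly the "resume from the exact place" behavior described above.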
How Many Partitions Should You Use?
This is one of the most common design questions.
General Recommendations:
- 4–8 partitions for small workloads
- 8–32 partitions for high-throughput workloads
- More if your throughput requirements are unpredictable
Important:
You cannot decrease the partition count later.
Increasing the count after creation is possible only on certain tiers (Premium and Dedicated), and doing so changes how partition keys map to partitions, so related events may start landing in a different partition.
So pick a number slightly above what you expect to need.
When Partition Count Matters Most
High message throughput
More partitions = more ingestion lanes.
Parallel processing
If you want 10 consumer instances, you need at least 10 partitions.
Ordering requirements
If you need ordering per device/patient/order → use a partition key → all events for one device land in one partition.
Common Mistakes to Avoid
Choosing too few partitions
Leads to ingestion bottlenecks and consumer lag.
Using the wrong partition key
For example:
- `timestamp` → nearly every event gets a distinct key, so related events scatter across partitions and per-entity ordering is lost
- `device_type` → only a few distinct values, so a handful of partitions absorb all the traffic
More consumers than partitions
Extra consumers do nothing.
Expecting global ordering
Event Hubs only guarantees ordering inside a single partition.
Summary
| Concept | Explanation |
|---|---|
| Partition | Ordered log that stores events |
| Purpose | Enable parallel reads/writes at high scale |
| Ordering | Guaranteed only within a partition |
| Partition Key | Ensures related events go to same partition |
| Consumers | One consumer per partition per consumer group |
| Scaling | More partitions = more throughput & parallelism |