Backpressure & Flow Control
The Perfect Storm
Your system is running smoothly. Producers are shipping messages at 1,000 events per second, consumers are processing them steadily, and the queue depth sits at a comfortable 50-100 messages. Your SLO dashboards are green, and no alerts are firing.
Then, 3 PM Friday hits. Your marketing team launches a campaign. Suddenly, traffic spikes 10x. Your message queue starts filling: 10,000 messages… 100,000… 1,000,000. Your consumers are still processing at their normal rate—they can’t magically work faster—but now they’re buried under an avalanche of work.
Memory usage on the message broker climbs steadily. Other services start experiencing slowdowns because broker resources are consumed by queue storage. Eventually, the broker hits its memory limit and starts rejecting new messages. Your producers get errors. Your system begins rejecting user requests. And that’s when the cascade begins: unprocessed orders, failed analytics events, angry customers waiting for their data.
This is what happens when you have no backpressure mechanism.
In Chapter 69, we talked about Dead Letter Queues as a safety net for failed messages. Backpressure is the prevention mechanism—it stops messages from piling up in the first place by telling upstream components: “Slow down, I can’t keep up.”
What Is Backpressure?
Backpressure is a mechanism for downstream components (consumers) to signal upstream components (producers) to reduce their rate of data flow. When a system can’t process data fast enough, it says, “Please wait” or “Please send me less.”
Without backpressure, you get unbounded queues—the silent killer of distributed systems. An unbounded queue with no limits will grow until it consumes all available memory. And then your system crashes.
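To see the difference in miniature, here is a small, purely illustrative sketch using Python's standard library queue.Queue: a bounded queue whose put() blocks the producer whenever the consumer falls behind.

```python
# In-process illustration: a bounded queue applies backpressure by blocking
# the producer when it is full. All numbers here are illustrative.
import queue
import threading
import time

q = queue.Queue(maxsize=100)   # bounded: at most 100 items in flight

def producer():
    for i in range(1_000):
        q.put(i)               # blocks when the queue is full -> backpressure

def consumer():
    while True:
        item = q.get()
        time.sleep(0.001)      # simulate a slow consumer
        q.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()                     # runs no faster than the consumer can drain
```

With maxsize=100 the producer can never race ahead by more than 100 items; remove the bound and the same loop will happily fill memory.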
Flow control is the broader concept: regulating the rate of data flow through a system to maintain stability. It includes backpressure, rate limiting, load shedding, and other mechanisms that keep your system from drowning in its own requests.
Think of it this way:
- Backpressure: “I’m getting overwhelmed; please slow down.”
- Flow control: The overall system of managing that speed adjustment.
- Load shedding: “I’m going to drop some low-priority work to save the critical path.”
A Traffic Analogy
Imagine a major highway during rush hour. Without any flow control, every car entering the highway at every on-ramp would create gridlock—nobody moves, not even the cars on the highway itself.
Smart highways use on-ramp meters: red lights that control how many cars enter the highway per minute. When traffic on the highway is heavy, the meters slow down—drivers wait at the on-ramp, but traffic on the highway keeps flowing. In extreme cases, highway patrol might close certain on-ramps entirely (load shedding for lower-priority routes).
Without meters:
- All cars rush the on-ramp → gridlock → nothing moves → complete failure
With meters:
- Cars queue at the on-ramp (producers wait) → highway traffic flows → system stays operational
This is exactly what backpressure does in your system.
The Core Strategies
Let’s look at the practical ways systems implement backpressure:
Blocking Producers
The simplest approach: when the queue is full, the producer simply blocks. It can’t add a new message until space becomes available.
Kafka producer example: The Kafka producer has a buffer.memory parameter (default: 32 MB). When the producer’s internal buffer is full, it waits up to max.block.ms (default: 60,000 ms) before throwing an exception.
```yaml
producer.config:
  buffer.memory: 33554432   # 32 MB
  max.block.ms: 60000
  batch.size: 16384
```
When the buffer fills, the producer thread blocks. This propagates the backpressure up the stack—if your web server is calling a producer synchronously, the web request slows down, and HTTP clients see delayed responses. They slow down too. Backpressure propagates upward.
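For illustration, here is a minimal sketch of that propagation, assuming the kafka-python client; the topic name, handler, and status codes are illustrative, not a prescribed API:

```python
# A minimal sketch (assuming kafka-python) of how a blocked producer
# propagates backpressure into a synchronous request handler.
from kafka import KafkaProducer
from kafka.errors import KafkaTimeoutError

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    buffer_memory=33554432,   # 32 MB internal buffer
    max_block_ms=60000,       # block up to 60 s when the buffer is full
)

def handle_order(request_body: bytes):
    try:
        # If the buffer is full, send() blocks here, and so does the
        # web request that called us, pushing backpressure to the client.
        producer.send("orders", request_body)
    except KafkaTimeoutError:
        # Buffer stayed full for max.block.ms; surface the overload upstream.
        return 503
    return 202
```

The key point is that the blocking call sits on the request path, so overload surfaces to callers instead of silently growing a queue.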
Pros: Simple, prevents data loss. Cons: Latency increases for producers. If producers are user-facing (like web requests), users experience slowdowns.
Rate Limiting
Instead of blocking, you can rate-limit producers using algorithms like token bucket or leaky bucket. The producer can only send N messages per second, regardless of queue state.
```python
from time import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.last_refill = time()

    def consume(self, tokens=1):
        now = time()
        elapsed = now - self.last_refill
        # Refill tokens based on elapsed time
        self.tokens = min(
            self.capacity,
            self.tokens + elapsed * self.refill_rate
        )
        self.last_refill = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

# Allow 1,000 messages per second, max burst of 5,000
limiter = TokenBucket(capacity=5000, refill_rate=1000)

if limiter.consume(1):
    producer.send(message)
else:
    # Rate limit exceeded; drop, retry, or reject
    handle_rate_limit()
```
Pros: Predictable, protects against burst traffic. Cons: May drop messages. Requires careful tuning of rate limits.
Credit-Based Flow Control
RabbitMQ uses a credit system: the consumer tells the broker, “I can handle N messages,” and the broker only sends N messages before waiting for the consumer to request more.
```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Set prefetch count: consumer will only receive N unacknowledged messages at a time
channel.basic_qos(prefetch_count=10)

def callback(ch, method, properties, body):
    print(f"Received: {body}")
    # Do work...
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue='my_queue', on_message_callback=callback)
channel.start_consuming()
```
When prefetch_count=10, RabbitMQ delivers at most 10 unacknowledged messages to this consumer. Once the consumer acknowledges 5 of them, the broker sends up to 5 more. This prevents overwhelming slow consumers.
Pros: Consumer controls flow. Broker doesn’t need to buffer excessively. Cons: Requires acknowledgment protocol. More moving parts.
Reactive Streams Backpressure
Libraries like Project Reactor and RxJava implement backpressure formally: the subscriber explicitly requests N items, and the publisher respects that request.
```python
# Conceptual example in Python with asyncio: the subscriber pulls items in
# explicit batches and only signals new demand after finishing a batch.
async def consume_with_backpressure(producer):
    batch_size = 10
    while True:
        # Ask for at most `batch_size` items; the producer sends no more.
        items = await producer.request(batch_size)
        if not items:
            break
        for item in items:
            await process(item)
        # Demand for the next batch is signaled only here, after processing.
```
The producer never sends more than the subscriber has requested. This is non-blocking backpressure—the producer doesn’t block, it just pauses sending.
Consumer-Side Flow Control
Sometimes, backpressure is handled entirely on the consumer side: the consumer pulls data at its own pace.
Kafka example: The consumer can control how many records it fetches:
```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'my_topic',
    group_id='my_group',
    max_poll_records=100,    # Fetch max 100 records per poll
    fetch_min_bytes=1024,    # Wait for at least 1 KB
    fetch_max_wait_ms=1000   # Or wait up to 1 second
)

for message in consumer:
    process(message)
    # If process() is slow, the consumer naturally slows down.
    # This creates backpressure on the broker.
```
The consumer’s polling interval naturally creates backpressure. If consumers can’t keep up, the broker buffers messages until consumers request them.
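If the natural pacing of poll() isn't enough, you can go a step further and pause fetching explicitly while a local backlog drains. The sketch below assumes kafka-python's pause()/resume() API; MAX_BACKLOG, DRAIN_PER_LOOP, and process() are illustrative:

```python
# Consumer-side flow control with explicit pause/resume (assuming kafka-python).
from kafka import KafkaConsumer

consumer = KafkaConsumer('my_topic', group_id='my_group', max_poll_records=100)
backlog = []
MAX_BACKLOG = 1_000
DRAIN_PER_LOOP = 100

while True:
    # poll() also keeps the group heartbeat alive, so call it even while paused
    for records in consumer.poll(timeout_ms=500).values():
        backlog.extend(records)

    if len(backlog) > MAX_BACKLOG:
        consumer.pause(*consumer.assignment())   # stop fetching for now
    elif consumer.paused():
        consumer.resume(*consumer.paused())      # backlog has drained, fetch again

    # Drain a bounded slice per iteration so slow work shows up as backpressure
    for record in backlog[:DRAIN_PER_LOOP]:
        process(record)
    del backlog[:DRAIN_PER_LOOP]
```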
Autoscaling Based on Queue Depth
The most sophisticated approach: scale the number of consumers dynamically based on queue depth.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaler
spec:
  scaleTargetRef:
    name: kafka-consumer-deployment
  minReplicaCount: 1
  maxReplicaCount: 100
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: my_group
        topic: my_topic
        lagThreshold: "10000"   # Scale up if lag exceeds 10k messages
```
This scales consumers automatically. When queue depth is high, KEDA spins up more consumer pods. When it drops, pods are removed. This applies backpressure indirectly: the system says, “I’ll get to you faster by throwing more workers at it.”
Load Shedding
Sometimes, you can’t process everything. In those cases, intentionally drop low-priority work:
```python
def process_message(message):
    if is_critical(message):
        process_with_priority(message)
    elif queue_depth > THRESHOLD:
        # System overloaded; shed low-priority messages
        log.warning(f"Shedding low-priority message: {message.id}")
        return
    else:
        process_normal(message)
```
Or use a priority queue:
```python
import heapq

HIGH_PRIORITY = 1             # lower number = more important
CRITICAL_THRESHOLD = 100_000  # queue depth that signals overload

class PriorityQueue:
    def __init__(self):
        self.queue = []
        self._seq = 0  # tie-breaker so heapq never compares message objects

    def put(self, priority, message):
        heapq.heappush(self.queue, (priority, self._seq, message))
        self._seq += 1

    def get(self):
        priority, _, message = heapq.heappop(self.queue)
        if len(self.queue) > CRITICAL_THRESHOLD and priority > HIGH_PRIORITY:
            # When overloaded, shed anything that isn't high priority
            return None
        return message
```
Load shedding trades correctness for availability—you’re saying, “I’ll lose some messages, but the system stays up.”
Monitoring Backpressure
You can’t fix what you don’t measure. Key metrics:
| Metric | What It Tells You | Warning Sign |
|---|---|---|
| Queue depth | How many messages are waiting | Steady growth indicates backpressure |
| Consumer lag (Kafka) | How far behind the consumer is from the producer | > 100k messages = serious backpressure |
| Producer block time | Time producers spend blocked waiting for space | > 100 ms = system struggling |
| Message age | How long messages wait before processing | Increasing age = backpressure building |
| Broker memory usage | How much of the broker's resources are consumed | > 80% = approaching saturation |
Set up alerts on these metrics:
```yaml
alert_rules:
  - name: QueueDepthExceeded
    condition: queue_depth > 1000000 for 5 minutes
    severity: critical
  - name: ConsumerLagIncreasing
    condition: consumer_lag > 100000 and increasing
    severity: high
  - name: ProducerBlockTime
    condition: producer_block_time_ms > 100
    severity: medium
```
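To feed the consumer-lag metric, you can compute lag directly from a client: latest broker offset minus the consumer's current position. Here is a sketch assuming kafka-python; the group and topic names are illustrative, and in production you would usually scrape this from a lag exporter rather than a one-off client, but the arithmetic is the same.

```python
# A sketch of measuring Kafka consumer lag from a client (assuming kafka-python).
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers='kafka:9092', group_id='my_group')
partitions = [TopicPartition('my_topic', p)
              for p in consumer.partitions_for_topic('my_topic')]
consumer.assign(partitions)

end_offsets = consumer.end_offsets(partitions)       # latest offset per partition
lag = sum(end_offsets[tp] - consumer.position(tp)    # position = next offset to read
          for tp in partitions)
print(f"Total consumer lag: {lag} messages")
```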
A Cautionary Tale: The Thundering Herd
Here’s a real-world lesson: after a backpressure event is resolved, you can face a “thundering herd” problem.
Scenario: Your consumer had a deployment issue and was down for 2 minutes. During those 2 minutes, 2 million messages piled up in the queue. The consumer comes back online.
If backpressure is released all at once, the consumer suddenly receives millions of messages. It tries to process them all in parallel—database connection pools overflow, CPU spikes to 100%, memory explodes, and the consumer crashes again. Now it’s down for another 2 minutes, and the herd hits again.
Solution: Implement a slow drain. When recovering from backpressure, gradually ramp up consumption:
```python
class GradualDrain:
    def __init__(self, max_rate=1000):
        self.max_rate = max_rate
        self.current_rate = 10  # Start slow

    def get_allowed_messages(self):
        if self.current_rate < self.max_rate:
            self.current_rate = min(
                self.max_rate,
                self.current_rate * 1.1  # 10% increase per interval
            )
        return int(self.current_rate)
```
This prevents the herd by ramping up consumption gradually.
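As a hypothetical wiring, the ramp can cap how many records each poll is allowed to return (max_records is a kafka-python poll() parameter; the one-second sleep stands in for the ramp interval):

```python
# Illustrative loop: GradualDrain limits how much each poll may fetch.
import time

drain = GradualDrain(max_rate=1000)
while True:
    allowed = drain.get_allowed_messages()
    batch = consumer.poll(timeout_ms=1000, max_records=allowed)
    for records in batch.values():
        for record in records:
            process(record)
    time.sleep(1)   # advance one ramp interval
```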
Trade-offs and Decisions
When designing backpressure for your system, you face several tough choices:
Blocking vs. Dropping: Kafka producers block (no data loss, but high latency). Some systems drop messages (low latency, but data loss). Which is right? If you’re processing payments, you block. If you’re sampling user events, you drop.
Autoscaling Cost: Scaling consumers costs money. Kubernetes pods aren’t free. How aggressively should you scale? Scale too slowly, and you have latency. Scale too aggressively, and your cloud bill explodes.
End-to-End Backpressure: Ideally, backpressure propagates all the way upstream—from database saturation, through consumers, through brokers, to producers, to user-facing services. But this is complex. Partial backpressure (just at the broker) is simpler but less effective.
Synchronous Systems: HTTP has backpressure: HTTP 429 (Too Many Requests) with a Retry-After header. But it’s crude compared to async systems. If the Retry-After header is missing, do you retry immediately? After 1 second? With exponential backoff? The spec leaves that to the client.
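A reasonable client-side convention, sketched here with the requests library (the URL and retry budget are illustrative), is to honor Retry-After when present and fall back to exponential backoff with jitter otherwise:

```python
# Illustrative 429 handling: honor Retry-After if given, else back off exponentially.
import random
import time

import requests

def post_with_backoff(url, payload, max_retries=5):
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(url, json=payload)
        if resp.status_code != 429:
            return resp
        # Retry-After may also be an HTTP date; only the seconds form is handled here.
        retry_after = resp.headers.get("Retry-After")
        wait = float(retry_after) if retry_after else delay
        time.sleep(wait + random.uniform(0, 0.5))  # jitter avoids synchronized retries
        delay *= 2
    raise RuntimeError("gave up after repeated 429 responses")
```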
The Thundering Herd Visualized
Here’s what happens when backpressure isn’t managed carefully:
```mermaid
graph TD
    A["Producer (10k msgs/sec)"] -->|massive burst| B["Queue<br/>2M messages"]
    B -->|sudden release| C["Consumer<br/>spinning up"]
    C -->|overload!| D["Consumer crashes"]
    D -->|falls behind| B
    B -->|backlog grows| E["System down"]
    style B fill:#ff9999
    style E fill:#ff6666
```
Architecture for Effective Backpressure
Here’s what a well-designed system looks like:
```mermaid
graph LR
    A["Producers<br/>w/ rate limit"] -->|bounded buffer| B["Kafka<br/>broker"]
    B -->|prefetch=10| C["Consumer pool<br/>Autoscaled"]
    C -->|metrics| D["KEDA<br/>controller"]
    D -->|scale up/down| C
    E["Monitoring"] -.->|queue depth| D
    E -.->|consumer lag| F["Alerts"]
    F -.->|on-call page| G["Engineer"]
    style D fill:#99ff99
    style E fill:#99ccff
```
Key Takeaways
- Backpressure prevents cascading failures: When producers overwhelm consumers, unbounded queues grow until memory is exhausted. Backpressure signals say, “Slow down,” preventing disaster.
- Choose your strategy based on your data: Need zero data loss and can tolerate latency? Block producers, Kafka-style. Willing to lose some events to stay up? Rate limiting and load shedding. Need real-time responsiveness? Reactive streams.
- Monitor the invisible metrics: Queue depth and consumer lag are your canaries in the coal mine. By the time your system is actually failing, these metrics will have been screaming for hours.
- Autoscaling is powerful but expensive: KEDA-style autoscaling fixes many backpressure problems without code changes. But scaling takes time, and it costs money. Start simple, and scale when measurements show you need to.
- The thundering herd is real: Backpressure release can be as dangerous as backpressure buildup. Plan for gradual recovery, not sudden releases.
- End-to-end backpressure is ideal but complex: The dream is backpressure flowing from the database all the way to user-facing services. In practice, you apply backpressure at the broker and design the rest of the architecture to tolerate temporary overload.
Practice Scenarios
Scenario 1: The Midnight Traffic Spike
Your ad-serving system processes ad impressions (an event every time an ad is shown). The queue processes 100k messages/second normally. At midnight, your iOS app releases a new feature, and traffic spikes to 500k messages/second.
Your current setup: Kafka with a single consumer group, no autoscaling, no rate limiting. Within 5 minutes, your Kafka broker is at 95% memory usage. You need to design a backpressure solution that:
- Keeps the broker from running out of memory
- Minimizes message loss (these are important for revenue tracking)
- Can be deployed without code changes to producers
What’s your approach? (Consider: autoscaling, rate limiting, load shedding, priority queues.)
Scenario 2: The Consumer That Crashed
Your order processing system has a bug. A consumer processes orders from an SQS queue, but under certain conditions, it crashes after 10 minutes. The queue starts backing up. After 30 minutes, 500k orders are waiting.
You fix the bug and redeploy the consumer. It comes back online and immediately starts consuming: 1,000 orders/second. Your database (PostgreSQL) can only handle 5,000 writes/second before response time degrades from 10 ms to 100 ms. With 500k orders backed up, the consumer is hammering the database and making things worse.
How do you recover without crashing the database?
What’s Next
Understanding backpressure and flow control in message queues is foundational for what comes next. In Chapter 15, we dive into event-driven architecture—building systems where services communicate entirely through events. When you have multiple services passing messages to each other, backpressure becomes even more critical. You’ll see how the concepts we’ve covered here—bounded queues, rate limiting, autoscaling—scale to distributed systems with dozens of services and hundreds of events per second.
We’ll also explore stream processing: handling continuous flows of data (like real-time analytics on clickstreams). Stream processing is where backpressure matters the most—you’re processing infinite data, so without good flow control, you’ll be forever behind.