System Design Fundamentals

Idempotency in Distributed Systems

The Checkout Nightmare

Picture this: Your customer is on the checkout page of your e-commerce platform. They click “Pay Now.” The request travels across the network, hits your payment service, and… nothing. No response. The network timeout fires. Did the payment go through? The customer doesn’t know. In frustration (or just habit), they click “Pay Now” again.

Two requests. One intent. But the customer’s bank account now shows two charges of $99.99. They’re furious. You’re facing a support ticket and a potential refund. This scenario plays out thousands of times daily across the internet—and it’s preventable.

Now consider the backend angle: Your microservice publishes a message to a queue to trigger an order confirmation email. The consumer processes the message successfully but crashes before acknowledging it. The message broker re-delivers the message. Your confirmation email is sent twice. Your customer receives two identical emails and questions whether something is wrong with your system.

These are not hypothetical problems. They’re the reason Stripe, PayPal, Amazon, and Google obsess over idempotency. When your system handles distributed operations—where networks fail, services crash, and retries are necessary—idempotency is the pattern that separates chaos from reliability.

Idempotency is your insurance policy against the cascading failures of retries and the duplicate processing that haunts distributed systems.

What Is Idempotency?

An operation is idempotent if performing it multiple times has the same effect as performing it once. Mathematically: f(f(x)) = f(x). Apply the function once, twice, a hundred times—the outcome is identical.

Consider these operations:

Naturally Idempotent:

  • GET /users/123 — Fetching a user by ID returns the same user data whether you fetch once or a thousand times.
  • SET balance = 500 — Setting an absolute value is idempotent. The balance is 500 after one execution or a thousand.
  • DELETE /orders/456 — Deleting by ID is idempotent (after the first deletion, subsequent deletes find nothing to delete).

Naturally NON-Idempotent:

  • POST /transfers — Creating a new transfer. Each POST creates a new transfer. Two identical POSTs create two transfers.
  • balance += 50 — Incrementing a balance. Each increment adds 50. Two increments add 100.
  • append("log message") — Appending to a log. Two appends create two entries.
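
To see the difference in running code, here is a minimal Python sketch of the SET-versus-increment examples above; applying the idempotent operation twice changes nothing, while the non-idempotent one compounds:

# Idempotent: assigning an absolute value
balance = 500
balance = 500   # applying the same operation again changes nothing
assert balance == 500

# Non-idempotent: incrementing
counter = 0
counter += 50
counter += 50   # the second application changes the outcome
assert counter == 100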

The challenge in distributed systems is that most business operations are non-idempotent by nature. You can’t simply retry a POST without risking duplicates. You need a pattern.

Enter the idempotency key pattern.

The Idempotency Key Pattern

The idempotency key pattern is deceptively simple but extraordinarily powerful:

  1. Client generates a unique identifier (typically a UUID) for each logical operation.
  2. Client includes this key in the request (as a header, query parameter, or request body).
  3. Server stores the key along with the operation result.
  4. On retry, client sends the same key.
  5. Server recognizes the key and returns the cached result without re-processing.

This transforms at-least-once delivery (typical of network retries) into effectively exactly-once processing.

Example flow:

Request 1: POST /transfers, Idempotency-Key: abc-123-def
Server: Process transfer, store (abc-123-def -> transfer_id_456)
Response: transfer_id: 456

Network timeout. Client retries.

Request 2: POST /transfers, Idempotency-Key: abc-123-def
Server: Look up key abc-123-def, find transfer_id_456
Response: transfer_id: 456 (cached response, no new transfer created)

Result: One transfer, two requests. Idempotent.
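
The client side matters just as much: the key must be generated once per logical operation and reused on every retry. The sketch below assumes the requests library and a placeholder https://api.example.com/transfers endpoint:

import uuid
import requests

def create_transfer_with_retries(payload, max_attempts=3):
    # One key per logical operation, generated once and reused on every retry
    idempotency_key = str(uuid.uuid4())
    for attempt in range(max_attempts):
        try:
            response = requests.post(
                "https://api.example.com/transfers",
                json=payload,
                headers={"Idempotency-Key": idempotency_key},
                timeout=5,
            )
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            # A timeout here is the dangerous case: the server may already have
            # processed the transfer, so we retry with the SAME key.
            if attempt == max_attempts - 1:
                raise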

Idempotency keys are not about preventing duplicate network requests (retries will still happen). They’re about preventing duplicate processing. The network might deliver your request twice, but your system processes the intent once.

The Elevator vs. Vending Machine Analogy

Think of an elevator in a tall building. You want to go to Floor 5. You press the button once. The elevator proceeds to Floor 5. Now press the button ten times in succession. What happens? The elevator goes to Floor 5 (maybe with some blinking lights). It doesn’t stop at Floor 5 ten times or make multiple trips. Pressing the button is idempotent.

Compare this to a vending machine: You press the “Dispense Snack” button. One snack drops. Press it again—another snack drops (and you’re $2.50 poorer). The button is not idempotent; each press is a distinct action.

To make the vending machine idempotent, imagine the manufacturer adds a ticket system: When you make a purchase, you receive a ticket. If you present the same ticket to the machine again, it recognizes you’ve already paid for that snack and doesn’t dispense again. This is idempotency through deduplication—the core principle behind idempotency keys.

Implementation: The Complete Server-Side Flow

Let’s implement idempotency from the ground up. Here’s the mental model:

graph TD
    A[Client sends request with Idempotency-Key] --> B{Key exists in store?}
    B -->|Yes| C[Return cached response]
    B -->|No| D[Acquire lock on key]
    D --> E[Check key again double-check pattern]
    E -->|Exists now| C
    E -->|Still doesn't exist| F[Process operation]
    F --> G[Store result with key]
    G --> H[Release lock]
    H --> I[Return response]
    C --> J[Done]
    I --> J

Here’s a concrete implementation in Python using Redis for the deduplication store:

import redis
import json
import time
import uuid
from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel

app = FastAPI()
redis_client = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)

class TransferRequest(BaseModel):
    from_account: str
    to_account: str
    amount: float

class TransferResponse(BaseModel):
    transfer_id: str
    status: str
    amount: float

def process_transfer(req: TransferRequest) -> TransferResponse:
    """Simulate actual transfer processing"""
    transfer_id = str(uuid.uuid4())
    # In reality: update database, call payment processor, etc.
    return TransferResponse(
        transfer_id=transfer_id,
        status="completed",
        amount=req.amount
    )

@app.post("/transfers")
async def create_transfer(
    request: TransferRequest,
    idempotency_key: str = Header(None)
) -> TransferResponse:
    if not idempotency_key:
        raise HTTPException(status_code=400, detail="Idempotency-Key header required")

    # Key for storing the result
    result_key = f"transfer:{idempotency_key}"
    lock_key = f"transfer:{idempotency_key}:lock"

    # Check if we've already processed this key
    cached_result = redis_client.get(result_key)
    if cached_result:
        return TransferResponse(**json.loads(cached_result))

    # Acquire a lock to prevent race conditions
    lock_acquired = redis_client.set(lock_key, "1", nx=True, ex=10)
    if not lock_acquired:
        # Another request is processing this key; wait briefly, then check again
        time.sleep(0.5)
        cached_result = redis_client.get(result_key)
        if cached_result:
            return TransferResponse(**json.loads(cached_result))
        raise HTTPException(status_code=409, detail="Request in progress")

    try:
        # Process the transfer
        result = process_transfer(request)

        # Store the result with TTL (24 hours)
        redis_client.setex(
            result_key,
            86400,  # 24 hours in seconds
            json.dumps(result.dict())  # with Pydantic v2, use result.model_dump()
        )

        return result
    finally:
        redis_client.delete(lock_key)

This implementation handles:

  • Deduplication: Checking if the key exists before processing
  • Race conditions: Using a lock to serialize concurrent identical requests
  • Result caching: Storing the result with a TTL
  • Cleanup: Automatic expiration to bound storage

Idempotency Across Technologies

Different systems implement idempotency differently. Let’s see how:

Stripe’s Payment API

Stripe pioneered the idempotency key pattern in payment processing. Developers include an Idempotency-Key header:

curl https://api.stripe.com/v1/payment_intents \
  -H "Idempotency-Key: abc123" \
  -d amount=2000 \
  -d currency=usd

Stripe guarantees that identical requests (same idempotency key) return identical responses within a 24-hour window.
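
Through Stripe's Python library the same call looks roughly like this (a sketch assuming the stripe package, which accepts an idempotency_key argument on create calls):

import stripe

stripe.api_key = "sk_test_..."  # placeholder test key

# Reusing the same idempotency_key replays the original response
# instead of creating a second PaymentIntent.
intent = stripe.PaymentIntent.create(
    amount=2000,
    currency="usd",
    idempotency_key="abc123",
)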

Apache Kafka’s Idempotent Producer

Kafka’s idempotent producer configuration prevents duplicate messages from reaching brokers:

# Producer configuration
acks=all
retries=Integer.MAX_VALUE
max.in.flight.requests.per.connection=5
enable.idempotence=true

With enable.idempotence=true, the Kafka producer attaches a producer ID and a per-partition sequence number to each batch it sends. The broker discards any batch whose sequence number it has already seen from that producer, which yields exactly-once delivery to a partition within a single producer session; end-to-end exactly-once processing additionally requires the transactional API or consumer-side deduplication.
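
In Python, the equivalent configuration with the confluent-kafka client looks roughly like this (a sketch; option names follow librdkafka's configuration keys):

from confluent_kafka import Producer

# Enabling idempotence implies acks=all and a bounded number of in-flight requests
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "enable.idempotence": True,
    "acks": "all",
})

producer.produce("orders", key="order-42", value=b'{"amount": 100}')
producer.flush()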

Database-Level Idempotency: UPSERT

PostgreSQL supports idempotent writes through INSERT ... ON CONFLICT:

INSERT INTO transfers (transfer_id, from_account, to_account, amount, status)
VALUES ('abc-123-def', 'account_1', 'account_2', 100.00, 'completed')
ON CONFLICT (transfer_id) DO NOTHING;

The ON CONFLICT clause makes the insert idempotent—if the transfer_id already exists, nothing happens. Note that transfer_id must be a deterministic, client-supplied value (the idempotency key is a natural choice). Generating a fresh UUID inside the INSERT would defeat the deduplication, because every retry would produce a new ID and never conflict.
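
From application code, the idempotency key itself can serve as the primary key, so retries collide instead of duplicating. A sketch assuming psycopg2 and the transfers table above:

import psycopg2

def record_transfer(conn, idempotency_key, from_acct, to_acct, amount):
    """Insert a transfer, treating the idempotency key as the primary key."""
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO transfers (transfer_id, from_account, to_account, amount, status)
            VALUES (%s, %s, %s, %s, 'completed')
            ON CONFLICT (transfer_id) DO NOTHING
            """,
            (idempotency_key, from_acct, to_acct, amount),
        )
        conn.commit()
        # rowcount == 0 means this key was already recorded, i.e. a retry
        return cur.rowcount == 1

conn = psycopg2.connect("dbname=payments")  # placeholder DSN
first = record_transfer(conn, "abc-123-def", "account_1", "account_2", 100.00)
retry = record_transfer(conn, "abc-123-def", "account_1", "account_2", 100.00)
assert first is True and retry is False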

Or use conditional updates:

UPDATE accounts
SET balance = balance + 50, version = version + 1
WHERE account_id = 'account_1' AND version = 10;

The WHERE version = 10 ensures the update only succeeds if the version hasn’t changed, preventing double-increments due to retries.

RabbitMQ with Message Deduplication

RabbitMQ consumers handle idempotency through message tracking:

import pika
import json
from datetime import datetime

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Use a deduplication table
processed_messages = {}  # In production: use Redis or database

def callback(ch, method, properties, body):
    message = json.loads(body)
    message_id = properties.message_id

    # Check if we've processed this message
    if message_id in processed_messages:
        print(f"Duplicate message {message_id}, skipping")
        ch.basic_ack(delivery_tag=method.delivery_tag)
        return

    try:
        # Process the message
        print(f"Processing message: {message}")
        # Your business logic here

        # Mark as processed
        processed_messages[message_id] = datetime.now()

        # Acknowledge only after successful processing
        ch.basic_ack(delivery_tag=method.delivery_tag)
    except Exception as e:
        # On error, nack to trigger redelivery
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

channel.basic_consume(queue='orders', on_message_callback=callback)
channel.start_consuming()

Storage Trade-offs

Where should you store idempotency keys? Each option has trade-offs:

  • Redis: fast lookups, automatic TTL, distributed; but it adds operational overhead, and data can be lost on failure unless persistence is configured.
  • Database: durable, queryable, survives restarts; but slower than Redis and requires cleanup jobs.
  • In-memory cache (single server): simplest to implement; but not distributed and lost on a crash.
  • Distributed cache (e.g., Memcached): speed with distribution; but brings eventual-consistency challenges.

Pro tip: Use a hybrid approach. For high-volume APIs (payment processing), use Redis with database backup. For lower-volume endpoints, a database table with periodic cleanup is sufficient.
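
A minimal sketch of that hybrid lookup, assuming the redis_client from earlier and a hypothetical db helper for the durable table: check the fast store first, fall back to the database, and rehydrate the cache on a hit:

def get_cached_result(idempotency_key):
    """Return the stored result for a key, or None if the key is unseen."""
    cached = redis_client.get(f"idem:{idempotency_key}")
    if cached is not None:
        return json.loads(cached)

    # Fall back to the durable store (db.fetch_one is a placeholder helper)
    row = db.fetch_one(
        "SELECT result FROM idempotency_keys WHERE key = %s", (idempotency_key,)
    )
    if row is not None:
        # Rehydrate the cache so the next retry is served from Redis
        redis_client.setex(f"idem:{idempotency_key}", 86400, row["result"])
        return json.loads(row["result"])

    return None  # Never seen: safe to process the operation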

Race Conditions and Locking

The naive idempotency implementation has a race condition window:

Request A: Check key (not found)
Request B: Check key (not found) [happens before A stores result]
Request A: Process and store result
Request B: Process and store result

Result: Two identical requests both process!

Prevent this with pessimistic locking (acquire a lock before checking):

# Pessimistic lock using Redis SET NX (the lock auto-expires after 10 seconds)
lock_acquired = redis_client.set(lock_key, "1", nx=True, ex=10)

if not lock_acquired:
    # Wait for the request holding the lock to finish
    while redis_client.exists(lock_key):
        time.sleep(0.1)
    # The winning request should have cached the result by now
    cached_result = redis_client.get(result_key)
    if cached_result is None:
        # Lock expired without a result (e.g., the other request crashed)
        raise HTTPException(status_code=409, detail="Request in progress, please retry")
    return TransferResponse(**json.loads(cached_result))

Or use optimistic locking with version numbers:

UPDATE idempotency_keys
SET result = ?, processed_at = ?, version = version + 1
WHERE key = ? AND version = ? AND processed_at IS NULL;

Only succeed if the version matches, preventing concurrent processing.
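
The caller then checks how many rows the statement touched; zero means another request won the race. A sketch assuming a DB-API cursor (e.g., psycopg2):

def claim_idempotency_key(cur, key, result_json, processed_at, expected_version):
    """Atomically claim a key; returns True only for the request that wins."""
    cur.execute(
        """
        UPDATE idempotency_keys
        SET result = %s, processed_at = %s, version = version + 1
        WHERE key = %s AND version = %s AND processed_at IS NULL
        """,
        (result_json, processed_at, key, expected_version),
    )
    # rowcount == 1: we own this key and may return our result.
    # rowcount == 0: a concurrent request already claimed it; read its result.
    return cur.rowcount == 1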

Key Expiration: The Storage vs. Safety Trade-off

How long should you keep idempotency keys? Too short and you can’t handle retries beyond the window. Too long and you waste storage.

  • Payment APIs: 24 hours (Stripe, PayPal)
  • Transactional operations: 1 hour
  • High-frequency APIs: 15 minutes
  • Short-lived operations: 5 minutes

The key insight: The expiration window must be longer than your worst-case retry timeout plus the time a client might retry.

# TTL calculation
max_retry_timeout = 60  # 60 seconds
max_client_retry_delay = 3600  # 1 hour
safety_margin = 300  # 5 minutes

ttl = max_retry_timeout + max_client_retry_delay + safety_margin
# Result: 3960 seconds (66 minutes)

Idempotency in Message Queues: Exactly-Once Delivery

Message queues typically offer “at-least-once” delivery guarantees. To achieve “exactly-once,” combine queue idempotency with application-level deduplication:

graph LR
    A[Producer] -->|Message + unique ID| B[Queue]
    B -->|At-least-once| C{Dedup<br/>table}
    C -->|New| D[Process]
    C -->|Seen| E[Skip]
    D --> F[Store in dedup table]
    F --> G[ACK]
    E --> G

A kafka-python sketch that pairs the idempotent producer with consumer-side deduplication:

from kafka import KafkaProducer, KafkaConsumer
import json

producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    acks='all',
    retries=3,
    enable_idempotence=True,  # requires a kafka-python version with idempotent-producer support
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

# Producer side: the broker deduplicates on producer ID + sequence number
for i in range(100):
    producer.send('orders', {'order_id': i, 'amount': 100})
producer.flush()

consumer = KafkaConsumer(
    'orders',
    bootstrap_servers=['localhost:9092'],
    group_id='order_processor',
    enable_auto_commit=False,
    isolation_level='read_committed'  # Only read committed messages
)

processed_ids = set()  # In production: use a persistent store (Redis or a database table)

for message in consumer:
    order = json.loads(message.value)

    if order['order_id'] in processed_ids:
        # Already processed
        consumer.commit()
        continue

    # Process order
    print(f"Processing order {order['order_id']}")
    processed_ids.add(order['order_id'])

    # Commit only after processing
    consumer.commit()

Testing Idempotency

Idempotency must be tested, not assumed. Here’s a chaos-testing approach:

import pytest

@pytest.mark.asyncio
async def test_idempotent_transfer():
    """Verify that duplicate requests result in one transfer"""
    idempotency_key = "test-key-123"
    transfer_request = TransferRequest(
        from_account="A",
        to_account="B",
        amount=100.0
    )

    # Send the same request multiple times
    responses = []
    for _ in range(5):
        response = await create_transfer(transfer_request, idempotency_key)
        responses.append(response)

    # All responses should be identical
    assert all(r.transfer_id == responses[0].transfer_id for r in responses)

    # Only one transfer should exist in the database
    # (count_transfers_by_key stands in for your own persistence-layer query)
    transfer_count = await count_transfers_by_key(idempotency_key)
    assert transfer_count == 1

@pytest.mark.asyncio
async def test_concurrent_identical_requests():
    """Test race condition: concurrent identical requests"""
    import asyncio

    idempotency_key = "concurrent-test"
    transfer_request = TransferRequest(
        from_account="A",
        to_account="B",
        amount=50.0
    )

    # Fire concurrent requests
    tasks = [
        create_transfer(transfer_request, idempotency_key)
        for _ in range(10)
    ]
    responses = await asyncio.gather(*tasks)

    # All should have the same transfer_id
    assert len(set(r.transfer_id for r in responses)) == 1

When NOT to Use Idempotency Keys

Idempotency adds complexity and storage overhead. It’s not always necessary:

  • Read-only endpoints (GET requests are already idempotent by design)
  • Trusted internal services communicating over reliable channels
  • Endpoints where duplicate requests are harmless (e.g., analytics events)
  • Streaming operations where idempotency doesn’t make sense

Evaluate the cost-benefit for each endpoint.

Key Takeaways

  • Idempotency ensures exactly-once semantics at the application level, even when the underlying transport (network, queue) only guarantees at-least-once delivery.

  • The idempotency key pattern (client generates unique key, server deduplicates) is the standard approach used by payment processors and enterprise APIs.

  • Storage matters: Redis for performance, databases for durability; choose based on your traffic and failure tolerance.

  • Race conditions are real: Use locking (pessimistic or optimistic) to prevent concurrent requests from both processing.

  • Key expiration is a trade-off: Longer windows catch more retries but consume more storage; shorter windows reduce cost but risk duplicate processing if retries arrive after expiration.

  • Different systems implement idempotency differently: Databases use UPSERT and conditional updates; message queues use sequence numbers and deduplication tables; APIs use the idempotency key pattern.

Practice Scenarios

Scenario 1: E-commerce Refund System

You’re building a refund system for an e-commerce platform. A customer initiates a refund through your mobile app. The app makes an API call to your backend but loses connection before receiving a response. The customer’s retry logic kicks in and sends the same refund request again. Your system processes both requests.

Design an idempotent refund endpoint:

  • Where will you store idempotency keys?
  • How will you handle concurrent refund requests for the same order (same idempotency key)?
  • What should the refund endpoint return to ensure the mobile app can reliably detect duplicates?
  • How long should you keep the idempotency key?

Scenario 2: Event-Driven Inventory System

Your warehouse publishes “item_reserved” events to a message queue. An order service consumes these events and decrements inventory. During a surge, the same message is delivered twice (broker redelivery). The inventory is decremented twice, causing inventory inconsistency.

Design an idempotent consumer:

  • How will you track processed messages?
  • Should you use message IDs provided by the broker or generate your own?
  • What happens if a message arrives after the deduplication window?
  • How do you prevent the deduplication table from growing indefinitely?

Scenario 3: Multi-Service Payment Flow

A payment request flows through three services: (1) payment-service processes the charge, (2) order-service creates the order record, (3) notification-service sends a confirmation email. If service (2) fails after (1) succeeds, the retry will re-process (1), charging the customer twice.

Design an idempotent payment flow across three services:

  • How do you propagate idempotency keys across service boundaries?
  • Should each service generate its own idempotency key or use the client’s?
  • How do you handle partial failures (one service processes, another doesn’t)?

What’s Next: Distributed Transactions and Data Consistency

We’ve mastered idempotency—ensuring that retries don’t cause duplicates. But what happens when a logical operation spans multiple services, and one of them fails halfway through?

In the next chapter, we’ll explore Distributed Transactions, including sagas and compensating transactions. Idempotency is the foundation: if your compensating transaction fails and must retry, idempotency ensures it doesn’t undo something twice. We’ll see how the patterns combine—idempotency handles at-least-once delivery; sagas handle multi-service coordination.

Then, when we reach Database Partitioning and Sharding, idempotency becomes even more critical. With data split across multiple nodes, ensuring exactly-once processing of updates becomes exponentially more complex. You’ll want these reliability patterns wired into your DNA before you shard.