System Design Fundamentals

Two-Phase Commit (2PC)

The Travel Booking Problem

Imagine you’re building a travel booking platform. A customer wants to book a complete vacation package: a flight, a hotel, and a rental car. From the customer’s perspective, this is a single atomic action — book all three or book none. But here’s the reality: the flight is booked through an airline’s system, the hotel through a hospitality provider’s system, and the car through a rental company’s system. These are three separate databases, possibly in different datacenters, operated by different organizations.

Your system successfully books the flight and charges the customer’s credit card. The hotel booking goes through. But then the rental car system returns an error — they have no vehicles available for those dates. Now you face a crisis: the customer has been charged for a flight and hotel but has no rental car. The booking is in an inconsistent state, and the customer is understandably upset.

This is the distributed transaction problem, and it’s been a fundamental challenge in computer science since the early days of networked systems. Two-Phase Commit (2PC) is one of the oldest and most widely implemented solutions. While it’s not perfect — and we’ll see why — understanding 2PC is essential for building reliable distributed systems.

What Is a Distributed Transaction?

In a single database, transactions are straightforward. The ACID properties (Atomicity, Consistency, Isolation, Durability) are provided by the database engine. You can modify multiple rows across multiple tables, and if anything goes wrong, you roll back everything. The database ensures that from the outside world’s perspective, the transaction either fully succeeded or fully failed — never partially completed.

A distributed transaction extends this concept across system boundaries. It’s a transaction that involves multiple independent resources — different databases, message queues, web services, or any component that can succeed or fail independently. The goal is the same: achieve atomicity across all these resources so that either the entire transaction succeeds everywhere, or it fails everywhere, and partial success is impossible.

The challenge is that you can no longer rely on a single entity (like a database engine) to orchestrate everything. Instead, you need a protocol that coordinates multiple parties to reach consensus on whether to commit or abort. This is where 2PC enters the picture.

Coordinator and Participants: The Players in 2PC

Every 2PC transaction involves two roles:

  • Coordinator: Also called the “transaction manager”, this is typically your application or a dedicated transaction service. It initiates the protocol and makes the final commit/abort decision.
  • Participants: Also called “resource managers”, these are the individual systems that hold data — databases, message brokers, caching systems. Each participant can independently succeed or fail.

Think of the coordinator as the director of a distributed orchestra. It doesn’t produce the music; it tells the musicians when to play, ensuring everyone stays in sync.
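
To make the two roles concrete before walking through the protocol, here is a minimal Java sketch of what each side must be able to do. The names (Vote, Decision, Participant, Coordinator) are illustrative rather than a real API; the later sketches in this chapter reuse them.

import java.util.List;

enum Vote { YES, NO }            // a participant's answer in Phase 1
enum Decision { COMMIT, ABORT }  // the coordinator's final word in Phase 2

// Resource manager: can tentatively do the work, then finalize or undo it.
interface Participant {
    Vote prepare(String txId);   // Phase 1: execute, lock, log, then vote
    void commit(String txId);    // Phase 2: make the work permanent, release locks
    void abort(String txId);     // Phase 2: undo the work, release locks
}

// Transaction manager: drives the protocol and owns the final decision.
interface Coordinator {
    Decision execute(String txId, List<Participant> participants);
}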

The Two Phases: Prepare and Commit

2PC is named for its two distinct phases of operation:

Phase 1: Prepare (Vote)

The coordinator sends a “prepare to commit” message to all participants, asking: “Can you commit this transaction?”

Each participant then:

  1. Executes all the transaction’s operations up to the point of commit
  2. Acquires locks on affected resources to prevent other transactions from interfering
  3. Writes all changes to a write-ahead log (WAL) — a durable, sequential log on disk
  4. Responds to the coordinator with either a YES vote (I can commit) or a NO vote (I cannot commit)

Critically, at this point the participant has done the work but not yet made it permanent or visible to other transactions. The changes are durably recorded in its write-ahead log rather than applied as final, so it can still honor either outcome. By voting YES, the participant promises that a later COMMIT cannot fail on its side.
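
A minimal sketch of a participant's side of Phase 1, reusing the Participant and Vote types from the earlier sketch. The in-memory map and set standing in for the write-ahead log and the lock table are illustrative placeholders, not how a real storage engine does it.

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative participant; a real resource manager (a database engine, a
// message broker) does the locking and WAL writes inside its storage layer.
class SketchParticipant implements Participant {
    private final Map<String, String> wal = new ConcurrentHashMap<>();  // stand-in for a durable WAL
    private final Set<String> locked = ConcurrentHashMap.newKeySet();   // stand-in for record locks

    @Override
    public Vote prepare(String txId) {
        try {
            executeOperations(txId);        // 1. run the transaction's operations...
            locked.add(txId);               // 2. ...keeping locks on everything it touched
            wal.put(txId, "PREPARED");      // 3. durably record the prepared state (fsync in real life)
            return Vote.YES;                // 4. promise: a later COMMIT cannot fail
        } catch (Exception e) {
            wal.put(txId, "ABORTED");
            locked.remove(txId);
            return Vote.NO;                 // 4. we cannot commit this transaction
        }
    }

    @Override
    public void commit(String txId) {
        wal.put(txId, "COMMITTED");         // apply the changes permanently
        locked.remove(txId);                // then release the locks
    }

    @Override
    public void abort(String txId) {
        wal.put(txId, "ABORTED");           // undo the tentative changes
        locked.remove(txId);                // then release the locks
    }

    private void executeOperations(String txId) { /* apply the booking's changes locally */ }
}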

Phase 2: Commit or Abort

Based on the votes from all participants:

  • If all participants voted YES: The coordinator sends a COMMIT message to all participants. Each participant applies the changes permanently, releases locks, and acknowledges the commit.
  • If any participant voted NO (or if the coordinator times out waiting for a response): The coordinator sends an ABORT message. Each participant rolls back the changes, releases locks, and acknowledges the abort.

Only after the coordinator has logged its decision and received acknowledgments from all participants is the transaction complete.
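
Putting both phases together, here is a minimal single-threaded coordinator sketch that reuses the types above. The DecisionLog interface is an assumption standing in for the coordinator's durable log, and a real coordinator would send its messages in parallel, apply timeouts, and retry Phase 2 until every participant acknowledges.

import java.util.List;

// Hypothetical durable log; record() is assumed to fsync before returning.
interface DecisionLog {
    void record(String txId, Decision decision);
    Decision lookup(String txId);   // null if no decision was ever logged
}

class SketchCoordinator implements Coordinator {
    private final DecisionLog log;

    SketchCoordinator(DecisionLog log) { this.log = log; }

    @Override
    public Decision execute(String txId, List<Participant> participants) {
        // Phase 1: ask every participant to prepare, and collect the votes.
        boolean allYes = true;
        for (Participant p : participants) {
            if (p.prepare(txId) == Vote.NO) {
                allYes = false;
                break;                       // one NO is enough to abort
            }
        }

        // The commit point: durably log the decision before telling anyone.
        Decision decision = allYes ? Decision.COMMIT : Decision.ABORT;
        log.record(txId, decision);

        // Phase 2: broadcast the decision; a real coordinator retries until
        // every participant has acknowledged it.
        for (Participant p : participants) {
            if (decision == Decision.COMMIT) p.commit(txId); else p.abort(txId);
        }
        return decision;
    }
}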

The Wedding Ceremony Analogy

The best way to understand 2PC is through the lens of a wedding ceremony:

The officiant (coordinator) asks each person involved — the bride, the groom, potentially family members (participants) — “Do you take this person?” (Phase 1 prepare). Each person considers carefully, checks their own situation (locking their resources), and answers either “I do” (YES vote) or “I don’t” (NO vote).

Only if everyone says “I do” does the officiant pronounce them married (Phase 2 commit). If even one person says “I don’t”, the wedding is off (abort).

Here’s the crucial part: once you say “I do”, you’ve committed to the decision. You can’t change your mind while waiting for the officiant to pronounce the result. You’re locked in place, just as database participants are locked while waiting for the coordinator’s final decision.

How 2PC Actually Works: The Technical Details

Message Flow and State Transitions

Let’s trace through a concrete example with three participants:

sequenceDiagram
    participant Coord as Coordinator
    participant P1 as Participant 1<br/>(Flight System)
    participant P2 as Participant 2<br/>(Hotel System)
    participant P3 as Participant 3<br/>(Car System)

    Coord->>P1: PREPARE (booking details)
    Coord->>P2: PREPARE (booking details)
    Coord->>P3: PREPARE (booking details)

    P1->>P1: Execute & lock resources
    P2->>P2: Execute & lock resources
    P3->>P3: Execute & lock resources

    P1-->>Coord: YES (prepared)
    P2-->>Coord: YES (prepared)
    P3-->>Coord: NO (cannot complete)

    Coord->>Coord: Decision: ABORT

    Coord->>P1: ABORT
    Coord->>P2: ABORT
    Coord->>P3: ABORT

    P1-->>Coord: Aborted
    P2-->>Coord: Aborted
    P3-->>Coord: Aborted

The Coordinator’s Decision Log

The coordinator must maintain a durable log of its decision for every transaction. Why? Because the coordinator itself can crash. If the coordinator crashes between collecting votes and sending decisions, it needs to recover and complete the protocol correctly.

When the coordinator crashes and recovers:

  1. It reads its decision log
  2. For any transactions that were decided before the crash, it resends the decision to all participants
  3. For transactions where the coordinator never made a decision, it assumes ABORT

This recovery procedure is critical — it’s what prevents the distributed transaction from hanging forever.
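
A sketch of that recovery procedure, reusing the DecisionLog and Participant types from the earlier sketches. How the coordinator discovers its list of in-doubt transaction IDs (by scanning its own log) is assumed and omitted here.

import java.util.List;

// Runs once when the coordinator restarts after a crash.
class CoordinatorRecovery {
    private final DecisionLog log;

    CoordinatorRecovery(DecisionLog log) { this.log = log; }

    void recover(List<String> inDoubtTxIds, List<Participant> participants) {
        for (String txId : inDoubtTxIds) {
            Decision decision = log.lookup(txId);   // 1. read the decision log
            if (decision == null) {
                decision = Decision.ABORT;          // 3. never decided: presume abort
                log.record(txId, decision);
            }
            for (Participant p : participants) {    // 2. resend the decision to everyone
                if (decision == Decision.COMMIT) p.commit(txId); else p.abort(txId);
            }
        }
    }
}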

Participant State Transitions

Each participant transitions through well-defined states:

Initial
  → (receives PREPARE message) → Preparing
  → (executes work, acquires locks) → Prepared (and waiting)
  → (receives COMMIT or ABORT) → Committed or Aborted

If a participant crashes while in the “Prepared” state, it has a problem: its local log shows it prepared successfully, but it doesn’t know whether the coordinator ultimately decided to commit or abort. When it recovers, it must ask the coordinator: “What was your decision for transaction X?” The coordinator, consulting its decision log, responds appropriately.
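
A sketch of that participant-side recovery, reusing the Decision type from earlier. The CoordinatorClient interface and the WAL entry names are assumptions for illustration; the key point is that a participant whose last log entry is “prepared” cannot decide on its own.

// Hypothetical client for asking the coordinator about a past transaction;
// it is assumed to retry until the coordinator can be reached.
interface CoordinatorClient {
    Decision outcomeOf(String txId);
}

class ParticipantRecovery {
    private final CoordinatorClient coordinator;

    ParticipantRecovery(CoordinatorClient coordinator) { this.coordinator = coordinator; }

    // lastWalEntry is whatever this participant last logged for txId before crashing.
    void recover(String txId, String lastWalEntry) {
        if (!"PREPARED".equals(lastWalEntry)) {
            return;   // never voted YES, or already finished: nothing is in doubt
        }
        // In doubt: we promised we could commit, so we must learn the real outcome.
        Decision decision = coordinator.outcomeOf(txId);
        if (decision == Decision.COMMIT) {
            // re-apply the logged changes permanently, then release the locks
        } else {
            // undo the logged changes, then release the locks
        }
    }
}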

The Blocking Problem: 2PC’s Achilles Heel

2PC has a well-known weakness called the blocking problem. Once a participant votes YES and enters the “Prepared” state, it must hold locks on its resources. These locks prevent other transactions from accessing the data. The participant remains blocked until it receives the coordinator’s final decision.

If the coordinator crashes or becomes unreachable, participants can be blocked indefinitely, holding locks and making those resources unavailable to other transactions. This directly reduces system availability and responsiveness.

Consider a scenario: A participant votes YES and locks inventory. The coordinator then suffers a network partition and cannot communicate the decision. The participant’s inventory remains locked. Other customers trying to purchase the same items are blocked. From their perspective, the entire system feels hung.
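
To see the bind concretely, here is a small sketch of a prepared participant waiting out a coordinator failure, reusing the Decision type from earlier. tryFetchOutcome is a hypothetical helper that returns null while the coordinator is unreachable; once the participant has voted YES, neither guess is safe, so all it can do is keep waiting while its locks stay held.

// tryFetchOutcome() is a hypothetical call that returns the coordinator's
// decision, or null while the coordinator is unreachable.
Decision waitForDecision(String txId) throws InterruptedException {
    while (true) {
        Decision decision = tryFetchOutcome(txId);
        if (decision != null) {
            return decision;         // safe: follow the coordinator's decision
        }
        // Meanwhile, the locks taken during prepare stay held. Both shortcuts
        // are unsafe once we have voted YES:
        //   - committing unilaterally is wrong if the coordinator chose ABORT
        //   - aborting unilaterally is wrong if the coordinator chose COMMIT
        Thread.sleep(1_000);         // so the only safe option is to keep waiting
    }
}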

This is why 2PC is called a blocking protocol and why many modern systems try to avoid it at scale.

XA: The Industry Standard

The XA specification (eXtended Architecture) is the industry standard for distributed transactions. It’s defined by the X/Open group and implemented by most relational databases and many middleware systems.

In Java, the Java Transaction API (JTA) uses XA transactions:

import javax.annotation.Resource;
import javax.transaction.UserTransaction;

// Injected by the container's JTA transaction manager
@Resource
private UserTransaction utx;

public void bookTravelPackage(String flightId, String hotelId, String carId) {
    try {
        utx.begin();

        // Each call enlists its XA resource (database, message broker, ...)
        // in the global transaction; no two-phase commit has happened yet.
        flightService.reserve(flightId);
        hotelService.reserve(hotelId);
        carService.reserve(carId);

        // The transaction manager now runs both phases: it sends PREPARE to
        // every enlisted resource and, only if all vote YES, sends COMMIT.
        utx.commit();

    } catch (Exception e) {
        try {
            utx.rollback();  // Abort: every enlisted resource rolls back
        } catch (Exception re) {
            logger.error("Rollback failed", re);
        }
    }
}

In PostgreSQL, two-phase commit is exposed through explicit commands. Each participant does its work inside a normal transaction, then prepares it:

-- Participant 1: Flight System (its own session and database)
BEGIN;
-- ... the booking’s INSERT/UPDATE statements ...
PREPARE TRANSACTION 'flight-booking-123';

-- Participant 2: Hotel System
BEGIN;
-- ... the booking’s INSERT/UPDATE statements ...
PREPARE TRANSACTION 'hotel-booking-123';

-- Participant 3: Car System
BEGIN;
-- ... the booking’s INSERT/UPDATE statements ...
PREPARE TRANSACTION 'car-booking-123';

-- Coordinator decision: all prepared, so commit each one.
-- (In-doubt transactions are visible in the pg_prepared_xacts view.)
COMMIT PREPARED 'flight-booking-123';
COMMIT PREPARED 'hotel-booking-123';
COMMIT PREPARED 'car-booking-123';

MySQL supports XA via explicit XA commands:

XA START 'xid-123';
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
XA END 'xid-123';
XA PREPARE 'xid-123';
-- The coordinator collects PREPARE results from every participant; if all prepared, then:
XA COMMIT 'xid-123';
-- After a crash, XA RECOVER lists transactions still sitting in the prepared state.

Performance Characteristics

2PC is expensive in terms of performance. Here’s why:

  • Multiple network round trips: Minimum 2 round trips (prepare and commit), but often more with acknowledgments and recovery scenarios
  • Synchronous coordination: The coordinator waits for responses before proceeding — blocking operations are slow
  • Lock holding duration: Locks are held from the prepare phase through the commit phase, which could be seconds or longer in high-latency networks
  • Logging overhead: Every state transition must be persisted to disk

In a scenario with three participants spread across different datacenters, each with 50ms of round-trip latency to the coordinator:

  • Prepare phase: ~50ms with PREPARE sent to all participants in parallel, ~150ms if sent one at a time
  • Coordinator decision logging: a few milliseconds for the durable write (fsync)
  • Commit phase and acknowledgments: another 50-150ms
  • Total: roughly 150ms at best with parallel messaging, 300-400ms with sequential messaging, all while locks are held

For high-frequency, low-latency systems, this is usually unacceptable.

Failure Scenarios in Detail

Scenario 1: Participant Crashes Before Voting

The participant receives PREPARE but crashes before responding. The coordinator times out waiting, treats the missing vote as NO, and sends ABORT to the other participants. When the crashed participant recovers, there are two cases. If it crashed before durably logging a prepared record, it simply aborts its local work on its own. If it had logged “prepared” but never got its vote out, it is in doubt and must ask the coordinator, which answers: “That transaction was aborted.”

Scenario 2: Participant Crashes After Voting YES

The participant responds YES but crashes before receiving the COMMIT/ABORT decision. Upon recovery, it reads its log: “I voted YES at time T for transaction X”. It contacts the coordinator: “What’s the decision?” The coordinator consults its log and responds. The participant then applies the decision.

Scenario 3: Coordinator Crashes After Collecting Votes

This is the blocking scenario. Participants are stuck in PREPARED state, holding locks. When the coordinator recovers, it checks its decision log. If the decision was logged, it resends COMMIT to all participants. If no decision was logged, it assumes ABORT and sends that instead. Participants that were waiting now receive the decision and can proceed.

Scenario 4: Network Partition Between Coordinator and Participants

Some participants can’t reach the coordinator, and the coordinator can’t reach them. The coordinator times out waiting for their votes and aborts (the safe default). Participants that had not yet voted can abort on their own, but participants already in the PREPARED state cannot: they stay blocked, holding locks, until the partition heals and they learn the decision. Some XA implementations let an operator force a “heuristic” commit or rollback to free those locks; if that guess disagrees with the coordinator’s decision, you get partial completion — the worst outcome. This is why network reliability is so critical for 2PC.

When 2PC Is Appropriate

Despite its weaknesses, 2PC is still the right choice in certain scenarios:

  1. Within a single datacenter: Low latency and high reliability make the blocking problem less severe
  2. Fewer participants: The more participants involved, the higher the risk of one blocking the whole transaction
  3. Database-level transactions, not service-level: Using 2PC between two database transactions is different from using it between microservices. Databases have optimized implementations
  4. Strong consistency is non-negotiable: When partial failures are unacceptable (e.g., financial transactions, order management in critical systems)
  5. Short transaction durations: The longer a transaction takes, the longer locks are held and the higher the chance of participant failure

2PC is NOT appropriate for:

  • Transactions spanning services across multiple datacenters
  • Scenarios where network unreliability is expected
  • High-frequency, low-latency systems
  • Systems with many participants in the transaction

Comparison: 2PC vs. Eventual Consistency

2PC provides strong consistency but at high cost. Alternatively, eventual consistency approaches (like the Saga pattern) relax consistency guarantees in exchange for better availability:

Aspect            | 2PC                            | Eventual Consistency (Saga)
Consistency       | Strong (immediate)             | Eventual (delayed)
Availability      | Lower (blocking)               | Higher (non-blocking)
Latency           | 300-500ms+                     | 50-100ms
Complexity        | Moderate (protocol complexity) | High (requires compensating transactions)
Failure handling  | Automatic rollback             | Manual compensation logic
Use case          | Financial systems, inventory   | User accounts, analytics

We’ll explore Sagas in depth in a later chapter. The key insight: trade-offs are fundamental. During a network partition you must choose between strong consistency and availability (the CAP theorem), and even when the network is healthy, stronger consistency costs latency (the PACELC formulation makes this explicit).

Key Takeaways

  • Two-Phase Commit is a protocol for achieving atomicity across multiple independent systems through a coordinator collecting votes (prepare) and then making a final decision (commit/abort)
  • The blocking problem is 2PC’s core weakness: participants wait for the coordinator’s decision while holding locks, which can freeze those resources if the coordinator fails
  • Write-ahead logs (WAL) are essential for recovery — both the coordinator and participants must durably log their state transitions
  • 2PC is appropriate for strongly consistent transactions within a single datacenter or between tightly coupled databases, but becomes problematic at scale and across unreliable networks
  • The XA specification provides the industry standard implementation; understand it if you’re working with JTA, MySQL XA, or PostgreSQL prepared transactions
  • Network reliability is paramount for 2PC — partitions cause indefinite blocking and potential partial commits

Practice Scenarios

Scenario 1: The Cascading Timeout

You’re building a payment system using 2PC across three services: Bank Account (10ms latency), Fraud Detection (500ms latency in peak hours), and Ledger (50ms latency). What latency do your users experience for a payment? What happens if Fraud Detection becomes unresponsive? How would you mitigate this?

Scenario 2: The Recovery Question

Design the recovery procedure for this situation: The coordinator crashed after collecting YES votes from all participants but before logging its commit decision. When it recovers, what decision should it make? Why? What must participants already have logged to handle this safely?

Scenario 3: Database Replication Complexity

You’re implementing 2PC across a primary database and its replica (using database-level replication for high availability). If the primary votes YES and begins prepare but the replica fails before replication, what happens? How does this differ from single-database transaction handling?

Looking Ahead: Three-Phase Commit

2PC’s blocking problem motivated researchers to design the Three-Phase Commit (3PC) protocol in the 1980s. 3PC adds a “pre-commit” phase so that participants can reach a safe decision among themselves if the coordinator crashes after collecting votes, instead of blocking indefinitely (though it remains vulnerable to network partitions). By adding this extra round trip and more complex state management, 3PC improves fault tolerance, but at the cost of increased complexity and latency.

In the next chapter, we’ll explore 3PC, understand its improvements and its tradeoffs, and then move into modern alternatives like Sagas and event sourcing that sidestep the entire problem by embracing eventual consistency.