System Design Fundamentals

ACID Properties & Transactions

The Banking Disaster Scenario

It’s 3 AM on a Tuesday. Sarah initiates a wire transfer of $5,000 from her savings account to her checking account. The database receives the transaction request, deducts $5,000 from savings, and then… the power fails. When the system comes back online, Sarah’s savings account shows the $5,000 deduction, but her checking account never received the credit. Her money has vanished into digital thin air. Welcome to the nightmare that ACID properties were designed to prevent.

This scenario is far too common in systems that lack proper transaction management. As you design databases that must handle critical operations—payments, inventory management, financial records, medical data—understanding ACID guarantees becomes non-negotiable. In this chapter, we’ll explore how databases ensure that your data doesn’t just survive failures, but survives them with integrity intact.

What Are ACID Properties?

ACID is an acronym representing four fundamental guarantees that databases provide for transactions. Let’s define each precisely:

Atomicity means a transaction is all-or-nothing. Either all operations within the transaction complete successfully, or none of them do. There is no partial execution. When Sarah’s transfer completes, both the debit and credit happen, or nothing happens at all.

Consistency means the database transitions from one valid state to another valid state. All rules, constraints, and invariants that define a “valid” state are maintained before and after the transaction. If your database rule states “no account balance can be negative,” consistency guarantees this rule holds after every transaction.

Isolation means concurrent transactions don’t see partial results from each other. Transaction A doesn’t see uncommitted changes from Transaction B, even if they’re executing simultaneously. This prevents you from reading “dirty” data or making decisions based on incomplete information.

Durability means once a transaction commits, it stays committed—even if your servers catch fire. The data survives power failures, crashes, network partitions, and hardware failures. Once you tell the user “your transfer is complete,” you’re making a promise backed by durable storage.

The Vending Machine Analogy

Imagine a vending machine with perfect ACID properties. You insert $2 and select a candy bar. What happens?

  • Atomicity: Either the machine takes your $2 AND delivers the candy, or it returns your money AND keeps the candy. Never does it take your money without delivering the product.

  • Consistency: The machine always maintains valid state—your money is either in your pocket or in the machine’s cash box, but not both. The candy is either in your hand or in the machine’s inventory, never split.

  • Isolation: If you and your friend simultaneously buy items, the machine doesn’t get confused about inventory or cash. Each transaction completes independently as if the other wasn’t happening.

  • Durability: Once the machine beeps “transaction complete,” your purchase is permanent. Even if it loses power, the money counted and inventory updated persist when it restarts.

Now consider a vending machine without ACID properties: it takes your money, the electricity flickers, and you get nothing. That machine doesn’t last long before customers demand refunds.

How Databases Implement Atomicity

Write-Ahead Logging (WAL)

The most common technique for implementing atomicity is Write-Ahead Logging. Here’s how it works:

Before modifying any data, the database writes a detailed log entry describing what will change. This log is written to durable storage (disk) before the actual data modification happens. If the system crashes mid-transaction:

  1. The data modification might be incomplete or missing
  2. But the log entry exists on disk
  3. On recovery, the database replays the log to either complete the transaction (if a COMMIT was logged) or undo it (if no COMMIT exists)

Let’s trace Sarah’s transfer with WAL:

[Write to Log]: BEGIN TRANSACTION tx_001
[Write to Log]: UPDATE accounts SET balance = balance - 5000 WHERE account_id = 'savings'
[Perform]: Deduct from savings
[Write to Log]: UPDATE accounts SET balance = balance + 5000 WHERE account_id = 'checking'
[Perform]: Add to checking
[Write to Log]: COMMIT tx_001
[Persist]: Durably store the commit record

If power fails before the COMMIT is logged, recovery will undo both the debit and credit. If power fails after COMMIT, recovery will complete both if they weren’t fully applied yet.
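
To make the ordering concrete, here is a toy sketch in Python (not any real database's code): every change is appended to the log before it is applied to the data, and the commit record is flushed to disk before the client hears "commit successful". The Wal class, record format, and file name are all illustrative.

import json
import os

class Wal:
    """Toy write-ahead log: records are appended and fsynced before data changes count."""

    def __init__(self, path):
        self.f = open(path, "a")

    def append(self, record):
        self.f.write(json.dumps(record) + "\n")

    def fsync(self):
        self.f.flush()
        os.fsync(self.f.fileno())

def transfer(wal, data, tx_id, src, dst, amount):
    # Log each change (with old and new values) BEFORE applying it to the data.
    wal.append({"tx": tx_id, "op": "begin"})
    wal.append({"tx": tx_id, "key": src, "old": data[src], "new": data[src] - amount})
    data[src] -= amount
    wal.append({"tx": tx_id, "key": dst, "old": data[dst], "new": data[dst] + amount})
    data[dst] += amount
    # The commit record is durably on disk before we tell the client anything.
    wal.append({"tx": tx_id, "op": "commit"})
    wal.fsync()
    return "commit successful"

balances = {"savings": 10_000, "checking": 2_000}
print(transfer(Wal("wal.log"), balances, "tx_001", "savings", "checking", 5_000))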

Undo Logs and Redo Logs

Many databases maintain both:

  • Redo logs contain the new values. Used to complete committed transactions after a crash.
  • Undo logs contain the old values. Used to roll back incomplete transactions.

During recovery, the database:

  1. Identifies all committed transactions in the log
  2. Replays their changes (redo) to ensure all committed work appears in the data
  3. Rolls back any uncommitted work using undo information
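
A crash-recovery pass over such a log can be sketched as follows, reusing the toy record format from the example above (one JSON record per line, carrying old and new values). Real recovery algorithms such as ARIES are far more sophisticated, but the redo-then-undo shape is the same.

import json

def recover(log_lines, data):
    records = [json.loads(line) for line in log_lines]
    committed = {r["tx"] for r in records if r.get("op") == "commit"}

    # Redo: reapply the new values of every change made by a committed transaction.
    for r in records:
        if "key" in r and r["tx"] in committed:
            data[r["key"]] = r["new"]

    # Undo: restore the old values of changes made by uncommitted transactions,
    # walking the log backwards so the earliest value wins.
    for r in reversed(records):
        if "key" in r and r["tx"] not in committed:
            data[r["key"]] = r["old"]
    return data

# A crash left tx_002's debit applied but its commit record was never written:
log = [
    '{"tx": "tx_002", "op": "begin"}',
    '{"tx": "tx_002", "key": "savings", "old": 10000, "new": 5000}',
]
print(recover(log, {"savings": 5000, "checking": 2000}))
# -> {'savings': 10000, 'checking': 2000}: the partial debit has been rolled back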

Consistency Enforcement

Consistency means the database enforces its business rules. These come in several forms:

Constraints are the simplest: PRIMARY KEY, UNIQUE, NOT NULL, CHECK conditions. The database refuses any transaction that violates these.

Foreign Keys maintain referential integrity. You can’t delete a customer if they have active orders.

Triggers are more sophisticated—executable code that runs automatically when data changes. A banking system might use triggers to log all balance modifications for audit purposes.

Application-level validation adds business logic that's hard to express as declarative constraints, such as keeping inventory counts consistent with shipments or checking that dates make logical sense. The application enforces these rules in code, complementing constraints the database enforces on its own (like the no-negative-balance CHECK shown below).

Consider this schema that maintains consistency:

CREATE TABLE accounts (
  account_id INT PRIMARY KEY,
  balance DECIMAL(15,2) NOT NULL CHECK (balance >= 0),
  account_type VARCHAR(20) NOT NULL
);

CREATE TABLE transactions (
  transaction_id INT PRIMARY KEY,
  from_account INT NOT NULL REFERENCES accounts(account_id),
  to_account INT NOT NULL REFERENCES accounts(account_id),
  amount DECIMAL(15,2) NOT NULL CHECK (amount > 0),
  status VARCHAR(20) DEFAULT 'pending'
);

These constraints prevent the database from entering an invalid state: balances can never go negative, and transactions can never reference accounts that don't exist.
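
To see a constraint doing its job, here is a small runnable example using Python's built-in sqlite3 module (SQLite accepts the same CHECK syntax as above); an update that would drive the balance negative is rejected with an error instead of corrupting state.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        account_id   INTEGER PRIMARY KEY,
        balance      NUMERIC NOT NULL CHECK (balance >= 0),
        account_type TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO accounts VALUES (1, 100, 'checking')")

try:
    # Would leave the balance at -400; the CHECK constraint rejects it.
    conn.execute("UPDATE accounts SET balance = balance - 500 WHERE account_id = 1")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)   # e.g. "rejected: CHECK constraint failed: ..."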

Isolation Levels in Detail

Isolation determines what happens when multiple transactions execute concurrently. Different applications need different levels of isolation, and stricter isolation costs performance.

Isolation Phenomena

Before examining levels, understand the three key problems that can occur without proper isolation:

Dirty reads occur when Transaction A reads uncommitted changes from Transaction B. If Transaction B rolls back, Transaction A read data that never actually committed.

Non-repeatable reads happen when Transaction A reads a value, then Transaction B modifies it, then Transaction A reads the same value again and gets different results—within the same transaction.

Phantom reads occur when Transaction A reads a set of rows matching a condition, then Transaction B inserts or deletes rows matching that condition, and a re-read of the same condition returns different rows.

The Four Standard Isolation Levels

-- Example showing different isolation phenomena

-- Read Uncommitted (least restrictive)
BEGIN TRANSACTION; -- Transaction A
SELECT balance FROM accounts WHERE account_id = 1; -- Reads $1000
-- Meanwhile, Transaction B updates to $2000 but hasn't committed
-- A second SELECT in Transaction A can see the uncommitted $2000 (dirty read)

-- Read Committed
BEGIN TRANSACTION; -- Transaction A
SELECT balance FROM accounts WHERE account_id = 1; -- Reads $1000
-- Transaction B commits a change to $2000
SELECT balance FROM accounts WHERE account_id = 1; -- Now reads $2000 (non-repeatable read)

-- Repeatable Read
BEGIN TRANSACTION; -- Transaction A
SELECT balance FROM accounts WHERE account_id = 1; -- Reads $1000
-- Transaction B tries to update: succeeds if no lock conflict
SELECT balance FROM accounts WHERE account_id = 1; -- Still reads $1000 (repeatable)
-- But if Transaction B inserts new accounts, phantom read can still occur

-- Serializable (most restrictive)
BEGIN TRANSACTION; -- Transaction A
SELECT * FROM accounts WHERE balance < 100; -- Locks the range
-- Transaction B cannot insert/update/delete in this range
COMMIT; -- Lock released

Here’s how each level handles the phenomena:

Isolation Level      Dirty Reads    Non-repeatable Reads    Phantom Reads
Read Uncommitted     Possible       Possible                Possible
Read Committed       Prevented      Possible                Possible
Repeatable Read      Prevented      Prevented               Possible
Serializable         Prevented      Prevented               Prevented

Read Uncommitted is rarely used in production. It offers no real isolation—Transaction A can see data that never actually committed. The only advantage is maximum concurrency.

Read Committed (the PostgreSQL default) prevents dirty reads. You see only committed data. However, another transaction can modify data between your reads, causing non-repeatable reads.

Repeatable Read prevents dirty and non-repeatable reads, either by holding read locks until the transaction ends or, in MVCC databases, by reading from a snapshot taken when the transaction started. However, new rows inserted that match your query condition can still appear on a re-read (phantom reads).

Serializable provides complete isolation as if transactions executed one at a time sequentially. No phenomena are possible, but concurrency is lowest. Modern databases implement this efficiently through multiversion concurrency control (MVCC) or predicate locking rather than simple locks.
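
One practical consequence of the stricter levels: rather than blocking, the database may abort a transaction with a serialization failure (SQLSTATE 40001) and expect the application to retry it. Below is a generic retry sketch in Python; SerializationError and run_transfer are placeholders for your driver's actual error class and your own transaction function.

import random
import time

class SerializationError(Exception):
    """Placeholder for your driver's serialization-failure error (SQLSTATE 40001)."""

def with_retries(txn_fn, attempts=5):
    """Run a transactional function, retrying with backoff when it is aborted."""
    for attempt in range(attempts):
        try:
            return txn_fn()
        except SerializationError:
            # Back off briefly (with jitter) before retrying the whole transaction.
            time.sleep((2 ** attempt) * 0.05 + random.random() * 0.05)
    raise RuntimeError("transaction kept conflicting; giving up")

# Usage (run_transfer is hypothetical):
# with_retries(lambda: run_transfer(conn, 'account_001', 'account_002', 500))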

Durability and Persistence

Write-Ahead Logging (Revisited)

Durability isn’t just about having data on disk—it’s about guaranteeing it persists across failures. The WAL pattern achieves this:

  1. Transaction commits
  2. Log entry written to disk (fsync)
  3. Database confirms "commit successful" to client
  4. Data changes apply to in-memory buffers
  5. Periodic checkpointing writes buffers to stable storage

The critical point: the log is written and fsynced before confirming success to the client. Even if the data hasn’t reached disk yet, the log guarantees recovery.

Checkpointing

Databases periodically create checkpoints—snapshots of all modified data written to stable storage. This serves two purposes:

  1. Reduces recovery time (no need to replay months of logs)
  2. Allows older log entries to be discarded

Between checkpoints, only logs are needed for recovery. After a crash:

  1. Load from last checkpoint
  2. Replay logs from that point forward
  3. Undo any uncommitted transactions
  4. Database is consistent
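
Putting the pieces together, here is a deliberately simplified recovery routine in Python. It assumes the checkpoint is a consistent snapshot containing only committed work and reuses the toy log-record format from the WAL sketches earlier; both assumptions are for illustration only.

def recover_from_checkpoint(snapshot, checkpoint_pos, log_records):
    """snapshot: data as of the checkpoint; log_records: list of dicts for the whole log."""
    data = dict(snapshot)                    # start from the checkpointed snapshot
    tail = log_records[checkpoint_pos:]      # only the log written after the checkpoint matters

    committed = {r["tx"] for r in tail if r.get("op") == "commit"}
    for r in tail:                           # redo the work of committed transactions
        if "key" in r and r["tx"] in committed:
            data[r["key"]] = r["new"]
    for r in reversed(tail):                 # undo the work of uncommitted transactions
        if "key" in r and r["tx"] not in committed:
            data[r["key"]] = r["old"]
    return data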

Replication for Durability

Some systems enhance durability through replication:

Primary Database (writes)
    ↓ (logs replicated immediately)
Replica Database (read-only copy)
    ↓ (another layer of durability)
Disaster Recovery Site

If the primary fails, the replica takes over. The data exists in multiple places, protected against single-point failures.
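
With synchronous replication, the primary does not report "commit successful" until a replica has durably received the commit's log record. The sketch below shows only that ordering; the Primary and Replica classes are in-process stand-ins for real servers and network calls.

class Replica:
    def __init__(self):
        self.log = []

    def receive(self, record):
        self.log.append(record)      # a real replica would fsync its copy of the log here
        return "ack"

class Primary:
    def __init__(self, replicas):
        self.log = []
        self.replicas = replicas

    def commit(self, record):
        self.log.append(record)                              # 1. write + fsync locally
        acks = [r.receive(record) for r in self.replicas]    # 2. ship the log record
        if all(a == "ack" for a in acks):                     # 3. only now confirm to the client
            return "commit successful"
        raise RuntimeError("no replica acknowledgement; commit not confirmed")

primary = Primary([Replica()])
print(primary.commit({"tx": "tx_001", "op": "commit"}))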

Distributed Transactions

Things get complicated when a transaction spans multiple databases or systems. You can’t simply write a log to disk on one server and call it durable—you need consensus across systems.

Two-Phase Commit (2PC)

The classical approach: a coordinator ensures that either all participants commit or all roll back.

Coordinator                     Participant A        Participant B
    │                                 │                    │
    ├── Prepare ─────────────────────→│                    │
    ├── Prepare ──────────────────────────────────────────→│
    │                         (lock resources)     (lock resources)
    │←──── Yes/No vote ───────────────┤                    │
    │←──── Yes/No vote ────────────────────────────────────┤
    │                                 │                    │
    ├── Commit/Abort ────────────────→│                    │
    ├── Commit/Abort ─────────────────────────────────────→│
    │                        (persist changes)    (persist changes)

Phase 1 (Prepare): Coordinator asks all participants if they can commit. Participants lock resources, validate constraints, but don’t commit yet. Each votes “yes” (ready to commit) or “no” (can’t commit).

Phase 2 (Commit): If all voted yes, coordinator tells everyone to commit. If any voted no, coordinator tells everyone to abort.

The key guarantee: all participants end up in the same state.

Problem: If the coordinator crashes between learning the votes and sending commit/abort, participants hang indefinitely with locks held.
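
The coordinator's logic can be sketched in a few lines of Python. The Participant objects and method calls below stand in for separate systems and network messages; the comment marks the point where a coordinator crash leaves participants blocked, as described above.

class Participant:
    def __init__(self, name):
        self.name = name

    def prepare(self):
        # Lock resources, validate constraints, durably record "prepared", then vote.
        return "yes"

    def commit(self):
        print(f"{self.name}: changes persisted")

    def abort(self):
        print(f"{self.name}: rolled back, locks released")

def two_phase_commit(participants, decision_log):
    # Phase 1: ask everyone to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    # The decision must be durable BEFORE phase 2; if the coordinator
    # crashes right here without it, participants wait with locks held.
    decision_log.append(decision)
    # Phase 2: broadcast the decision.
    for p in participants:
        p.commit() if decision == "commit" else p.abort()
    return decision

print(two_phase_commit([Participant("A"), Participant("B")], decision_log=[]))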

Three-Phase Commit (3PC)

Adds a pre-commit phase to handle coordinator failure:

Phase 1: Prepare (vote)
Phase 2: Pre-commit (everyone says they're ready to commit)
Phase 3: Commit (actually apply changes)

If the coordinator fails between phases, participants can time out, detect the failure, and make progress on their own. Even so, 3PC is rarely used in practice because of its added complexity and extra round trip.

The Saga Pattern

Instead of trying to coordinate across systems, execute a sequence of local transactions with compensating transactions for rollback:

-- Saga for order processing (:order_id, :item_id, :qty, :payment_id are bind parameters)

-- Step 1: Create order (local transaction on order DB)
BEGIN TRANSACTION;
  INSERT INTO orders (order_id, status) VALUES (:order_id, 'pending');
COMMIT;

-- Step 2: Reserve inventory (local transaction on inventory DB)
BEGIN TRANSACTION;
  UPDATE inventory SET available = available - :qty WHERE item_id = :item_id;
COMMIT;

-- Step 3: Process payment (local transaction on payment DB)
BEGIN TRANSACTION;
  INSERT INTO payments (payment_id, status) VALUES (:payment_id, 'processed');
COMMIT;

-- Each step is independently durable. If step 3 fails:
-- Step 4: Compensate for step 2
BEGIN TRANSACTION;
  UPDATE inventory SET available = available + :qty WHERE item_id = :item_id;
COMMIT;

-- Step 5: Compensate for step 1
BEGIN TRANSACTION;
  UPDATE orders SET status = 'cancelled' WHERE order_id = :order_id;
COMMIT;

Sagas sacrifice immediate atomicity for practical distributed systems. The order isn’t instantly atomic across systems, but the system can always recover to a consistent state through compensating actions.
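
The same idea expressed as a tiny orchestrator: each forward step is paired with a compensating action, and when a step fails, the compensations for the steps that already completed run in reverse order. The step functions below are just print statements standing in for the local transactions above.

def run_saga(steps):
    """steps: list of (forward_action, compensating_action) pairs."""
    completed = []
    try:
        for forward, compensate in steps:
            forward()                            # each step is its own local, durable transaction
            completed.append(compensate)
    except Exception as exc:
        print(f"step failed ({exc}); compensating")
        for compensate in reversed(completed):   # undo completed steps in reverse order
            compensate()
        return "cancelled"
    return "completed"

def fail_payment():
    raise RuntimeError("payment declined")

result = run_saga([
    (lambda: print("create order"),      lambda: print("cancel order")),
    (lambda: print("reserve inventory"), lambda: print("release inventory")),
    (fail_payment,                       lambda: print("refund payment")),
])
print(result)   # -> cancelled, after releasing the inventory and cancelling the order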

ACID vs BASE Trade-offs

Many distributed systems relax ACID guarantees for availability and performance. The BASE model (Basically Available, Soft state, Eventually consistent) acknowledges practical realities:

Basically Available: The system responds to requests even during partial failures.

Soft state: Data is temporarily inconsistent. Multiple copies of data may differ momentarily.

Eventually Consistent: Given no new updates, all copies eventually converge to identical state.

This works well for systems like social media feeds (eventual consistency of likes is acceptable) but fails for banking (you can’t relax atomicity of transfers).
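
To make "eventually converge" concrete, here is one simple reconciliation rule, last-writer-wins: when replicas exchange state, each keeps, per key, the value with the newest timestamp. It is only one of several strategies (and it silently drops the older write); the data shapes are made up for illustration.

def merge(local, remote):
    """Last-writer-wins: for each key, keep the (timestamp, value) pair with the newest timestamp."""
    merged = dict(local)
    for key, (ts, value) in remote.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

# Two replicas accepted different writes while partitioned:
replica_1 = {"post_42_likes": (1001, 17)}
replica_2 = {"post_42_likes": (1005, 18)}

# Anti-entropy: once they exchange state, both hold the same value.
replica_1 = merge(replica_1, replica_2)
replica_2 = merge(replica_2, replica_1)
print(replica_1 == replica_2)   # True: both converged to (1005, 18)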

Performance Trade-offs

Isolation Level      Lock Contention    Throughput    Use Case
Read Uncommitted     Minimal            Highest       Rarely acceptable
Read Committed       Low                High          Web applications
Repeatable Read      Moderate           Moderate      Reporting, analytics
Serializable         High               Lowest        High-value, low-frequency transactions

Stricter isolation means more locking, more waiting, lower throughput. Design around your actual isolation needs—don’t default to Serializable if Read Committed works.

Practical Examples

A Bank Transfer Transaction

-- PostgreSQL-flavored version: the conditional check lives in a PL/pgSQL
-- function so the whole transfer commits or rolls back as one unit.
CREATE OR REPLACE FUNCTION transfer_funds(src TEXT, dst TEXT, amt NUMERIC)
RETURNS void AS $$
BEGIN
    -- Debit the source account only if it has sufficient funds
    UPDATE accounts
    SET balance = balance - amt
    WHERE account_id = src AND balance >= amt;

    -- Verify the update succeeded
    IF NOT FOUND THEN
        RAISE EXCEPTION 'Insufficient funds';  -- aborts the surrounding transaction
    END IF;

    -- Credit the destination account
    UPDATE accounts
    SET balance = balance + amt
    WHERE account_id = dst;

    -- Log the transaction for audit (first column is an auto-generated id)
    INSERT INTO transaction_log VALUES (DEFAULT, src, dst, amt, NOW());
END;
$$ LANGUAGE plpgsql;

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT transfer_funds('account_001', 'account_002', 500);
COMMIT;

This transaction guarantees atomicity (both updates or neither), consistency (balances remain valid), isolation (other transactions see either before or after, never in-between), and durability (once COMMIT returns, both updates persist).

Demonstrating Isolation Levels

-- Connection 1: Transaction A
BEGIN TRANSACTION;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED; -- PostgreSQL: set the level inside the transaction
SELECT balance FROM accounts WHERE account_id = 1; -- Reads $1000

-- Connection 2: Transaction B (execute while A is still running)
BEGIN TRANSACTION;
UPDATE accounts SET balance = 2000 WHERE account_id = 1;
COMMIT; -- B commits its change

-- Connection 1: Back in Transaction A
SELECT balance FROM accounts WHERE account_id = 1; -- Now reads $2000!
-- This is a non-repeatable read at READ COMMITTED level
COMMIT;

Compare with Repeatable Read:

-- Connection 1: Transaction A
BEGIN TRANSACTION;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE account_id = 1; -- Reads $1000
-- Holds a read lock on this row

-- Connection 2: Transaction B
BEGIN TRANSACTION;
UPDATE accounts SET balance = 2000 WHERE account_id = 1;
-- Blocks, waiting for A's read lock

-- Connection 1: Back in Transaction A
SELECT balance FROM accounts WHERE account_id = 1; -- Still reads $1000
COMMIT; -- Releases lock
-- Now B's update completes

(The blocking shown here assumes a lock-based implementation. In MVCC databases such as PostgreSQL, B's update is not blocked; Transaction A simply keeps reading from its snapshot and still sees $1000.)

Key Takeaways

  • Atomicity is non-negotiable for critical operations. Every cent must move from one account to another, or not at all. Write-ahead logging makes this possible.

  • Isolation level is a conscious trade-off. Choose Read Committed for most web applications and Serializable only where consistency requires it. Monitor lock contention to identify problems.

  • Durability requires more than disk writes. Verify that your databases are configured with proper WAL, that logs are fsynced, and that you’re testing recovery scenarios. A database that can’t recover from failure isn’t durable.

  • Distributed transactions are hard. Prefer saga patterns over 2PC. If you use 2PC, test coordinator failure scenarios extensively.

  • Eventually consistent systems need careful design. Don’t use them for operations where temporary inconsistency causes user-visible problems. Financial systems, inventory systems, and booking systems rarely tolerate eventual consistency.

  • Monitor your database's ACID implementation. Long-running transactions block others. Lock timeouts and deadlock detection keep transactions from waiting forever, but the errors they raise still need handling in application code. Slow recovery from crashes points to problems with your logging or checkpointing strategy.

Put It Into Practice

Scenario 1: Inventory Management Crisis

Your e-commerce platform uses a microservices architecture: an order service (PostgreSQL), an inventory service (a separate database), and a payment service. A customer buys the last copy of an item. Your system deducts inventory, processes payment, and then the payment service goes down for 2 hours. By the time it recovers, another customer has already been allocated the same inventory. How do you fix this architecturally? What pattern prevents this? (Hint: Think about the order of operations and compensation.)

Scenario 2: Lock Contention Performance Problem

Your analytics team runs hourly reports that scan your entire customer table with Serializable isolation to ensure accurate totals. Meanwhile, your web application is getting slower because its Read Committed transactions are waiting for locks. Why is this happening? What are three ways to solve it without sacrificing the report’s accuracy? (Consider transaction scope, timing, and isolation levels.)

Scenario 3: Distributed Transfer Failure

Your payment system transfers money between customer accounts held in different regional databases for latency reasons. You use 2PC to ensure atomicity. The coordinator sends the prepare request to both regions, both respond “ready to commit,” but before the coordinator sends the commit request, the coordinator process crashes. What happens? How long does recovery take? What’s a better architecture?

Looking Ahead

Now that you understand how databases maintain consistency within a single server, we need to examine one of the most powerful tools for performance optimization: database indexes. As transaction throughput increases and queries become more complex, an unindexed table can mean the difference between a snappy application and one that grinds to a halt. The next chapter explores how databases structure data for speed, and the critical decisions you must make about which data to index and how.